Opsmas 2025 Day 2: varint updated & tigerwired
merry opsmas 2025 day 2.
varint
In my never-ending goal to make all software as efficient as possible, I collected and extended some variable-length integer strategies a while ago in varint.
It’s updated now with more features, more correctness, more examples, more encoding kinds, and more documentation.
The original variable length data types are still there:
| Type | Metadata Location | Encoding | Max Bytes | 1-Byte Max | Sortable | Speed | Best For |
|---|---|---|---|---|---|---|---|
| Tagged | First byte | Big-endian | 9 | 240 | Yes | Fast | Database keys, sorted data |
| External | External | Little-endian | 8 | 255 | No | Fastest | Compact storage, metadata elsewhere |
| Split | First byte | Hybrid | 9 | 63 | No | Fast | Known bit boundaries, packing |
| Chained | Continuation bits | Variable | 9 | 127 | No | Slowest | Legacy compatibility |
| Packed | N/A | Bit-level | N/A | Configurable | Yes | Fast | Fixed-width integer arrays |
There are also plenty of fully integrated usage examples:
examples/
├── standalone/
│ ├── example_tagged.c
│ ├── example_external.c
│ ├── example_split.c
│ ├── example_chained.c
│ ├── example_packed.c
│ ├── example_dimension.c
│ ├── example_bitstream.c
│ └── rle_codec.c
│
├── integration/
│ ├── database_system.c
│ ├── network_protocol.c
│ ├── column_store.c
│ ├── game_engine.c
│ ├── sensor_network.c
│ ├── ml_features.c
│ ├── vector_clock.c
│ ├── delta_compression.c
│ └── sparse_matrix_csr.c
│
├── reference/
│ ├── kv_store.c
│ ├── timeseries_db.c
│ └── graph_database.c
│
└── advanced/
├── blockchain_ledger.c
├── dns_server.c
├── game_replay_system.c
├── bytecode_vm.c
├── inverted_index.c
├── financial_orderbook.c
├── log_aggregation.c
├── geospatial_routing.c
├── bloom_filter.c
├── autocomplete_trie.c
├── pointcloud_octree.c
├── trie_pattern_matcher.c
    └── trie_interactive.c
Now we include:
- Packed Bit Arrays
- Delta Encoding (varintDelta)
- Frame-of-Reference (varintFOR)
- Group Encoding (varintGroup)
- Patched Frame-of-Reference (varintPFOR)
- Dictionary Encoding (varintDict)
- Bitmap Encoding (varintBitmap)
- Adaptive Encoding (varintAdaptive)
- Automatic encoding selector that analyzes data characteristics and picks the optimal strategy (DELTA, FOR, PFOR, DICT, BITMAP, or TAGGED), achieving 1.35x-6.45x compression with no manual selection. The format is self-describing with a 1-byte header, which makes it a good fit for mixed workloads, log compression, and API responses. Full details in varintAdaptive.h.
- Floating Point Compression (varintFloat)
- Run-Length Encoding (varintRLE)
- Elias Universal Codes (varintElias)
- SIMD Block-Packed Encoding (varintBP128)
stats
===============================================================================
Language Files Lines Code Comments Blanks
===============================================================================
C 31 13324 9692 1461 2171
C Header 27 5011 2900 1539 572
CMake 1 193 129 31 33
===============================================================================
Total 59 18528 12721 3031 2776
===============================================================================
tigerwired
I was looking into using WiredTiger for a project, but I didn’t want to get its viral license cooties all over my other work, so I wrote a standard “get this away from me, look I’m not touching you” wrapper where you can use the full WiredTiger API surface over local unix sockets.
This is: tigerwired.
Some interesting parts:
- We manually rewrite/patch the WiredTiger build system so it works as a clean built-in sub-dependency of the proxy itself (clone the wiredtiger repo into deps/wiredtiger and tigerwired builds it from there)
- The proxy supports batched/bulk writes, which get within usable percentages of the actual embedded system itself for high-throughput operations
- Uses a custom wire protocol and includes client wrappers
Benchmarks
=== Optimized Workloads ===
>>> Proxy: fused-write (row-string)
=== TigerWired Proxy Benchmark ===
Socket: /tmp/tw_proxy.sock
Benchmark Configuration:
Workload: Fused Write (1-RTT)
Format: Row (String)
Records: 100000
Key Size: 16 bytes
Value Size: 100 bytes
Cache Size: 256 MB
Running: Fused Write (1-RTT per insert)...
Results: Fused Write
Total Time: 737.27 ms
Throughput: 135636 ops/sec
Data Rate: 15.73 MB/sec
Operations: 100000 total, 100000 success, 0 failed
Latency (us): min=5.0 avg=7.1 max=101.0
Percentiles: p50=6.0 p95=12.0 p99=24.0
=== Benchmark Complete ===
>>> Proxy: fused-read (row-string)
=== TigerWired Proxy Benchmark ===
Socket: /tmp/tw_proxy.sock
Benchmark Configuration:
Workload: Fused Read (1-RTT)
Format: Row (String)
Records: 100000
Key Size: 16 bytes
Value Size: 100 bytes
Cache Size: 256 MB
Running: Fused Write (populate data)...
Results: Fused Write
Total Time: 707.87 ms
Throughput: 141269 ops/sec
Data Rate: 16.39 MB/sec
Operations: 100000 total, 100000 success, 0 failed
Latency (us): min=4.0 avg=6.8 max=1917.0
Percentiles: p50=6.0 p95=11.0 p99=22.0
Running: Fused Read (1-RTT per search)...
Results: Fused Read
Total Time: 1335.68 ms
Throughput: 74868 ops/sec
Data Rate: 1.20 MB/sec
Operations: 100000 total, 100000 success, 0 failed
Latency (us): min=5.0 avg=7.3 max=3445.0
Percentiles: p50=6.0 p95=11.0 p99=24.0
=== Benchmark Complete ===
>>> Proxy: batch-write (row-string)
=== TigerWired Proxy Benchmark ===
Socket: /tmp/tw_proxy.sock
Benchmark Configuration:
Workload: Batch Write
Format: Row (String)
Records: 100000
Key Size: 16 bytes
Value Size: 100 bytes
Cache Size: 256 MB
Running: Batch Write (batch_size=1000)...
Results: Batch Write
Total Time: 68.78 ms
Throughput: 1453805 ops/sec
Data Rate: 168.64 MB/sec
Operations: 100000 total, 100000 success, 0 failed
Latency (us): min=0.4 avg=0.5 max=0.6
Percentiles: p50=0.5 p95=0.6 p99=0.6
=== Benchmark Complete ===
>>> Proxy: batch-read (row-string)
=== TigerWired Proxy Benchmark ===
Socket: /tmp/tw_proxy.sock
Benchmark Configuration:
Workload: Batch Read
Format: Row (String)
Records: 100000
Key Size: 16 bytes
Value Size: 100 bytes
Cache Size: 256 MB
Running: Batch Write (populate data, batch_size=1000)...
Results: Batch Write
Total Time: 80.65 ms
Throughput: 1239941 ops/sec
Data Rate: 143.83 MB/sec
Operations: 100000 total, 100000 success, 0 failed
Latency (us): min=0.5 avg=0.6 max=0.8
Percentiles: p50=0.6 p95=0.7 p99=0.8
Running: Batch Read (batch_size=1000)...
Results: Batch Read
Total Time: 26.72 ms
Throughput: 3743075 ops/sec
Data Rate: 434.20 MB/sec
Operations: 100000 total, 100000 success, 0 failed
Latency (us): min=0.2 avg=0.3 max=0.6
Percentiles: p50=0.2 p95=0.4 p99=0.6
=== Benchmark Complete ===
Usage Details
TigerWired supports all the WiredTiger storage models:
| Model | Key Type | Value Type | Best For |
|---|---|---|---|
| Row Store | Variable | Variable | General purpose, variable-size data |
| Column Store (Fixed) | Record number | Fixed-size | Time series, append-only logs |
| Column Store (Variable) | Record number | Variable | Columnar analytics, sparse data |
| LSM Tree | Variable | Variable | Write-heavy workloads |
Configuration
// String keys and values
twc_create(session, "table:users", "key_format=S,value_format=S," "columns=(username,profile)");
// Integer keys with binary values
twc_create(session, "table:cache", "key_format=Q,value_format=u," "columns=(id,data)");
// Composite keys
twc_create(session, "table:events", "key_format=QS,value_format=Su," "columns=(timestamp,type,message,payload)");
| Use Case | Recommended Model | Key Format | Value Format |
|---|---|---|---|
| General KV | Row Store | S or Q | u |
| User Data | Row Store | S (username) | SS... |
| Time Series | Column Fixed | r | Q or packed |
| Logs | Column Variable | r | S or u |
| Analytics | Column Groups | Q | Multiple groups |
| High Write | LSM | Q or S | u |
| Documents | Row Store | S (doc_id) | u (JSON) |
Based on typical hardware (modern x86-64, NVMe storage):
Small Keys/Values (16B keys, 100B values)
| Operation | Embedded | Proxy | Batch/Fused | Overhead (proxy → optimized) |
|---|---|---|---|---|
| Point read (cached) | 1-5 μs | 20-50 μs | 10-15 μs | 10-20x → 2-3x |
| Sequential write | 1-2 μs | 20-25 μs | 7-8 μs | 10-20x → 4-7x |
| Sequential scan | 0.2 μs/row | 7-21 μs/row | 0.2 μs/row | 35-100x → 1x |
| Random read | 1-2 μs | 14-16 μs | 10 μs | 8-14x → 6-8x |
| Batch write (1000 items) | 1-1.5 ms | 2-2.5 s | 120-160 ms | 1600x → 100x |
| Batch read (1000 items) | 18-20 ms | 728-2000 ms | 24 ms | 40-100x → 1.2x |
Large Keys/Values (128B keys, 256B values, 1M records)
| Operation | Embedded | Proxy (10K) | Fused (100K) | Batch (1M) | vs Embedded |
|---|---|---|---|---|---|
| Sequential write | 492K ops/sec | 44K ops/sec | 106K ops/sec | 582K ops/sec | 118% ✓ |
| Sequential read | 3.35M ops/sec | - | - | 2.07M ops/sec | 62% |
| Random read | 382K ops/sec | - | 54K ops/sec | - | - |
| Throughput (MB/sec) | 189 MB/s | 17 MB/s | 41 MB/s | 223 MB/s | 118% ✓ |
| Avg Latency | 1.8 μs | 22.5 μs | 9.1 μs | 1.5 μs | 83% ✓ |
Batch Size Scaling (100K records, 128B/256B):
- Batch 100: 503K ops/sec (102% of embedded)
- Batch 500: 565K ops/sec (115% of embedded)
- Batch 1000: 581K ops/sec (118% of embedded)
- Batch 5000: 676K ops/sec (137% of embedded)
stats
===============================================================================
Language Files Lines Code Comments Blanks
===============================================================================
C 20 8459 6158 764 1537
C Header 5 2150 820 1047 283
Shell 1 54 37 6 11
===============================================================================
Total 26 10663 7015 1817 1831
===============================================================================