Opsmas 2025 Day 2: varint updated & tigerwired
merry opsmas 2025 day 2.
varint
In my never-ending goal to make all software as efficient as possible, I collected and extended some variable-length integer strategies a while ago in varint.
It’s updated now with more features, more correctness, more examples, more encoding kinds, and more documentation.
The original variable length data types are still there:
| Type | Metadata Location | Encoding | Max Bytes | 1-Byte Max | Sortable | Speed | Best For |
|---|---|---|---|---|---|---|---|
| Tagged | First byte | Big-endian | 9 | 240 | Yes | Fast | Database keys, sorted data |
| External | External | Little-endian | 8 | 255 | No | Fastest | Compact storage, metadata elsewhere |
| Split | First byte | Hybrid | 9 | 63 | No | Fast | Known bit boundaries, packing |
| Chained | Continuation bits | Variable | 9 | 127 | No | Slowest | Legacy compatibility |
| Packed | N/A | Bit-level | N/A | Configurable | Yes | Fast | Fixed-width integer arrays |
There are also plenty of fully integrated usage examples:
examples/
├── standalone/
│ ├── example_tagged.c
│ ├── example_external.c
│ ├── example_split.c
│ ├── example_chained.c
│ ├── example_packed.c
│ ├── example_dimension.c
│ ├── example_bitstream.c
│ └── rle_codec.c
│
├── integration/
│ ├── database_system.c
│ ├── network_protocol.c
│ ├── column_store.c
│ ├── game_engine.c
│ ├── sensor_network.c
│ ├── ml_features.c
│ ├── vector_clock.c
│ ├── delta_compression.c
│ └── sparse_matrix_csr.c
│
├── reference/
│ ├── kv_store.c
│ ├── timeseries_db.c
│ └── graph_database.c
│
└── advanced/
├── blockchain_ledger.c
├── dns_server.c
├── game_replay_system.c
├── bytecode_vm.c
├── inverted_index.c
├── financial_orderbook.c
├── log_aggregation.c
├── geospatial_routing.c
├── bloom_filter.c
├── autocomplete_trie.c
├── pointcloud_octree.c
├── trie_pattern_matcher.c
    └── trie_interactive.c
Now we include:
- Packed Bit Arrays
- Delta Encoding (varintDelta)
- Frame-of-Reference (varintFOR)
- Group Encoding (varintGroup)
- Patched Frame-of-Reference (varintPFOR)
- Dictionary Encoding (varintDict)
- Bitmap Encoding (varintBitmap)
- Adaptive Encoding (varintAdaptive)
- Automatic encoding selector that analyzes data characteristics and picks the optimal strategy (DELTA, FOR, PFOR, DICT, BITMAP, or TAGGED), achieving 1.35x-6.45x compression with no manual selection. The format is self-describing with a 1-byte header, which makes it a good fit for mixed workloads, log compression, and API responses. Full details in varintAdaptive.h.
- Floating Point Compression (varintFloat)
- Run-Length Encoding (varintRLE)
- Elias Universal Codes (varintElias)
- SIMD Block-Packed Encoding (varintBP128)
stats
===============================================================================
Language Files Lines Code Comments Blanks
===============================================================================
C 31 13324 9692 1461 2171
C Header 27 5011 2900 1539 572
CMake 1 193 129 31 33
===============================================================================
Total 59 18528 12721 3031 2776
===============================================================================
tigerwired
I was looking into using WiredTiger for a project, but I didn’t want to get its viral license cooties all over my other work, so I wrote a standard “get this away from me, look I’m not touching you” wrapper where you can use the full WiredTiger API surface over local unix sockets.
This is: tigerwired.
Some interesting parts:
- We manually rewrite/patch the WiredTiger build system so it works as a clean built-in sub-dependency of the proxy itself (clone the wiredtiger repo into deps/wiredtiger and tigerwired builds it from there)
- The proxy supports batched/bulk writes, which get within usable percentages of the actual embedded system itself for high-throughput operations
- Uses a custom wire protocol and includes client wrappers
Benchmarks
=== Optimized Workloads ===
>>> Proxy: fused-write (row-string)
=== TigerWired Proxy Benchmark ===
Socket: /tmp/tw_proxy.sock
Benchmark Configuration:
Workload: Fused Write (1-RTT)
Format: Row (String)
Records: 100000
Key Size: 16 bytes
Value Size: 100 bytes
Cache Size: 256 MB
Running: Fused Write (1-RTT per insert)...
Results: Fused Write
Total Time: 737.27 ms
Throughput: 135636 ops/sec
Data Rate: 15.73 MB/sec
Operations: 100000 total, 100000 success, 0 failed
Latency (us): min=5.0 avg=7.1 max=101.0
Percentiles: p50=6.0 p95=12.0 p99=24.0
=== Benchmark Complete ===
>>> Proxy: fused-read (row-string)
=== TigerWired Proxy Benchmark ===
Socket: /tmp/tw_proxy.sock
Benchmark Configuration:
Workload: Fused Read (1-RTT)
Format: Row (String)
Records: 100000
Key Size: 16 bytes
Value Size: 100 bytes
Cache Size: 256 MB
Running: Fused Write (populate data)...
Results: Fused Write
Total Time: 707.87 ms
Throughput: 141269 ops/sec
Data Rate: 16.39 MB/sec
Operations: 100000 total, 100000 success, 0 failed
Latency (us): min=4.0 avg=6.8 max=1917.0
Percentiles: p50=6.0 p95=11.0 p99=22.0
Running: Fused Read (1-RTT per search)...
Results: Fused Read
Total Time: 1335.68 ms
Throughput: 74868 ops/sec
Data Rate: 1.20 MB/sec
Operations: 100000 total, 100000 success, 0 failed
Latency (us): min=5.0 avg=7.3 max=3445.0
Percentiles: p50=6.0 p95=11.0 p99=24.0
=== Benchmark Complete ===
>>> Proxy: batch-write (row-string)
=== TigerWired Proxy Benchmark ===
Socket: /tmp/tw_proxy.sock
Benchmark Configuration:
Workload: Batch Write
Format: Row (String)
Records: 100000
Key Size: 16 bytes
Value Size: 100 bytes
Cache Size: 256 MB
Running: Batch Write (batch_size=1000)...
Results: Batch Write
Total Time: 68.78 ms
Throughput: 1453805 ops/sec
Data Rate: 168.64 MB/sec
Operations: 100000 total, 100000 success, 0 failed
Latency (us): min=0.4 avg=0.5 max=0.6
Percentiles: p50=0.5 p95=0.6 p99=0.6
=== Benchmark Complete ===
>>> Proxy: batch-read (row-string)
=== TigerWired Proxy Benchmark ===
Socket: /tmp/tw_proxy.sock
Benchmark Configuration:
Workload: Batch Read
Format: Row (String)
Records: 100000
Key Size: 16 bytes
Value Size: 100 bytes
Cache Size: 256 MB
Running: Batch Write (populate data, batch_size=1000)...
Results: Batch Write
Total Time: 80.65 ms
Throughput: 1239941 ops/sec
Data Rate: 143.83 MB/sec
Operations: 100000 total, 100000 success, 0 failed
Latency (us): min=0.5 avg=0.6 max=0.8
Percentiles: p50=0.6 p95=0.7 p99=0.8
Running: Batch Read (batch_size=1000)...
Results: Batch Read
Total Time: 26.72 ms
Throughput: 3743075 ops/sec
Data Rate: 434.20 MB/sec
Operations: 100000 total, 100000 success, 0 failed
Latency (us): min=0.2 avg=0.3 max=0.6
Percentiles: p50=0.2 p95=0.4 p99=0.6
=== Benchmark Complete ===
Usage Details
TigerWired supports all the WiredTiger storage models:
| Model | Key Type | Value Type | Best For |
|---|---|---|---|
| Row Store | Variable | Variable | General purpose, variable-size data |
| Column Store (Fixed) | Record number | Fixed-size | Time series, append-only logs |
| Column Store (Variable) | Record number | Variable | Columnar analytics, sparse data |
| LSM Tree | Variable | Variable | Write-heavy workloads |
Configuration
// String keys and values
twc_create(session, "table:users", "key_format=S,value_format=S," "columns=(username,profile)");
// Integer keys with binary values
twc_create(session, "table:cache", "key_format=Q,value_format=u," "columns=(id,data)");
// Composite keys
twc_create(session, "table:events", "key_format=QS,value_format=Su," "columns=(timestamp,type,message,payload)");
| Use Case | Recommended Model | Key Format | Value Format |
|---|---|---|---|
| General KV | Row Store | S or Q | u |
| User Data | Row Store | S (username) | SS... |
| Time Series | Column Fixed | r | Q or packed |
| Logs | Column Variable | r | S or u |
| Analytics | Column Groups | Q | Multiple groups |
| High Write | LSM | Q or S | u |
| Documents | Row Store | S (doc_id) | u (JSON) |
Based on typical hardware (modern x86-64, NVMe storage):
Small Keys/Values (16B keys, 100B values)
| Operation | Embedded | Proxy | Batch/Fused | Overhead (proxy → optimized) |
|---|---|---|---|---|
| Point read (cached) | 1-5 μs | 20-50 μs | 10-15 μs | 10-20x → 2-3x |
| Sequential write | 1-2 μs | 20-25 μs | 7-8 μs | 10-20x → 4-7x |
| Sequential scan | 0.2 μs/row | 7-21 μs/row | 0.2 μs/row | 35-100x → 1x |
| Random read | 1-2 μs | 14-16 μs | 10 μs | 8-14x → 6-8x |
| Batch write (1000 items) | 1-1.5 ms | 2-2.5 s | 120-160 ms | 1600x → 100x |
| Batch read (1000 items) | 18-20 ms | 728-2000 ms | 24 ms | 40-100x → 1.2x |
Large Keys/Values (128B keys, 256B values, 1M records)
| Operation | Embedded | Proxy (10K) | Fused (100K) | Batch (1M) | vs Embedded |
|---|---|---|---|---|---|
| Sequential write | 492K ops/sec | 44K ops/sec | 106K ops/sec | 582K ops/sec | 118% ✓ |
| Sequential read | 3.35M ops/sec | - | - | 2.07M ops/sec | 62% |
| Random read | 382K ops/sec | - | 54K ops/sec | - | - |
| Throughput (MB/sec) | 189 MB/s | 17 MB/s | 41 MB/s | 223 MB/s | 118% ✓ |
| Avg Latency | 1.8 μs | 22.5 μs | 9.1 μs | 1.5 μs | 83% ✓ |
Batch Size Scaling (100K records, 128B/256B):
- Batch 100: 503K ops/sec (102% of embedded)
- Batch 500: 565K ops/sec (115% of embedded)
- Batch 1000: 581K ops/sec (118% of embedded)
- Batch 5000: 676K ops/sec (137% of embedded)
stats
===============================================================================
Language Files Lines Code Comments Blanks
===============================================================================
C 20 8459 6158 764 1537
C Header 5 2150 820 1047 283
Shell 1 54 37 6 11
===============================================================================
Total 26 10663 7015 1817 1831
===============================================================================