Redis as JSON document store

Redis Experiment #3: JSON Storage

This is my last experiment for a while. Enjoy.

It all started out simple enough.
How about wrapping arbitrary commands?
How about hash from JSON?
How about storing nested JSON?
Setting multiple JSON documents?
What about the inner life of JSON documents?
Pop a topping
Push better noms
Stuff a ballot box
Shut it all down
What’s all this now?
The Box
Nesting Syndrome
Processing All The Things
Codex
fin.

It started out simple enough.

hmgetalljson key

Have a hash…

127.0.0.1:6379> hgetall "comic:bunny:1271"
1) "img_src"
2) "strips/191108.jpg"
3) "title"
4) "insert no metal objects"
5) "alt_text"
6) "the buzzing in your ears is normal. for a given value of normal."

Get some JSON…

127.0.0.1:6379> hgetalljson "comic:bunny:1271"
1) {"img_src":"strips/191108.jpg","title":"insert no metal objects","alt_text":"the buzzing in your ears is normal. for a given value of normal."}

How about wrapping arbitrary commands?

jsonwrap command args…

Have a multi-result…

127.0.0.1:6379> georadiusbymember nyc "jfk" 20 mi withdistance asc
1) 1) "jfk"
   2) "0.00"
2) 1) "q4"
   2) "10.86"
3) 1) "4545"
   2) "11.52"
4) 1) "union square"
   2) "12.51"
5) 1) "wtc one"
   2) "12.85"
6) 1) "central park n/q/r"
   2) "12.94"

Get some JSON…

127.0.0.1:6379> jsonwrap georadiusbymember nyc "jfk" 20 mi withdistance asc
{"jfk":"0.00","q4":"10.86","4545":"11.52","union square":"12.51","wtc one":"12.85","central park n/q/r":"12.94"}

Have multiple score-value pairs…

127.0.0.1:6379> zadd abc 1 one 2 two 3 three 4 four 5 five 6 six
(integer) 6

Get some JSON…

127.0.0.1:6379> jsonwrap zrange abc 0 -1 withscores
{"one":"1","two":"2","three":"3","four":"4","five":"5","six":"6"}
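The reshaping jsonwrap does can be approximated outside Redis. This is a rough Python sketch, not the module's C code: `wrap_reply` is a made-up name, and it assumes the reply is either a list of two-element pairs (like the georadiusbymember output) or a flat alternating list with the key first (like zrange with scores).

```python
import json


def wrap_reply(reply):
    """Convert a reply made of pairs into a compact JSON object string.

    Accepts nested pairs ([[member, value], ...]) or a flat alternating
    list ([member, value, member, value, ...]).
    """
    if reply and all(isinstance(item, (list, tuple)) and len(item) == 2 for item in reply):
        pairs = [(member, value) for member, value in reply]
    elif len(reply) % 2 == 0:
        # Even positions are members, odd positions are their values.
        pairs = list(zip(reply[0::2], reply[1::2]))
    else:
        raise ValueError("reply cannot be split into pairs")
    # dict keeps insertion order, so the JSON follows the reply order.
    return json.dumps(dict(pairs), separators=(",", ":"))


print(wrap_reply([["jfk", "0.00"], ["q4", "10.86"]]))
print(wrap_reply(["one", "1", "two", "2"]))
```

Values stay strings here because that is what the underlying commands return.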

How about hash from JSON?

hmsetbyjson key json

Have some JSON…

127.0.0.1:6379> hmsetbyjson newcomic '{"id":"3337", "comic":"smbc", "date":"", "img_src":"http://www.smbc-comics.com/comics/20140424.png", "alt_comic":"http://www.smbc-comics.com/comics/20140424after.gif"}'
(integer) 1

Get a hash…

127.0.0.1:6379> hgetall "newcomic"
 1) "id"
 2) "3337"
 3) "comic"
 4) "smbc"
 5) "date"
 6) ""
 7) "img_src"
 8) "http://www.smbc-comics.com/comics/20140424.png"
 9) "alt_comic"
10) "http://www.smbc-comics.com/comics/20140424after.gif"
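
Going the other direction is mostly a matter of flattening. A minimal Python sketch of the idea, assuming a flat JSON object as input; `json_to_hash_args` is a hypothetical helper, not something the module exposes, and it only builds the argument list rather than talking to Redis.

```python
import json


def json_to_hash_args(key, document):
    """Flatten a one-level JSON object into HMSET-style arguments."""
    fields = json.loads(document)
    if not isinstance(fields, dict):
        raise ValueError("expected a JSON object at the top level")
    args = ["HMSET", key]
    for field, value in fields.items():
        if isinstance(value, (dict, list)):
            # A plain hash has nowhere to put nested containers.
            raise ValueError(f"field {field!r} is nested")
        # Strings go in as-is; other scalars keep their JSON spelling.
        args.extend([field, value if isinstance(value, str) else json.dumps(value)])
    return args


print(json_to_hash_args("newcomic", '{"id":"3337", "comic":"smbc", "date":""}'))
```

Note the loss of type information: once a number lands in the hash it is indistinguishable from a string.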

Nice, but not JSON-y enough.

How about storing nested JSON?

Donut document:

{
    "id": "0001",
    "type": "donut",
    "name": "Cake",
    "ppu": 0.55,
    "batters":
        {
            "batter":
                [
                    { "id": "1001", "type": "Regular" },
                    { "id": "1002", "type": "Chocolate" },
                    { "id": "1003", "type": "Blueberry" },
                    { "id": "1004", "type": "Devil's Food" }
                ]
        },
    "topping":
        [
            { "id": "5001", "type": "None" },
            { "id": "5002", "type": "Glazed" },
            { "id": "5005", "type": "Sugar" },
            { "id": "5007", "type": "Powdered Sugar" },
            { "id": "5006", "type": "Chocolate with Sprinkles" },
            { "id": "5003", "type": "Chocolate" },
            { "id": "5004", "type": "Maple" }
        ]
}

The same document on one line, for easier pasting into a command:

{ "id": "0001", "type": "donut", "name": "Cake", "ppu": 0.55, "batters": { "batter": [ { "id": "1001", "type": "Regular" }, { "id": "1002", "type": "Chocolate" }, { "id": "1003", "type": "Blueberry" }, { "id": "1004", "type": "Devil\'s Food" } ] }, "topping": [ { "id": "5001", "type": "None" }, { "id": "5002", "type": "Glazed" }, { "id": "5005", "type": "Sugar" }, { "id": "5007", "type": "Powdered Sugar" }, { "id": "5006", "type": "Chocolate with Sprinkles" }, { "id": "5003", "type": "Chocolate" }, { "id": "5004", "type": "Maple" } ] }

jsondocset key json

Set donuts document inside Redis…

127.0.0.1:6379> jsondocset donuts '{ "id": "0001", "type": "donut", "name": "Cake", "ppu": 0.55, "batters": { "batter": [ { "id": "1001", "type": "Regular" }, { "id": "1002", "type": "Chocolate" }, { "id": "1003", "type": "Blueberry" }, { "id": "1004", "type": "Devil\'s Food" } ] }, "topping": [ { "id": "5001", "type": "None" }, { "id": "5002", "type": "Glazed" }, { "id": "5005", "type": "Sugar" }, { "id": "5007", "type": "Powdered Sugar" }, { "id": "5006", "type": "Chocolate with Sprinkles" }, { "id": "5003", "type": "Chocolate" }, { "id": "5004", "type": "Maple" } ] }'
(integer) 1

jsondocget json-key

Get donuts document back…

127.0.0.1:6379> jsondocget donuts
{"id":"0001","type":"donut","name":"Cake","ppu":0.55,"batters":{"batter":[{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"},{"id":"1003","type":"Blueberry"},{"id":"1004","type":"Devil's Food"}]},"topping":[{"id":"5001","type":"None"},{"id":"5002","type":"Glazed"},{"id":"5005","type":"Sugar"},{"id":"5007","type":"Powdered Sugar"},{"id":"5006","type":"Chocolate with Sprinkles"},{"id":"5003","type":"Chocolate"},{"id":"5004","type":"Maple"}]}

Eva JSON:

{
   "writers" : [
      "Hideaki Anno"
   ],
   "rating" : "8.6",
   "urlIMDB" : "http://www.imdb.com/title/tt0112159",
   "genres" : [
      "Animation",
      "Action",
      "Drama",
      "Sci-Fi",
      "Thriller"
   ],
   "languages" : [
      "Japanese",
      "English"
   ],
   "countries" : [
      "Japan"
   ],
   "idIMDB" : "tt0112159",
   "rated" : "TV-14",
   "releaseDate" : "20040703",
   "directors" : [],
   "urlPoster" : "http://ia.media-imdb.com/images/M/MV5BMjA1Mzc4NzMwOF5BMl5BanBnXkFtZTcwNjc1MzYyMQ@@._V1_SY317_CR6,0,214,317_AL_.jpg",
   "simplePlot" : "A teenage boy finds himself recruited as a member of an elite team of pilots by his father.",
   "runtime" : [
      "30 min"
   ],
   "filmingLocations" : [],
   "plot" : "When the Angels start attacking the planet Earth in the year 2015, only a handful of 14-year-old EVA pilots are able to stop them. Young Shinji Ikari suddenly finds himself forced to pilot EVA-01, a giant organic mecha, designed and constructed by NERV, that is the only thing that can stop the Angels. Written by Chris Cleveland",
   "title" : "Neon Genesis Evangelion",
   "year" : "1995–1996",
   "metascore" : ""
}

Set eva document inside Redis…

127.0.0.1:6379> jsondocset eva ' { "writers" : [ "Hideaki Anno" ], "rating" : "8.6", "urlIMDB" : "http://www.imdb.com/title/tt0112159", "genres" : [ "Animation", "Action", "Drama", "Sci-Fi", "Thriller" ], "languages" : [ "Japanese", "English" ], "countries" : [ "Japan" ], "idIMDB" : "tt0112159", "rated" : "TV-14", "releaseDate" : "20040703", "directors" : [], "urlPoster" : "http://ia.media-imdb.com/images/M/MV5BMjA1Mzc4NzMwOF5BMl5BanBnXkFtZTcwNjc1MzYyMQ@@._V1_SY317_CR6,0,214,317_AL_.jpg", "simplePlot" : "A teenage boy finds himself recruited as a member of an elite team of pilots by his father.", "runtime" : [ "30 min" ], "filmingLocations" : [], "plot" : "When the Angels start attacking the planet Earth in the year 2015, only a handful of 14-year-old EVA pilots are able to stop them. Young Shinji Ikari suddenly finds himself forced to pilot EVA-01, a giant organic mecha, designed and constructed by NERV, that is the only thing that can stop the Angels. Written by Chris Cleveland", "title" : "Neon Genesis Evangelion", "year" : "1995–1996", "metascore" : "" }'
(integer) 1

Get eva document back…

127.0.0.1:6379> jsondocget eva
{"filmingLocations":[],"rating":"8.6","urlIMDB":"http://www.imdb.com/title/tt0112159","urlPoster":"http://ia.media-imdb.com/images/M/MV5BMjA1Mzc4NzMwOF5BMl5BanBnXkFtZTcwNjc1MzYyMQ@@._V1_SY317_CR6,0,214,317_AL_.jpg","languages":["Japanese","English"],"simplePlot":"A teenage boy finds himself recruited as a member of an elite team of pilots by his father.","plot":"When the Angels start attacking the planet Earth in the year 2015, only a handful of 14-year-old EVA pilots are able to stop them. Young Shinji Ikari suddenly finds himself forced to pilot EVA-01, a giant organic mecha, designed and constructed by NERV, that is the only thing that can stop the Angels. Written by Chris Cleveland","idIMDB":"tt0112159","runtime":["30 min"],"releaseDate":"20040703","writers":["Hideaki Anno"],"rated":"TV-14","directors":[],"metascore":"","year":"1995–1996","countries":["Japan"],"genres":["Animation","Action","Drama","Sci-Fi","Thriller"],"title":"Neon Genesis Evangelion"}

jsondocmget json-key1 json-key2 … json-keyN

Get eva and donuts with one command…

127.0.0.1:6379> jsondocmget eva donuts
1) {"filmingLocations":[],"rating":"8.6","urlIMDB":"http://www.imdb.com/title/tt0112159","urlPoster":"http://ia.media-imdb.com/images/M/MV5BMjA1Mzc4NzMwOF5BMl5BanBnXkFtZTcwNjc1MzYyMQ@@._V1_SY317_CR6,0,214,317_AL_.jpg","languages":["Japanese","English"],"simplePlot":"A teenage boy finds himself recruited as a member of an elite team of pilots by his father.","plot":"When the Angels start attacking the planet Earth in the year 2015, only a handful of 14-year-old EVA pilots are able to stop them. Young Shinji Ikari suddenly finds himself forced to pilot EVA-01, a giant organic mecha, designed and constructed by NERV, that is the only thing that can stop the Angels. Written by Chris Cleveland","idIMDB":"tt0112159","runtime":["30 min"],"releaseDate":"20040703","writers":["Hideaki Anno"],"rated":"TV-14","directors":[],"metascore":"","year":"1995–1996","countries":["Japan"],"genres":["Animation","Action","Drama","Sci-Fi","Thriller"],"title":"Neon Genesis Evangelion"}
2) {"id":"0001","type":"donut","name":"Cake","ppu":0.55,"batters":{"batter":[{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"},{"id":"1003","type":"Blueberry"},{"id":"1004","type":"Devil's Food"}]},"topping":[{"id":"5001","type":"None"},{"id":"5002","type":"Glazed"},{"id":"5005","type":"Sugar"},{"id":"5007","type":"Powdered Sugar"},{"id":"5006","type":"Chocolate with Sprinkles"},{"id":"5003","type":"Chocolate"},{"id":"5004","type":"Maple"}]}

Setting multiple JSON documents?

redis-cli supports option -x meaning “read the last argument from stdin.”

jsondocsetbyjson field-for-keys json-array-of-maps

% curl https://api.github.com/repos/antirez/redis/issues | \
  redis-cli -x jsondocsetbyjson id 
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 88896  100 88896    0     0  32657      0  0:00:02  0:00:02 --:--:-- 32670
 1) "32322401"
 2) "32178366"
 3) "32158343"
 4) "32120786"
 5) "32079847"
 6) "32069502"
 7) "32000849"
 8) "31978880"
 9) "31946624"
10) "31916021"
11) "31654833"
12) "31580242"
13) "31572881"
14) "31520958"
15) "31424829"
16) "31388377"
17) "31322069"
18) "31305226"
19) "31273036"
20) "31161693"
21) "31144714"
22) "31043782"
23) "31033181"
24) "31021768"
25) "30988519"
26) "30938375"
27) "30928907"
28) "30897010"
29) "30872687"
30) "30844280"

Investigate a newly added document…

127.0.0.1:6379> jsondocget 31144714
{"milestone":null,"html_url":"https://github.com/antirez/redis/pull/1673","user":{"login":"dchest","id":52677,"avatar_url":"https://avatars.githubusercontent.com/u/52677?","gravatar_id":"641aceb7e3d2eebea49f397c38048d0b","url":"https://api.github.com/users/dchest","html_url":"https://github.com/dchest","followers_url":"https://api.github.com/users/dchest/followers","following_url":"https://api.github.com/users/dchest/following{/other_user}","gists_url":"https://api.github.com/users/dchest/gists{/gist_id}","starred_url":"https://api.github.com/users/dchest/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/dchest/subscriptions","organizations_url":"https://api.github.com/users/dchest/orgs","repos_url":"https://api.github.com/users/dchest/repos","events_url":"https://api.github.com/users/dchest/events{/privacy}","received_events_url":"https://api.github.com/users/dchest/received_events","type":"User","site_admin":false},"comments":0,"created_at":"2014-04-09T10:10:49Z","number":1673,"state":"open","labels_url":"https://api.github.com/repos/antirez/redis/issues/1673/labels{/name}","labels":[],"url":"https://api.github.com/repos/antirez/redis/issues/1673","id":31144714,"assignee":null,"events_url":"https://api.github.com/repos/antirez/redis/issues/1673/events","pull_request":{"url":"https://api.github.com/repos/antirez/redis/pulls/1673","html_url":"https://github.com/antirez/redis/pull/1673","diff_url":"https://github.com/antirez/redis/pull/1673.diff","patch_url":"https://github.com/antirez/redis/pull/1673.patch"},"body":"","title":"Fix typo in 00-RELEASENOTES","comments_url":"https://api.github.com/repos/antirez/redis/issues/1673/comments","updated_at":"2014-04-09T10:10:49Z","closed_at":null}
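
The splitting step behind jsondocsetbyjson can be pictured like this. It is a Python sketch under assumptions: `split_documents` is an invented name, the payload is a JSON array of maps, and the value of the chosen field is turned into a string to serve as the document key (which matches the string ids in the reply above).

```python
import json


def split_documents(field, payload):
    """Map each document's value at `field` (as a string) to the document."""
    documents = json.loads(payload)
    if not isinstance(documents, list):
        raise ValueError("expected a JSON array of maps")
    keyed = {}
    for doc in documents:
        if not isinstance(doc, dict) or field not in doc:
            raise ValueError(f"every element needs a {field!r} field")
        keyed[str(doc[field])] = doc
    return keyed


payload = '[{"id": 31144714, "state": "open"}, {"id": 31043782, "state": "open"}]'
print(list(split_documents("id", payload)))
```

Each resulting entry would then be stored as its own document under that key.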

What about the inner life of JSON documents?

jsonfieldget json-key sub-key1 sub-key2 … sub-keyN

Extract batter list…

127.0.0.1:6379> jsonfieldget donuts batters batter
[{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"},{"id":"1003","type":"Blueberry"},{"id":"1004","type":"Devil's Food"}]

Extract 4th batter in list (0-based indexing)…

127.0.0.1:6379> jsonfieldget donuts batters batter 3
{"id":"1004","type":"Devil's Food"}

Extract the type of the 4th batter…

127.0.0.1:6379> jsonfieldget donuts batters batter 3 type
"Devil's Food"

Extract eva year(s)…

127.0.0.1:6379> jsonfieldget eva year
"1995–1996"

jsondockeys json-key sub-key1 sub-key2 … sub-keyN

Investigate keys for eva…

127.0.0.1:6379> jsondockeys eva
 1) "urlIMDB"
 2) "urlPoster"
 3) "languages"
 4) "simplePlot"
 5) "plot"
 6) "idIMDB"
 7) "runtime"
 8) "releaseDate"
 9) "writers"
10) "rated"
11) "directors"
12) "rating"
13) "metascore"
14) "year"
15) "countries"
16) "genres"
17) "title"
18) "filmingLocations"

Investigate keys for 1st donut topping…

127.0.0.1:6379> jsondockeys donuts topping 0
1) "id"
2) "type"
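
The sub-key paths used by jsonfieldget and jsondockeys behave like a walk down the document where steps into a list are 0-based positions. Here is a small Python sketch of that walk over an in-memory object. It is only an illustration: the module walks Redis keys instead, and `descend` and `field_keys` are names I made up.

```python
def descend(document, *path):
    """Follow a path of sub-keys; steps into a list are 0-based positions."""
    current = document
    for step in path:
        if isinstance(current, list):
            current = current[int(step)]
        elif isinstance(current, dict):
            current = current[step]
        else:
            raise KeyError(f"cannot descend into a scalar at {step!r}")
    return current


def field_keys(document, *path):
    """List the field names of the map found at the end of the path."""
    target = descend(document, *path)
    if not isinstance(target, dict):
        raise TypeError("only maps have keys")
    return list(target)


donuts = {
    "batters": {"batter": [
        {"id": "1001", "type": "Regular"},
        {"id": "1002", "type": "Chocolate"},
        {"id": "1003", "type": "Blueberry"},
        {"id": "1004", "type": "Devil's Food"},
    ]},
    "topping": [{"id": "5001", "type": "None"}],
}

print(descend(donuts, "batters", "batter", "3", "type"))
print(field_keys(donuts, "topping", "0"))
```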

Pop a topping

Existing noms…

127.0.0.1:6379> jsonfieldget donuts topping
[{"id":"5001","type":"None"},{"id":"5002","type":"Glazed"},{"id":"5005","type":"Sugar"},{"id":"5007","type":"Powdered Sugar"},{"id":"5006","type":"Chocolate with Sprinkles"},{"id":"5003","type":"Chocolate"},{"id":"5004","type":"Maple"}]

jsonfieldrpop json-key sub-key1 sub-key2 … sub-keyN

Remove a bad nom…

127.0.0.1:6379> jsonfieldrpop donuts topping
{"id":"5004","type":"Maple"}

Get updated noms…

127.0.0.1:6379> jsonfieldget donuts topping
[{"id":"5001","type":"None"},{"id":"5002","type":"Glazed"},{"id":"5005","type":"Sugar"},{"id":"5007","type":"Powdered Sugar"},{"id":"5006","type":"Chocolate with Sprinkles"},{"id":"5003","type":"Chocolate"}]

Push better noms

jsonfieldrpushx json-key sub-key1 … sub-keyN new-json

Push much preferred topping (adding a string to array of maps)…

127.0.0.1:6379> jsonfieldrpushx donuts topping '"Sweet Potato and Marshmallows"'
(integer) 7

Observe all the noms…

127.0.0.1:6379> jsonfieldget donuts topping
[{"id":"5001","type":"None"},{"id":"5002","type":"Glazed"},{"id":"5005","type":"Sugar"},{"id":"5007","type":"Powdered Sugar"},{"id":"5006","type":"Chocolate with Sprinkles"},{"id":"5003","type":"Chocolate"},"Sweet Potato and Marshmallows"]

Stuff a ballot box

Chicken Dead JSON

{
   "country" : "USA",
   "imdburl" : "http://www.imdb.com/title/tt0462485/",
   "metrics": {
       "rating" : 6.3,
       "series" : 0,
       "stv" : 0,
       "usascreens" : 2,
       "runtime" : 103,
       "ukscreens" : 0,
       "year" : 2006,
       "votes" : 4539
   },
   "imdbid" : "tt0462485",
   "genres" : "Comedy,Horror,Musical,Romance",
   "languages" : "English",
   "title" : "Poultrygeist: Night of the Chicken Dead"
}

Set chicken…

127.0.0.1:6379> jsondocset chicken '{ "country" : "USA", "imdburl" : "http://www.imdb.com/title/tt0462485/", "metrics": { "rating" : 6.3, "series" : 0, "stv" : 0, "usascreens" : 2, "runtime" : 103, "ukscreens" : 0, "year" : 2006, "votes" : 4539 }, "imdbid" : "tt0462485", "genres" : "Comedy,Horror,Musical,Romance", "languages" : "English", "title" : "Poultrygeist: Night of the Chicken Dead" } '
(integer) 1

Get chicken…

127.0.0.1:6379> jsondocget chicken
{"country":"USA","imdburl":"http://www.imdb.com/title/tt0462485/","metrics":{"rating":6.3,"series":0,"stv":0,"usascreens":2,"runtime":103,"ukscreens":0,"year":2006,"votes":4539},"imdbid":"tt0462485","genres":"Comedy,Horror,Musical,Romance","languages":"English","title":"Poultrygeist: Night of the Chicken Dead"}

jsonfieldincrby json-key sub-key1 … sub-keyN integer

Vote for MOR CHICKENGEIST

127.0.0.1:6379> jsonfieldincrby chicken metrics votes 450
(integer) 4989

Check out your affront to democracy…

127.0.0.1:6379> jsonfieldget chicken metrics
{"rating":6.3,"series":0,"stv":0,"usascreens":2,"runtime":103,"ukscreens":0,"year":2006,"votes":4989}

jsonfieldincrbyfloat json-key sub-key1 … sub-keyN double

Boost some ratings (IEEE-754 style)…

127.0.0.1:6379> jsonfieldincrbyfloat chicken metrics rating 399.5
"405.79999999999999999"
127.0.0.1:6379> jsonfieldget chicken metrics
{"rating":405.79999999999999999,"series":0,"stv":0,"usascreens":2,"runtime":103,"ukscreens":0,"year":2006,"votes":4989}

Shut it all down

jsonfielddel json-key sub-key1 … sub-keyN

Demand plain donuts…

127.0.0.1:6379> jsonfielddel donuts topping
(integer) 1

Verify the blasphemy has been removed…

127.0.0.1:6379> jsondocget donuts
{"id":"0001","type":"donut","name":"Cake","ppu":0.55,"batters":{"batter":[{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"},{"id":"1003","type":"Blueberry"},{"id":"1004","type":"Devil's Food"}]}}

jsondocdel json-key

Actually, you’re going keto…

127.0.0.1:6379> jsondocdel donuts
(integer) 1

Verify you can’t sneak one at 3am…

127.0.0.1:6379> jsondocget donuts
(nil)

What’s all this now?

This is an experiment in using Redis to:

parse incoming JSON
store full JSON as Redis data types
- with unlimited nesting support
allow access and updating of nested fields
generate result JSON

Data Types

As we saw in the examples above, this approach supports round-tripping JSON data types with no loss of type information.

The seven JSON data types are: map, list, number, string, true, false, null. The Redis JSON parser makes sure to store a bare true differently than the string "true". The same goes for numbers — the number 42 is stored differently than the string "42".

JSON types contrast sharply with the default Redis storage model, where everything is stored as a string. If you tell Redis SET mykey 5 then GET mykey will return the string "5". It is fully up to the caller to interpret results appropriately.¹

Implementation Background

We use boxing to store full JSON types in Redis. “Boxing” is when you store information about a type with the type, so you can figure out the true intent when you read it back.

A good example of boxing is any non-JIT interpreter. Let’s use Python as an example. If you type 1 + 1 in Python, the interpreter doesn’t run one plus one because the interpreter doesn’t store integers directly. The interpreter stores [TYPE][THING]. In this case: [INTEGER]1 [OPERATOR]+ [INTEGER]1. To perform your operation, Python has to “unbox” the values to generate a usable expression. So, Python checks:²

What is the type of first operand?
Oh, it’s an integer. Let’s read a number.
What’s the type of operator?
Oh, it’s a valid operator. Let’s read it.
What’s the type of final operand?
Oh, it’s an integer too. Let’s read a number.
Apply the operator to the operands.
What is the type of result? It’s an integer.
Let’s store [INTEGER]2.

Python eventually obtained enough information to run your original request of 1 + 1, but in a very slow and indirect way. You can see why math in Python is typically 20x to 50x slower than using a module calling out to native code skipping the entire boxing->unboxing->boxing process (e.g. numpy/blas/cython).

Implementation

For JSON storage in Redis we box JSON types because we must round-trip native JSON types.

The Redis JSON type box is very tiny. It’s just one byte. One byte is eight bits. Since JSON has seven types, we can use one bit to represent each type and still have one bit free.

The Box

We’re going to use brackets to represent “bits in a box.”

Here’s an entire Redis JSON type box with nothing defined: [00000000] (eight bits with no positions set).

Types are defined by position in the box, so each bit position represents a specific type. Positions are 0-based and counted from the rightmost (least significant) bit. The position meanings are:

Bit 1: type is number
Bit 2: type is string
Bit 3: type is map
Bit 4: type is list
Bit 5: true
Bit 6: false
Bit 7: null

So, to represent the number 42, we store [00000010]42. To represent the string "bits in a box" we store [00000100]bits in a box.
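
To make the bit layout concrete, here is a Python sketch of boxing and unboxing scalars. The constant and function names are mine, not the module's, and the choice to store the number as its JSON text after the box byte is an assumption for illustration.

```python
import json

# One bit per JSON type, using the positions listed above.
BOX_NUMBER = 1 << 1
BOX_STRING = 1 << 2
BOX_MAP = 1 << 3
BOX_LIST = 1 << 4
BOX_TRUE = 1 << 5
BOX_FALSE = 1 << 6
BOX_NULL = 1 << 7


def box_scalar(value):
    """Prefix a number or string with its one-byte type box."""
    if isinstance(value, bool):
        raise TypeError("booleans are not covered by this sketch")
    if isinstance(value, (int, float)):
        return bytes([BOX_NUMBER]) + json.dumps(value).encode("ascii")
    if isinstance(value, str):
        return bytes([BOX_STRING]) + value.encode("utf-8")
    raise TypeError(f"not a boxable scalar: {value!r}")


def unbox_scalar(stored):
    """Read the box byte, then interpret the remaining bytes accordingly."""
    box, payload = stored[0], stored[1:]
    if box == BOX_NUMBER:
        return json.loads(payload.decode("ascii"))
    if box == BOX_STRING:
        return payload.decode("utf-8")
    raise ValueError(f"unexpected box [{box:08b}]")


def show_box(box):
    """Render a box byte in the bracket notation used in this post."""
    return f"[{box:08b}]"


print(show_box(box_scalar(42)[0]), unbox_scalar(box_scalar(42)))
print(show_box(box_scalar("42")[0]), repr(unbox_scalar(box_scalar("42"))))
```

The point of the exercise: the number 42 and the string "42" carry the same payload bytes but different boxes, so they come back as different types.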

Optimization: One More Type

Because Redis stores everything internally as strings, we can make one big optimization: If your container type (a map or list) has homogeneous contents, we can skip boxing each individual value.

We only defined seven bits above. We have one remaining bit free:

Bit 0: type is homogeneous container

Now, to store {"name":"rocko","struggle":"modern life"} we add the strings to a Redis hash directly instead of boxing each string like [00000100]rocko.

With the addition of our homogeneous container type, we can define our map as:

is map
is string
is homogeneous

Our box for the entire map becomes [00001101]. This means: if you store homogeneous types of numbers or strings in your map (or list), your only storage overhead is one byte to define the type of the map (or list). If you have a map with 1,000 values, your storage overhead is only one byte instead of 1,000 bytes if we boxed each string individually.

Notice we can do the same with numbers. {"age": 20, "height": 4000} can store numbers directly without individual boxing because we define the map here as:

is map
is number
is homogeneous

Giving us one box for the entire map of: [00001011].

The same goes for lists too. Storing a list of only numbers or only strings has an overhead of exactly one byte to define the type of the homogeneous storage container regardless of the size of the list.

Note: The homogeneous type of the container is based on values in the map. It’s okay if your map keys are strings while all your values are numbers. That still counts as a homogeneous number map. JSON defines keys only as strings anyway, so they can’t ever be non-homogeneous themselves.
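
Picking the box for a container can be sketched in a few lines of Python. The names are hypothetical, and treating an empty container as a plain (non-homogeneous) one is my assumption; the post does not say what the module does there.

```python
BOX_HOMOGENEOUS = 1 << 0
BOX_NUMBER = 1 << 1
BOX_STRING = 1 << 2
BOX_MAP = 1 << 3
BOX_LIST = 1 << 4


def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def container_box(container):
    """Return the one-byte box describing a map or list."""
    if isinstance(container, dict):
        box, values = BOX_MAP, list(container.values())
    elif isinstance(container, list):
        box, values = BOX_LIST, container
    else:
        raise TypeError("only maps and lists are containers")
    # Only the values decide homogeneity; map keys are always strings.
    if values and all(is_number(v) for v in values):
        box |= BOX_NUMBER | BOX_HOMOGENEOUS
    elif values and all(isinstance(v, str) for v in values):
        box |= BOX_STRING | BOX_HOMOGENEOUS
    return box


for sample in ({"name": "rocko", "struggle": "modern life"},
               {"age": 20, "height": 4000},
               {"age": 20, "name": "rocko"}):
    print(f"[{container_box(sample):08b}]", sample)
```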

Those Other Types

What about our friends true, false, and null? Those types have zero variability and we can use their boxes as an immediate value.

For example, a JSON true value results in storage of only [00100000] with none of the actual letters of true being stored anywhere. true only exists when we reconstruct JSON for output.

Nesting Syndrome

How do we store nested maps and lists? After all, we can store {"a": {"b": {"c": [null, {"d":false}]}}} and it gets reconstructed just fine.

Rules for nested storage:

Every top-level JSON document is a JSON map
Every top-level JSON document is named by its storage key
Every JSON map is a Redis hash
Every JSON array is a Redis list
Keys of sub-types are stored as depth-appended keys
- Sub-containers of arrays use their list position for sub-keys

A depth-appended key is just the name of the current field appended to its parent’s name with :.

Storing the document above with name zombo will create these keys: zombo for the document itself, containing only the field a; zombo:a for the map under a; zombo:a:b for the map under b; zombo:a:b:c for the list under c; and zombo:a:b:c:1 for the map holding d, which sits at list position 1.

zombo, zombo:a, zombo:a:b, and zombo:a:b:c:1 are Redis hashes. zombo:a:b:c is a Redis list.

127.0.0.1:6379> jsondocset zombo '{"a": {"b": {"c": [null, {"d":false}]}}}'
(integer) 1

Now we have these keys in redis:

127.0.0.1:6379> keys *zombo*
1) "zombo:a:b:c:1"
2) "zombo"
3) "zombo:a:b"
4) "zombo:a"
5) "zombo:a:b:c"
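
The naming rules are easy to reproduce. This Python sketch walks a parsed document and lists the key each container would get; `container_keys` is an invented name, and the sketch ignores empty containers (Redis does not keep empty hashes or lists around, and I don't cover how the module deals with that).

```python
def container_keys(name, value):
    """Yield (redis_key, kind) for every map or list under `value`."""
    if isinstance(value, dict):
        yield name, "hash"
        children = value.items()
    elif isinstance(value, list):
        yield name, "list"
        # List elements have no names, so their position is the sub-key.
        children = enumerate(value)
    else:
        return  # scalars live inside their parent container
    for sub_key, child in children:
        yield from container_keys(f"{name}:{sub_key}", child)


doc = {"a": {"b": {"c": [None, {"d": False}]}}}
for key, kind in container_keys("zombo", doc):
    print(key, kind)
```

The null at list position 0 produces no key of its own, which is why there is a zombo:a:b:c:1 but no zombo:a:b:c:0.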

Storing highly nested JSON with only one value per map is a pathological case, used here just for testing. You wouldn’t want to store many nearly empty nested maps like that, because every sub-container pays the overhead of a new top-level key without holding much useful information.

Reconstruction

How are pointers to sub-containers stored? We rely on a combination of boxes and depth keys.

In the zombo example, the top-level document stored in the zombo Redis hash is: a => [BOX] where [BOX] is the type of the sub-container.

For this specific case, a => [00001000] meaning “When you try to read a, look for a top-level hash key with name zombo:a.”

Then, when we fetch zombo:a, we get b => [00001000] meaning “Go read a top-level hash key with name zombo:a:b.”

And thus it goes all the way down the container tree. The only trick here is: sub-containers of lists are referenced by position in the list instead of by their name, since list elements have no names³.
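
Here is the pointer-chasing as a Python sketch. A plain dict stands in for the Redis keyspace, the stored values are my guess at what the boxes would look like for the zombo document, and homogeneous containers are left out to keep it short. None of the names come from the module.

```python
import json

BOX_NUMBER, BOX_STRING = 1 << 1, 1 << 2
BOX_MAP, BOX_LIST = 1 << 3, 1 << 4
BOX_TRUE, BOX_FALSE, BOX_NULL = 1 << 5, 1 << 6, 1 << 7
IMMEDIATES = {BOX_TRUE: True, BOX_FALSE: False, BOX_NULL: None}

# Stand-in keyspace: dicts play Redis hashes, lists play Redis lists.
store = {
    "zombo": {"a": bytes([BOX_MAP])},
    "zombo:a": {"b": bytes([BOX_MAP])},
    "zombo:a:b": {"c": bytes([BOX_LIST])},
    "zombo:a:b:c": [bytes([BOX_NULL]), bytes([BOX_MAP])],
    "zombo:a:b:c:1": {"d": bytes([BOX_FALSE])},
}


def rebuild(key):
    """Rebuild the container stored at `key`, following boxes downward."""
    container = store[key]
    if isinstance(container, dict):
        return {field: decode(value, f"{key}:{field}")
                for field, value in container.items()}
    return [decode(value, f"{key}:{pos}") for pos, value in enumerate(container)]


def decode(stored, child_key):
    """Turn one boxed value into a Python value."""
    box, payload = stored[0], stored[1:]
    if box in (BOX_MAP, BOX_LIST):
        return rebuild(child_key)  # the box is a pointer to a depth key
    if box in IMMEDIATES:
        return IMMEDIATES[box]
    if box == BOX_STRING:
        return payload.decode("utf-8")
    if box == BOX_NUMBER:
        return json.loads(payload.decode("ascii"))
    raise ValueError(f"box [{box:08b}] is not handled in this sketch")


print(json.dumps(rebuild("zombo"), separators=(",", ":")))
```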

Processing All The Things

Given all the box types and nested naming rules above, we have nine (9) total ways to retrieve nested data.

Four cases are speed optimizations because we detect homogeneous container types, so we can basically do a big LRANGE [key] 0 -1 or a big HGETALL [key] instead of inspecting each individual value for unboxing.

The remaining five cases are for reconstructing JSON values based on individual box types.

Le Traversal Cases

This section is a more wordy version of the jsonObjFromBox() function.

1 – BOX_FETCH_MAP_NUMBER – `[00001011]`

Field points to homogeneous map of only numbers, so we can do HGETALL and return only JSON numbers.

2 – BOX_FETCH_MAP_STRING – `[00001101]`

Field points to homogeneous map of only strings, so we can do HGETALL and return only JSON strings.

3 – BOX_FETCH_LIST_NUMBER – `[00010011]`

Field points to homogeneous list of only numbers, so we can do LRANGE 0 -1 and return only JSON numbers.

4 – BOX_FETCH_LIST_STRING – `[00010101]`

Field points to homogeneous list of only strings, so we can do LRANGE 0 -1 and return only JSON strings.

5 – BOX_FETCH_MAP_DECODE – `[00001000]`

Field points to map with mixed types, so we must unbox each individual value (potentially pointing to more nested maps or lists) and reconstruct the appropriate JSON type for each individual value.

6 – BOX_FETCH_LIST_DECODE – `[00010000]`

Field points to list with mixed types, so we must unbox each individual value (potentially pointing to more nested maps or lists) and reconstruct the appropriate JSON type for each individual value.

7 – BOX_PARSE_NUMBER_AFTER_BOX – `[00000010]`

Field is a number after the box. Chop the box off and return the remaining bytes as a JSON number. Only used when storing numbers in mixed-type containers.

8 – BOX_PARSE_STRING_AFTER_BOX – `[00000100]`

Field is a string after the box. Chop the box off and return the remaining bytes as a JSON string. Only used when storing strings in mixed-type containers.

9 – BOX_DECODE_IMMEDIATE – `[00100000]` or `[01000000]` or `[10000000]`

Field is represented by the box itself. No more data needed. Read the box and generate an appropriate JSON true, false, or null based on which bit is set in the box.
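
Since every case above corresponds to exact box values, the dispatch can be pictured as a table lookup. This Python sketch reuses the case names from the list; the table, `classify`, and `IMMEDIATE_VALUES` are illustrative only and say nothing about how jsonObjFromBox() is actually written.

```python
# Box byte -> traversal case, using the bit patterns listed above.
CASES = {
    0b00001011: "BOX_FETCH_MAP_NUMBER",
    0b00001101: "BOX_FETCH_MAP_STRING",
    0b00010011: "BOX_FETCH_LIST_NUMBER",
    0b00010101: "BOX_FETCH_LIST_STRING",
    0b00001000: "BOX_FETCH_MAP_DECODE",
    0b00010000: "BOX_FETCH_LIST_DECODE",
    0b00000010: "BOX_PARSE_NUMBER_AFTER_BOX",
    0b00000100: "BOX_PARSE_STRING_AFTER_BOX",
    0b00100000: "BOX_DECODE_IMMEDIATE",
    0b01000000: "BOX_DECODE_IMMEDIATE",
    0b10000000: "BOX_DECODE_IMMEDIATE",
}

# Immediate boxes carry their whole value in the box byte.
IMMEDIATE_VALUES = {0b00100000: True, 0b01000000: False, 0b10000000: None}


def classify(box):
    """Name the traversal case for a box byte, or fail on unknown patterns."""
    try:
        return CASES[box]
    except KeyError:
        raise ValueError(f"unknown box [{box:08b}]") from None


for box in (0b00001101, 0b00010000, 0b01000000):
    print(f"[{box:08b}]", classify(box))
```

The first four entries are the fast paths: one bulk read, no per-value unboxing.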

Codex

This experiment started, grew, and quickly became un-brain-keepable. After the first file reached 1,200 lines, I split it into seven total files for easier management and more clearly defined interfaces.

After the entire codebase reached 3,000 lines, I stopped adding features and wrote this. There are many unimplemented features and abilities, but there’s no need to write production capabilities into a one-off experiment.

The code lives in the Kit of Redis Module Tools (krmt).

If you want to play around with JSON commands, you can build a json.so module and load it into a Dynamic Redis. Dynamic Redis is my set of patches on top of Redis allowing runtime loading of modular commands.

If you run JSON commands, run them with the expectation of having your keyspace cluttered with one top-level key for every sub-container.

If you run JSON commands, only use them for keeping transient data or for playing around with JSON API dumps like in the issues example. Further work on the JSON commands could introduce different data formats making any long-term storage unreadable.

It’s a toy, not a product.

If you want to add missing functionality, optimize existing functionality for speed and/or space, fix bugs, add tests, or extend features, feel free to open issues or pull requests on krmt.

Benchmarks

I’m not including actual benchmarks here since performance varies wildly based on the shape and size of the JSON you are importing.

But, as a baseline, I can JSONDOCSET the eva JSON 16,000 to 17,000 times per second depending on the compiler and optimization settings used.

That’s obviously not amazing Redis 100,000+ operations-per-second speed, but it’s still usable (and you may not be write-heavy with importing JSON anyway).

Scrappy Little Tests

Estimated test coverage: 30% to 65%.

% ./runtest --single unit/type/jsonobj
Cleanup: may take some time... OK
Starting test server at port 11111
[ready]: 73676
Testing unit/type/jsonobj
[ready]: 73675
[ready]: 73677
[ready]: 73678
[ready]: 73680
[ready]: 73679
[ready]: 73681
[ready]: 73682
[ready]: 73683
[ready]: 73684
[ready]: 73685
[ready]: 73686
[ready]: 73687
[ready]: 73688
[ready]: 73689
[ready]: 73690
[ok]: JSONDOC - set simple
[ok]: JSONDOC - remove simple
[ok]: JSONDOC - verify remove cleaned all keys
[ok]: JSONDOC - set complex
[ok]: JSONDOC - remove complex
[ok]: JSONDOC - verify remove cleaned all keys
[ok]: JSONDOC - add sample types
[ok]: JSONDOC - remove all basic types
[ok]: JSONDOC - verify remove cleaned all keys
[ok]: JSONFIELD - basic get string
[ok]: JSONFIELD - basic get number
[ok]: JSONFIELD - basic get list (numbers)
[ok]: JSONFIELD - basic get list (strings)
[ok]: JSONFIELD - basic get list (with map)
[ok]: JSONFIELD - basic get map
[ok]: JSONFIELD - basic get true
[ok]: JSONFIELD - basic get false
[ok]: JSONFIELD - basic get null
[ok]: JSONDOC - remove all test documents
[ok]: JSONDOC - verify remove cleaned all keys
[ok]: JSONFIELD - multi-layered get list
[ok]: JSONFIELD - multi-layered get first
[ok]: JSONFIELD - multi-layered get second
[ok]: JSONFIELD - multi-layered get second key
[ok]: JSONFIELD - multi-layered get third
[ok]: JSONFIELD - multi-layered get fourth
[ok]: JSONDOC - remove all test documents
[ok]: JSONDOC - verify remove cleaned all keys
[ok]: JSONFIELD - deep nested get - level 1
[ok]: JSONFIELD - deep nested get - level 2
[ok]: JSONFIELD - deep nested get - level 3
[ok]: JSONFIELD - deep nested get - level 4
[ok]: JSONFIELD - deep nested get - level 5
[ok]: JSONFIELD - deep nested get - level 6
[ok]: JSONFIELD - deep nested get - level [too low]
[ok]: JSONFIELD - deep nested get - level [too high]
[ok]: JSONDOC - remove all test documents
[ok]: JSONDOC - verify remove cleaned all keys
[ok]: JSONFIELDDEL - basic delete
[ok]: JSONFIELDDEL - verify document after delete
[ok]: JSONFIELDDEL - basic map delete
[ok]: JSONFIELDDEL - verify document after delete
[ok]: JSONFIELDDEL - basic list delete
[ok]: JSONFIELDDEL - verify document after delete
[ok]: JSONFIELDDEL - basic list delete II
[ok]: JSONFIELDDEL - verify document after delete
[ok]: JSONFIELDDEL - nested list delete
[ok]: JSONFIELDDEL - verify document after delete
[ok]: JSONFIELDDEL - nested list delete direct positional
[ok]: JSONFIELDDEL - verify document after delete
[ok]: JSONFIELDDEL - nested map to empty map
[ok]: JSONFIELDDEL - verify document after nested map delete
[ok]: JSONFIELDDEL - nested list to empty list
[ok]: JSONFIELDDEL - verify document after nested list delete
[ok]: JSONFIELDDEL - nested map later field delete
[ok]: JSONFIELDDEL - verify document after delete
[ok]: JSONDOC - remove remaining test documents
[ok]: JSONDOC - verify remove cleaned all keys
[ok]: JSONDOCSETBYJSON - set map
[ok]: JSONDOCSETBYJSON - set map inside list
[ok]: JSONDOCSETBYJSON - set multi-map inside list
[ok]: JSONDOC - remove remaining test documents
[ok]: JSONDOC - verify remove cleaned all keys
[ok]: JSONDOCKEYS - get keys of a map
[ok]: JSONFIELDINCRBY - increment by integer
[ok]: JSONFIELDINCRBYFLOAT - increment by float
[ok]: JSONFIELDRPUSHX - rpush string
[ok]: JSONFIELDRPUSHX - rpush string verify
[ok]: JSONFIELDRPUSHX - rpush number
[ok]: JSONFIELDRPUSHX - rpush string verify
[ok]: JSONFIELDRPOP - rpop number
[ok]: JSONFIELDRPUSHX - rpop string verify
[ok]: JSONFIELDRPOP - rpop number
[ok]: JSONFIELDRPUSHX - rpop string verify
[ok]: JSONFIELDRPOP - rpop mixed contents
[ok]: JSONFIELDRPOP - rpop mixed contents verify
[ok]: JSONFIELDRPOP - rpop mixed contents
[ok]: JSONFIELDRPOP - rpop mixed contents verify
[ok]: JSONFIELDRPOP - rpop mixed contents
[ok]: JSONFIELDRPOP - rpop mixed contents verify
[ok]: JSONFIELDRPOP - rpop mixed contents last element
[ok]: JSONFIELDRPOP - rpop mixed contents verify empty
[ok]: JSONDOC - remove remaining test documents
[ok]: JSONDOC - verify remove cleaned all keys
[ok]: Check for memory leaks (pid 73692)
[1/1 done]: unit/type/jsonobj (0 seconds)

                   The End

Execution time of different units:
  0 seconds - unit/type/jsonobj

\o/ All tests passed without errors!

Cleanup: may take some time... OK

fin.

Summary:

3,171+ lines of C implementing 15 JSON commands.
4,500+ words in this post (including copy/pasted commands, JSON, and tests).
Two lost weekends.

One exception to the “everything is a string” rule is the result of INCRBY commands. Because Redis knows you are incrementing a number, it returns the result as an integer. Redis doesn’t have a floating point type, though (it would complicate the protocol a lot), so when you run an INCRBYFLOAT, your result is returned as a string.
This description is potentially factually wrong, but logically correct. I didn’t go back to look at any Python internals, but it’s just how ’terperters and language VMs without JIT or compile-time definitions have to do math.
Yes, this means adding or removing elements inside of lists is tricky. In fact, right now we only support appending to lists and popping elements off the end of lists, so we don’t need to worry about renaming any sub-containers because their position in the list changed.
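
To make that last note concrete, here is a minimal sketch in Python (not the C from this experiment). It assumes a made-up naming scheme where each sub-container lives under its parent key plus its list position; `subkey`, `names`, and `renames_needed` are hypothetical helpers for illustration, and the real key layout may differ. It only shows why tail-only pushes and pops never force a rename, while deleting from the middle would.

```python
def subkey(parent, index):
    # Hypothetical naming scheme: parent key plus the element's list position.
    return f"{parent}:{index}"


def names(parent, items):
    """Map each list element to the key its sub-container would live under."""
    return {item: subkey(parent, i) for i, item in enumerate(items)}


def renames_needed(parent, before, after):
    """Return elements that survive the change but end up under a different key."""
    old, new = names(parent, before), names(parent, after)
    return {
        item: (old[item], new[item])
        for item in old
        if item in new and old[item] != new[item]
    }


# Strings stand in for nested maps/lists; they only need to be unique here.
toppings = ["glazed", "sugar", "maple"]

tail_pop = toppings[:-1]                    # like popping off the end
tail_push = toppings + ["sprinkles"]        # like appending to the end
middle_delete = [toppings[0], toppings[2]]  # not supported: removes "sugar"

print(renames_needed("donut:topping", toppings, tail_pop))
print(renames_needed("donut:topping", toppings, tail_push))
print(renames_needed("donut:topping", toppings, middle_delete))
```

The first two calls print an empty dict, because every surviving element keeps its position. The third reports that "maple" would have to move from position 2 to position 1, which is exactly the bookkeeping the current commands avoid.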
