Redis Experiment #3: JSON Storage
This is my last experiment for a while. Enjoy.
- It all started out simple enough.
- How about wrapping arbitrary commands?
- How about hash from JSON?
- How about storing nested JSON?
- Setting multiple JSON documents?
- What about the inner life of JSON documents?
- Pop a topping
- Push better noms
- Stuff a ballot box
- Shut it all down
- What’s all this now?
- The Box
- Nesting Syndrome
- Processing All The Things
- Codex
- fin.
It started out simple enough.
hmgetalljson key
Have a hash…
127.0.0.1:6379> hgetall "comic:bunny:1271"
1) "img_src"
2) "strips/191108.jpg"
3) "title"
4) "insert no metal objects"
5) "alt_text"
6) "the buzzing in your ears is normal. for a given value of normal."
Get some JSON…
127.0.0.1:6379> hgetalljson "comic:bunny:1271"
1) {"img_src":"strips/191108.jpg","title":"insert no metal objects","alt_text":"the buzzing in your ears is normal. for a given value of normal."}
How about wrapping arbitrary commands?
jsonwrap command args…
Have a multi-result…
127.0.0.1:6379> georadiusbymember nyc "jfk" 20 mi withdistance asc
1) 1) "jfk"
2) "0.00"
2) 1) "q4"
2) "10.86"
3) 1) "4545"
2) "11.52"
4) 1) "union square"
2) "12.51"
5) 1) "wtc one"
2) "12.85"
6) 1) "central park n/q/r"
2) "12.94"
Get some JSON…
127.0.0.1:6379> jsonwrap georadiusbymember nyc "jfk" 20 mi withdistance asc
{"jfk":"0.00","q4":"10.86","4545":"11.52","union square":"12.51","wtc one":"12.85","central park n/q/r":"12.94"}
Have multiple score-value pairs…
Get some JSON…
127.0.0.1:6379> jsonwrap zrange abc 0 -1 withscores
{"one":"1","two":"2","three":"3","four":"4","five":"5","six":"6"}
How about hash from JSON?
hmsetbyjson key json
Have some JSON…
127.0.0.1:6379> hmsetbyjson newcomic '{"id":"3337", "comic":"smbc", "date":"", "img_src":"http://www.smbc-comics.com/comics/20140424.png", "alt_comic":"http://www.smbc-comics.com/comics/20140424after.gif"}'
(integer) 1
Get a hash…
127.0.0.1:6379> hgetall "newcomic"
1) "id"
2) "3337"
3) "comic"
4) "smbc"
5) "date"
6) ""
7) "img_src"
8) "http://www.smbc-comics.com/comics/20140424.png"
9) "alt_comic"
10) "http://www.smbc-comics.com/comics/20140424after.gif"
Nice, but not JSON-y enough.
How about storing nested JSON?
Donut document:
{
"id": "0001",
"type": "donut",
"name": "Cake",
"ppu": 0.55,
"batters":
{
"batter":
[
{ "id": "1001", "type": "Regular" },
{ "id": "1002", "type": "Chocolate" },
{ "id": "1003", "type": "Blueberry" },
{ "id": "1004", "type": "Devil's Food" }
]
},
"topping":
[
{ "id": "5001", "type": "None" },
{ "id": "5002", "type": "Glazed" },
{ "id": "5005", "type": "Sugar" },
{ "id": "5007", "type": "Powdered Sugar" },
{ "id": "5006", "type": "Chocolate with Sprinkles" },
{ "id": "5003", "type": "Chocolate" },
{ "id": "5004", "type": "Maple" }
]
}
One line for easier inline command adding:
jsondocset key json
Set donuts document inside Redis…
127.0.0.1:6379> jsondocset donuts '{ "id": "0001", "type": "donut", "name": "Cake", "ppu": 0.55, "batters": { "batter": [ { "id": "1001", "type": "Regular" }, { "id": "1002", "type": "Chocolate" }, { "id": "1003", "type": "Blueberry" }, { "id": "1004", "type": "Devil\'s Food" } ] }, "topping": [ { "id": "5001", "type": "None" }, { "id": "5002", "type": "Glazed" }, { "id": "5005", "type": "Sugar" }, { "id": "5007", "type": "Powdered Sugar" }, { "id": "5006", "type": "Chocolate with Sprinkles" }, { "id": "5003", "type": "Chocolate" }, { "id": "5004", "type": "Maple" } ] }'
(integer) 1
jsondocget json-key
Get donuts document back…
127.0.0.1:6379> jsondocget donuts
{"id":"0001","type":"donut","name":"Cake","ppu":0.55,"batters":{"batter":[{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"},{"id":"1003","type":"Blueberry"},{"id":"1004","type":"Devil's Food"}]},"topping":[{"id":"5001","type":"None"},{"id":"5002","type":"Glazed"},{"id":"5005","type":"Sugar"},{"id":"5007","type":"Powdered Sugar"},{"id":"5006","type":"Chocolate with Sprinkles"},{"id":"5003","type":"Chocolate"},{"id":"5004","type":"Maple"}]}
Eva JSON:
{
"writers" : [
"Hideaki Anno"
],
"rating" : "8.6",
"urlIMDB" : "http://www.imdb.com/title/tt0112159",
"genres" : [
"Animation",
"Action",
"Drama",
"Sci-Fi",
"Thriller"
],
"languages" : [
"Japanese",
"English"
],
"countries" : [
"Japan"
],
"idIMDB" : "tt0112159",
"rated" : "TV-14",
"releaseDate" : "20040703",
"directors" : [],
"urlPoster" : "http://ia.media-imdb.com/images/M/MV5BMjA1Mzc4NzMwOF5BMl5BanBnXkFtZTcwNjc1MzYyMQ@@._V1_SY317_CR6,0,214,317_AL_.jpg",
"simplePlot" : "A teenage boy finds himself recruited as a member of an elite team of pilots by his father.",
"runtime" : [
"30 min"
],
"filmingLocations" : [],
"plot" : "When the Angels start attacking the planet Earth in the year 2015, only a handful of 14-year-old EVA pilots are able to stop them. Young Shinji Ikari suddenly finds himself forced to pilot EVA-01, a giant organic mecha, designed and constructed by NERV, that is the only thing that can stop the Angels. Written by Chris Cleveland",
"title" : "Neon Genesis Evangelion",
"year" : "1995–1996",
"metascore" : ""
}
Set eva document inside Redis…
127.0.0.1:6379> jsondocset eva ' { "writers" : [ "Hideaki Anno" ], "rating" : "8.6", "urlIMDB" : "http://www.imdb.com/title/tt0112159", "genres" : [ "Animation", "Action", "Drama", "Sci-Fi", "Thriller" ], "languages" : [ "Japanese", "English" ], "countries" : [ "Japan" ], "idIMDB" : "tt0112159", "rated" : "TV-14", "releaseDate" : "20040703", "directors" : [], "urlPoster" : "http://ia.media-imdb.com/images/M/MV5BMjA1Mzc4NzMwOF5BMl5BanBnXkFtZTcwNjc1MzYyMQ@@._V1_SY317_CR6,0,214,317_AL_.jpg", "simplePlot" : "A teenage boy finds himself recruited as a member of an elite team of pilots by his father.", "runtime" : [ "30 min" ], "filmingLocations" : [], "plot" : "When the Angels start attacking the planet Earth in the year 2015, only a handful of 14-year-old EVA pilots are able to stop them. Young Shinji Ikari suddenly finds himself forced to pilot EVA-01, a giant organic mecha, designed and constructed by NERV, that is the only thing that can stop the Angels. Written by Chris Cleveland", "title" : "Neon Genesis Evangelion", "year" : "1995–1996", "metascore" : "" }'
(integer) 1
Get eva document back…
127.0.0.1:6379> jsondocget eva
{"filmingLocations":[],"rating":"8.6","urlIMDB":"http://www.imdb.com/title/tt0112159","urlPoster":"http://ia.media-imdb.com/images/M/MV5BMjA1Mzc4NzMwOF5BMl5BanBnXkFtZTcwNjc1MzYyMQ@@._V1_SY317_CR6,0,214,317_AL_.jpg","languages":["Japanese","English"],"simplePlot":"A teenage boy finds himself recruited as a member of an elite team of pilots by his father.","plot":"When the Angels start attacking the planet Earth in the year 2015, only a handful of 14-year-old EVA pilots are able to stop them. Young Shinji Ikari suddenly finds himself forced to pilot EVA-01, a giant organic mecha, designed and constructed by NERV, that is the only thing that can stop the Angels. Written by Chris Cleveland","idIMDB":"tt0112159","runtime":["30 min"],"releaseDate":"20040703","writers":["Hideaki Anno"],"rated":"TV-14","directors":[],"metascore":"","year":"1995–1996","countries":["Japan"],"genres":["Animation","Action","Drama","Sci-Fi","Thriller"],"title":"Neon Genesis Evangelion"}
jsondocmget json-key1 json-key2 … json-keyN
Get eva and donuts with one command…
127.0.0.1:6379> jsondocmget eva donuts
1) {"filmingLocations":[],"rating":"8.6","urlIMDB":"http://www.imdb.com/title/tt0112159","urlPoster":"http://ia.media-imdb.com/images/M/MV5BMjA1Mzc4NzMwOF5BMl5BanBnXkFtZTcwNjc1MzYyMQ@@._V1_SY317_CR6,0,214,317_AL_.jpg","languages":["Japanese","English"],"simplePlot":"A teenage boy finds himself recruited as a member of an elite team of pilots by his father.","plot":"When the Angels start attacking the planet Earth in the year 2015, only a handful of 14-year-old EVA pilots are able to stop them. Young Shinji Ikari suddenly finds himself forced to pilot EVA-01, a giant organic mecha, designed and constructed by NERV, that is the only thing that can stop the Angels. Written by Chris Cleveland","idIMDB":"tt0112159","runtime":["30 min"],"releaseDate":"20040703","writers":["Hideaki Anno"],"rated":"TV-14","directors":[],"metascore":"","year":"1995–1996","countries":["Japan"],"genres":["Animation","Action","Drama","Sci-Fi","Thriller"],"title":"Neon Genesis Evangelion"}
2) {"id":"0001","type":"donut","name":"Cake","ppu":0.55,"batters":{"batter":[{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"},{"id":"1003","type":"Blueberry"},{"id":"1004","type":"Devil's Food"}]},"topping":[{"id":"5001","type":"None"},{"id":"5002","type":"Glazed"},{"id":"5005","type":"Sugar"},{"id":"5007","type":"Powdered Sugar"},{"id":"5006","type":"Chocolate with Sprinkles"},{"id":"5003","type":"Chocolate"},{"id":"5004","type":"Maple"}]}
Setting multiple JSON documents?
redis-cli
supports option -x
meaning “read the last argument from stdin
.”
jsondocsetbyjson field-for-keys json-array-of-maps
% curl https://api.github.com/repos/antirez/redis/issues | \
redis-cli -x jsondocsetbyjson id
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 88896 100 88896 0 0 32657 0 0:00:02 0:00:02 --:--:-- 32670
1) "32322401"
2) "32178366"
3) "32158343"
4) "32120786"
5) "32079847"
6) "32069502"
7) "32000849"
8) "31978880"
9) "31946624"
10) "31916021"
11) "31654833"
12) "31580242"
13) "31572881"
14) "31520958"
15) "31424829"
16) "31388377"
17) "31322069"
18) "31305226"
19) "31273036"
20) "31161693"
21) "31144714"
22) "31043782"
23) "31033181"
24) "31021768"
25) "30988519"
26) "30938375"
27) "30928907"
28) "30897010"
29) "30872687"
30) "30844280"
Investigate a newly added document…
127.0.0.1:6379> jsondocget 31144714
{"milestone":null,"html_url":"https://github.com/antirez/redis/pull/1673","user":{"login":"dchest","id":52677,"avatar_url":"https://avatars.githubusercontent.com/u/52677?","gravatar_id":"641aceb7e3d2eebea49f397c38048d0b","url":"https://api.github.com/users/dchest","html_url":"https://github.com/dchest","followers_url":"https://api.github.com/users/dchest/followers","following_url":"https://api.github.com/users/dchest/following{/other_user}","gists_url":"https://api.github.com/users/dchest/gists{/gist_id}","starred_url":"https://api.github.com/users/dchest/starred{/owner}{/repo}","subscriptions_url":"https://api.github.com/users/dchest/subscriptions","organizations_url":"https://api.github.com/users/dchest/orgs","repos_url":"https://api.github.com/users/dchest/repos","events_url":"https://api.github.com/users/dchest/events{/privacy}","received_events_url":"https://api.github.com/users/dchest/received_events","type":"User","site_admin":false},"comments":0,"created_at":"2014-04-09T10:10:49Z","number":1673,"state":"open","labels_url":"https://api.github.com/repos/antirez/redis/issues/1673/labels{/name}","labels":[],"url":"https://api.github.com/repos/antirez/redis/issues/1673","id":31144714,"assignee":null,"events_url":"https://api.github.com/repos/antirez/redis/issues/1673/events","pull_request":{"url":"https://api.github.com/repos/antirez/redis/pulls/1673","html_url":"https://github.com/antirez/redis/pull/1673","diff_url":"https://github.com/antirez/redis/pull/1673.diff","patch_url":"https://github.com/antirez/redis/pull/1673.patch"},"body":"","title":"Fix typo in 00-RELEASENOTES","comments_url":"https://api.github.com/repos/antirez/redis/issues/1673/comments","updated_at":"2014-04-09T10:10:49Z","closed_at":null}
What about the inner life of JSON documents?
jsonfieldget json-key sub-key1 sub-key2 … sub-keyN
Extract batter list…
127.0.0.1:6379> jsonfieldget donuts batters batter
[{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"},{"id":"1003","type":"Blueberry"},{"id":"1004","type":"Devil's Food"}]
Extract 4th batter in list (0-based indexing)…
Extract the type of the 4th batter…
Extract eva year(s)…
jsondockeys json-key sub-key1 sub-key2 … sub-keyN
Investigate keys for eva…
127.0.0.1:6379> jsondockeys eva
1) "urlIMDB"
2) "urlPoster"
3) "languages"
4) "simplePlot"
5) "plot"
6) "idIMDB"
7) "runtime"
8) "releaseDate"
9) "writers"
10) "rated"
11) "directors"
12) "rating"
13) "metascore"
14) "year"
15) "countries"
16) "genres"
17) "title"
18) "filmingLocations"
Investigate keys for 1st donut topping…
Pop a topping
Existing noms…
127.0.0.1:6379> jsonfieldget donuts topping
[{"id":"5001","type":"None"},{"id":"5002","type":"Glazed"},{"id":"5005","type":"Sugar"},{"id":"5007","type":"Powdered Sugar"},{"id":"5006","type":"Chocolate with Sprinkles"},{"id":"5003","type":"Chocolate"},{"id":"5004","type":"Maple"}]
jsonfieldrpop json-key sub-key1 sub-key2 … sub-keyN
Remove a bad nom…
Get updated noms…
127.0.0.1:6379> jsonfieldget donuts topping
[{"id":"5001","type":"None"},{"id":"5002","type":"Glazed"},{"id":"5005","type":"Sugar"},{"id":"5007","type":"Powdered Sugar"},{"id":"5006","type":"Chocolate with Sprinkles"},{"id":"5003","type":"Chocolate"}]
Push better noms
jsonfieldrpushx json-key sub-key1 … sub-keyN new-json
Push much preferred topping (adding a string to array of maps)…
Observe all the noms…
127.0.0.1:6379> jsonfieldget donuts topping
[{"id":"5001","type":"None"},{"id":"5002","type":"Glazed"},{"id":"5005","type":"Sugar"},{"id":"5007","type":"Powdered Sugar"},{"id":"5006","type":"Chocolate with Sprinkles"},{"id":"5003","type":"Chocolate"},"Sweet Potato and Marshmallows"]
Stuff a ballot box
Chicken Dead JSON
{
"country" : "USA",
"imdburl" : "http://www.imdb.com/title/tt0462485/",
"metrics": {
"rating" : 6.3,
"series" : 0,
"stv" : 0,
"usascreens" : 2,
"runtime" : 103,
"ukscreens" : 0,
"year" : 2006,
"votes" : 4539
},
"imdbid" : "tt0462485",
"genres" : "Comedy,Horror,Musical,Romance",
"languages" : "English",
"title" : "Poultrygeist: Night of the Chicken Dead"
}
Set chicken…
127.0.0.1:6379> jsondocset chicken '{ "country" : "USA", "imdburl" : "http://www.imdb.com/title/tt0462485/", "metrics": { "rating" : 6.3, "series" : 0, "stv" : 0, "usascreens" : 2, "runtime" : 103, "ukscreens" : 0, "year" : 2006, "votes" : 4539 }, "imdbid" : "tt0462485", "genres" : "Comedy,Horror,Musical,Romance", "languages" : "English", "title" : "Poultrygeist: Night of the Chicken Dead" } '
(integer) 1
Get chicken…
127.0.0.1:6379> jsondocget chicken
{"country":"USA","imdburl":"http://www.imdb.com/title/tt0462485/","metrics":{"rating":6.3,"series":0,"stv":0,"usascreens":2,"runtime":103,"ukscreens":0,"year":2006,"votes":4539},"imdbid":"tt0462485","genres":"Comedy,Horror,Musical,Romance","languages":"English","title":"Poultrygeist: Night of the Chicken Dead"}
jsonfieldincrby json-key sub-key1 … sub-keyN integer
Vote for MOR CHICKENGEIST
Check out your affront to democracy…
127.0.0.1:6379> jsonfieldget chicken metrics
{"rating":6.3,"series":0,"stv":0,"usascreens":2,"runtime":103,"ukscreens":0,"year":2006,"votes":4989}
jsonfieldincrbyfloat json-key sub-key1 … sub-keyN double
Boost some ratings (IEEE-754 style)…
127.0.0.1:6379> jsonfieldincrbyfloat chicken metrics rating 399.5
"405.79999999999999999"
127.0.0.1:6379> jsonfieldget chicken metrics
{"rating":405.79999999999999999,"series":0,"stv":0,"usascreens":2,"runtime":103,"ukscreens":0,"year":2006,"votes":4989}
Shut it all down
jsonfielddel json-key sub-key1 … sub-keyN
Demand plain donuts…
Verify the blasphemy has been removed…
127.0.0.1:6379> jsondocget donuts
{"id":"0001","type":"donut","name":"Cake","ppu":0.55,"batters":{"batter":[{"id":"1001","type":"Regular"},{"id":"1002","type":"Chocolate"},{"id":"1003","type":"Blueberry"},{"id":"1004","type":"Devil's Food"}]}}
jsondocdel json-key
Actually, you’re going keto…
Verify you can’t sneak one at 3am…
What’s all this now?
This is an experiment in using Redis to:
- parse incoming JSON
- store full JSON as Redis data types
- with unlimited nesting support
- allow access and updating of nested fields
- generate result JSON
Data Types
As we saw in the examples above, this approach supports round-tripping JSON data types with no loss of type information.
The seven JSON data types are: map, list, number, string, true, false, null. The Redis JSON parser makes sure to store a bare true
differently than the string "true"
. The same goes for numbers — the number 42
is stored differently than the string "42"
.
JSON types are in heavy contrast to the default Redis storage model where, by default, everything is stored as a string. If you tell Redis SET mykey 5
then GET mykey
will return the string "5"
. It is fully up to the caller to interpret results appropriately.1
Implementation Background
We use boxing to store full JSON types in Redis. “Boxing” is when you store information about a type with the type, so you can figure out the true intent when you read it back.
A good example of boxing is any non-JIT interpreter. Let’s use Python as an example. If you type 1 + 1
in Python, the interpreter doesn’t run one plus one
because the interpreter doesn’t store integers directly. The interpreter stores [TYPE][THING]
. In this case: [INTEGER]1 [OPERATOR]+ [INTEGER]1
. Performing your operation, Python has to “unbox” the values to generate a usable expression. So, Python checks:2
- What is the type of first operand?
- Oh, it’s an integer. Let’s read a number.
- What’s the type of operator?
- Oh, it’s a valid operator. Let’s read it.
- What’s the type of final operand?
- Oh, it’s an integer too. Let’s read a number.
- Apply the operands to the operator.
- What is the type of result? It’s an integer.
- Let’s store
[INTEGER]2
.
Python eventually obtained enough information to run your original request of 1 + 1
, but in a very slow and indirect way. You can see why math in Python is typically 20x to 50x slower than using a module calling out to native code skipping the entire boxing->unboxing->boxing process (e.g. numpy/blas/cython).
Implementation
For JSON storage in Redis we box JSON types because we must round-trip native JSON types.
The Redis JSON type box is very tiny. It’s just one byte. One byte is eight bits. Since JSON has seven types, we can use one bit to represent each type and still have one bit free.
The Box
We’re going to use brackets to represent “bits in a box.”
Here’s an entire Redis JSON type box with nothing defined: [00000000]
(eight bits with no positions set).
Types are defined by position in the box, so each bit position represents a specific type. The (0-based) position meanings are:
- Bit 1: type is number
- Bit 2: type is string
- Bit 3: type is map
- Bit 4: type is list
- Bit 5:
true
- Bit 6:
false
- Bit 7:
null
So, to represent the number 42
, we store [00000010]42
. To represent the string "bits in a box"
we store [00000100]bits in a box
.
Optimization: One More Type
Because Redis stores everything internally as strings, we can make one big optimization: If your container type (a map or list) has homogeneous contents, we can skip boxing each individual value.
We only defined seven bits above. We have one remaining bit free:
- Bit 0: type is homogeneous container
Now, to store {"name":"rocko","struggle":"modern life"}
we add the strings to a Redis hash directly instead of boxing each string like [00000100]rocko
.
With the addition of our homogeneous container type, we can define our map as:
- is map
- is string
- is homogeneous
Our box for the entire map becomes [00001101]
. This means: if you store homogeneous types of numbers or strings in your map (or list), your only storage overhead is one byte to define the type of the map (or list). If you have a map with 1,000 values, your storage overhead is only one byte instead of 1,000 bytes if we boxed each string individually.
Notice we can do the same with numbers. {"age": 20, "height": 4000}
can store numbers directly without individual boxing because we define the map here as:
- is map
- is number
- is homogeneous
Giving us one box for the entire map of: [00001011]
.
The same goes for lists too. Storing a list of only numbers or only strings has an overhead of exactly one byte to define the type of the homogeneous storage container regardless of the size of the list.
Note: The homogeneous type of the container is based on values in the map. It’s okay if your map keys are strings while all your values are numbers. That still counts as a homogeneous number map. JSON defines keys only as strings anyway, so they can’t ever be non-homogeneous themselves.
Those Other Types
What about our friends true
, false
, and null
? Those types have zero variability and we can use their boxes as an immediate value.
For example, a JSON true
value results in storage of only [00100000]
with none of the actual letters of true
being stored anywhere. true
only exists when we reconstruct JSON for output.
Nesting Syndrome
How do we store nested maps and lists? After all, we can store {"a": {"b": {"c": [null, {"d":false}]}}}
and it gets reconstructed just fine.
Rules for nested storage:
- Every top-level JSON document is a JSON map
- Every top-level JSON document is named by its storage key
- Every JSON map is a Redis hash
- Every JSON array is a Redis list
- Keys of sub-types are stored as depth-appended keys
- Sub-containers of arrays use their list position for sub-keys
A depth-appended key is just the name of the current field appended to its parent’s name with :
.
Storing the document above with name zombo
will create keys: zombo
for the document itself only containing the a
map, zombo:a
to store the b
map, zombo:a:b
to store to store the c
map, zombo:a:b:c
to store the list, and zombo:a:b:c:1
to store the d
map.
zobmo
, zombo:a
, zombo:a:b
, and zombo:a:b:c:1
are Redis hashes. zombo:a:b:c
is a Redis list.
Now we have these keys in redis:
127.0.0.1:6379> keys *zombo*
1) "zombo:a:b:c:1"
2) "zombo"
3) "zombo:a:b"
4) "zombo:a"
5) "zombo:a:b:c"
Storing highly nested JSON with only one value per map is a pathological case just for testing. You wouldn’t want to store large amounts of zero-contents nested maps like that because it’s taking up overhead for creating new top-level keys, but each key doesn’t contain much useful information.
Reconstruction
How are pointers to sub-containers stored? We rely on a combination of boxes and depth keys.
In the zombo
example, the top-level document stored in the zombo
Redis hash is: a => [BOX]
where [BOX]
is the type of the sub-container.
For this specific case, a => [00001000]
meaning “When you try to read a
, look for a top-level hash key with name zombo:a
.”
Then, when we fetch zombo:a
, we get b => [00001000]
meaning “Go read a top-level hash key with name zombo:a:b
.”
And thus it goes all the way down the container tree. The only trick here is: sub-containers of lists are referenced by position in the list instead of by their name, since list elements have no names3.
Processing All The Things
Given all the box types and nested naming rules above, we have nine (9) total ways to retrieve nested data.
Four cases are speed optimizations because we detect homogeneous container types, so we can basically do a big LRANGE [key] 0 -1
or a big HGETALL [key]
instead of inspecting each individual value for unboxing.
The remaining five cases are for reconstructing JSON values based on individual box types.
Le Traversal Cases
This section is a more wordy version of the jsonObjFromBox() function.
1 – BOX_FETCH_MAP_NUMBER – [00001011]
Field points to homogeneous map of only numbers, so we can do HGETALL
and return only JSON numbers.
2 – BOX_FETCH_MAP_STRING – [00001101]
Field points to homogeneous map of only strings, so we can do HGETALL
and return only JSON strings.
3 – BOX_FETCH_LIST_NUMBER – [00010011]
Field points to homogeneous list of only numbers, so we can do LRANGE 0 -1
and return only JSON numbers.
4 – BOX_FETCH_LIST_STRING – [00010101]
Field points to homogeneous list of only strings, so we can do LRANGE 0 -1
and return only JSON strings.
5 – BOX_FETCH_MAP_DECODE – [00001000]
Field points to map with mixed types, so we must unbox each individual value (potentially pointing to more nested maps or lists) and reconstruct the appropriate JSON type for each individual value.
6 – BOX_FETCH_LIST_DECODE – [00010000]
Field points to list with mixed types, so we must unbox each individual value (potentially pointing to more nested maps or lists) and reconstruct the appropriate JSON type for each individual value.
7 – BOX_PARSE_NUMBER_AFTER_BOX – [00000010]
Field is a number after the box. Chop the box off and return the remaining bytes as a JSON number. Only used when storing numbers in mixed-type containers.
8 – BOX_PARSE_STRING_AFTER_BOX – [00000100]
Field is a string after the box. Chop the box off and return the remaining bytes as a JSON string. Only used when storing strings in mixed-type containers.
9 – BOX_DECODE_IMMEDIATE – [00100000]
or [01000000]
or [10000000]
Field is represented by the box itself. No more data needed. Read the box and generate an appropriate JSON true
, false
, or null
based on which bit is set in the box.
Codex
This experiment started, grew, and quickly became un-brain-keepable. After the first file reached 1,200 lines, I split it into seven total files for easier management and more clearly defined interfaces.
After the entire codebase reached 3,000 lines, I stopped adding features and wrote this. There are many unimplemented features and abilities, but no need to write production capabilities into a one-off experiment.
The code lives in the Kit of Redis Module Tools (krmt).
If you want to play around with JSON commands, you can build a json.so
module and load it into a Dynamic Redis. Dynamic Redis is my set of patches on top of Redis allowing runtime loading of modular commands.
If you run JSON commands, run them with the expectation of having your keyspace cluttered with one top-level key for every sub-container.
If you run JSON commands, only use them for keeping transient data or for playing around with JSON API dumps like in the issues
example. Further work on the JSON commands could introduce different data formats making any long-term storage unreadable.
It’s a toy, not a product.
If you want to add missing functionality, optimize exiting functionality for speed and/or space, fix bugs, add tests, or extend features, feel free to open issues or pull requests on krmt.
Benchmarks
I’m not including actual benchmarks here since perfomance varies wildly based on the shape and size of the JSON you are importing.
But, as a baseline, I can JSONDOCSET
the eva
JSON 16,000 to 17,000 times per second depending on the compiler and optimization settings used.
That’s obviously not amazing Redis 100,000+ operations-per-second speed, but it’s still usable (and you may not be write-heavy with importing JSON anyway).
Scrappy Little Tests
Estimated test coverage: 30% to 65%.
% ./runtest --single unit/type/jsonobj
Cleanup: may take some time... OK
Starting test server at port 11111
[ready]: 73676
Testing unit/type/jsonobj
[ready]: 73675
[ready]: 73677
[ready]: 73678
[ready]: 73680
[ready]: 73679
[ready]: 73681
[ready]: 73682
[ready]: 73683
[ready]: 73684
[ready]: 73685
[ready]: 73686
[ready]: 73687
[ready]: 73688
[ready]: 73689
[ready]: 73690
[ok]: JSONDOC - set simple
[ok]: JSONDOC - remove simple
[ok]: JSONDOC - verify remove cleaned all keys
[ok]: JSONDOC - set complex
[ok]: JSONDOC - remove complex
[ok]: JSONDOC - verify remove cleaned all keys
[ok]: JSONDOC - add sample types
[ok]: JSONDOC - remove all basic types
[ok]: JSONDOC - verify remove cleaned all keys
[ok]: JSONFIELD - basic get string
[ok]: JSONFIELD - basic get number
[ok]: JSONFIELD - basic get list (numbers)
[ok]: JSONFIELD - basic get list (strings)
[ok]: JSONFIELD - basic get list (with map)
[ok]: JSONFIELD - basic get map
[ok]: JSONFIELD - basic get true
[ok]: JSONFIELD - basic get false
[ok]: JSONFIELD - basic get null
[ok]: JSONDOC - remove all test documents
[ok]: JSONDOC - verify remove cleaned all keys
[ok]: JSONFIELD - multi-layered get list
[ok]: JSONFIELD - multi-layered get first
[ok]: JSONFIELD - multi-layered get second
[ok]: JSONFIELD - multi-layered get second key
[ok]: JSONFIELD - multi-layered get third
[ok]: JSONFIELD - multi-layered get fourth
[ok]: JSONDOC - remove all test documents
[ok]: JSONDOC - verify remove cleaned all keys
[ok]: JSONFIELD - deep nested get - level 1
[ok]: JSONFIELD - deep nested get - level 2
[ok]: JSONFIELD - deep nested get - level 3
[ok]: JSONFIELD - deep nested get - level 4
[ok]: JSONFIELD - deep nested get - level 5
[ok]: JSONFIELD - deep nested get - level 6
[ok]: JSONFIELD - deep nested get - level [too low]
[ok]: JSONFIELD - deep nested get - level [too high]
[ok]: JSONDOC - remove all test documents
[ok]: JSONDOC - verify remove cleaned all keys
[ok]: JSONFIELDDEL - basic delete
[ok]: JSONFIELDDEL - verify document after delete
[ok]: JSONFIELDDEL - basic map delete
[ok]: JSONFIELDDEL - verify document after delete
[ok]: JSONFIELDDEL - basic list delete
[ok]: JSONFIELDDEL - verify document after delete
[ok]: JSONFIELDDEL - basic list delete II
[ok]: JSONFIELDDEL - verify document after delete
[ok]: JSONFIELDDEL - nested list delete
[ok]: JSONFIELDDEL - verify document after delete
[ok]: JSONFIELDDEL - nested list delete direct positional
[ok]: JSONFIELDDEL - verify document after delete
[ok]: JSONFIELDDEL - nested map to empty map
[ok]: JSONFIELDDEL - verify document after nested map delete
[ok]: JSONFIELDDEL - nested list to empty list
[ok]: JSONFIELDDEL - verify document after nested list delete
[ok]: JSONFIELDDEL - nested map later field delete
[ok]: JSONFIELDDEL - verify document after delete
[ok]: JSONDOC - remove remaining test documents
[ok]: JSONDOC - verify remove cleaned all keys
[ok]: JSONDOCSETBYJSON - set map
[ok]: JSONDOCSETBYJSON - set map inside list
[ok]: JSONDOCSETBYJSON - set multi-map inside list
[ok]: JSONDOC - remove remaining test documents
[ok]: JSONDOC - verify remove cleaned all keys
[ok]: JSONDOCKEYS - get keys of a map
[ok]: JSONFIELDINCRBY - increment by integer
[ok]: JSONFIELDINCRBYFLOAT - increment by float
[ok]: JSONFIELDRPUSHX - rpush string
[ok]: JSONFIELDRPUSHX - rpush string verify
[ok]: JSONFIELDRPUSHX - rpush number
[ok]: JSONFIELDRPUSHX - rpush string verify
[ok]: JSONFIELDRPOP - rpop number
[ok]: JSONFIELDRPUSHX - rpop string verify
[ok]: JSONFIELDRPOP - rpop number
[ok]: JSONFIELDRPUSHX - rpop string verify
[ok]: JSONFIELDRPOP - rpop mixed contents
[ok]: JSONFIELDRPOP - rpop mixed contents verify
[ok]: JSONFIELDRPOP - rpop mixed contents
[ok]: JSONFIELDRPOP - rpop mixed contents verify
[ok]: JSONFIELDRPOP - rpop mixed contents
[ok]: JSONFIELDRPOP - rpop mixed contents verify
[ok]: JSONFIELDRPOP - rpop mixed contents last element
[ok]: JSONFIELDRPOP - rpop mixed contents verify empty
[ok]: JSONDOC - remove remaining test documents
[ok]: JSONDOC - verify remove cleaned all keys
[ok]: Check for memory leaks (pid 73692)
[1/1 done]: unit/type/jsonobj (0 seconds)
The End
Execution time of different units:
0 seconds - unit/type/jsonobj
\o/ All tests passed without errors!
Cleanup: may take some time... OK
fin.
Summary:
- 3,171+ lines of C implementing 15 JSON commands.
- 4,500+ words in this post (including copy/pasted commands, JSON, and tests)
- Two lost weekends
One exception to the “everything is a string” rule is the result of
INCRBY
commands. Because Redis knows you are incrementing a number, it returns the result as an integer. Though, Redis doesn’t have a floating point type (which would complicate the protocol a lot), so when you run anINCRBYFLOAT
, your result is returned as a string.↩This description is potentially factually wrong, but logically correct. I didn’t go back to look at any Python internals, but it’s just how ’terperters and language VMs without JIT or compile-time definitions have to do math.↩
Yes, this means adding or removing elements inside of lists is tricky. In fact, right now we only support appending to lists and popping elements off the end of lists, so we don’t worry about needing to rename any sub-containers because their position in the list changed.↩