Introduction to Redis Data Types

Redis stores keys that point to values. A key can be any binary value of reasonable size (short ASCII strings are recommended for readability and debugging). Each value is one of five native Redis data types.

Redis value types are:

  • strings: a binary-safe sequence of bytes up to 512 MB
  • hashes: a collection of key-value pairs
  • lists: a collection of strings kept in insertion order
  • sets: a collection of unique strings with no ordering
  • sorted sets: a collection of unique strings ordered by a user-defined score

Let's break down what each data type means and how it works.

(For a slightly more technical look at all built-in Redis data types, see the Redis data types page.)

Strings

A Redis string is a sequence of bytes.

Strings in Redis are binary safe (meaning they have a known length not determined by any special terminating characters), so you can store anything up to 512 megabytes in one string.

Strings are the canonical "key value store" concept. You have a key pointing to a value, where both key and value are text or binary strings.

For all possible operations on strings, see the string docs.

String Data Example

Key: "name"
Value: "bob"

Special String Use Cases

Numeric Strings

If a string is an ASCII representation of an integer ("307") or a floating point number ("500.321"), Redis gives you high speed atomic counters by letting you increment and decrement the string's numeric value with one command to the server.
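As a rough illustration of the counter behavior described above, here is a minimal Python sketch that models a string store with a plain dict. It is a local model only and does not talk to a Redis server. The helper `incr_by` and the key names are hypothetical; they mimic what commands such as INCR and INCRBY do on the server (parse the stored string as an integer, add to it, store the result back as a string).

```python
# Local model of a Redis-style string store; this is not a Redis client.
store = {"pageViews": "307"}


def incr_by(key, amount=1):
    """Hypothetical helper: treat the stored string as an integer and add to it.

    A missing key is treated as "0", so a brand new counter starts from zero.
    """
    current = int(store.get(key, "0"))  # raises ValueError if the string is not an integer
    updated = current + amount
    store[key] = str(updated)  # the value goes back into the store as a string
    return updated


incr_by("pageViews")       # "307" becomes "308"
incr_by("pageViews", -8)   # "308" becomes "300"
incr_by("newCounter")      # missing key starts at 0, so it becomes "1"
```

In Redis the whole read-add-write step happens inside one server command, which is what makes the counter atomic; this single-process model does not need to worry about that.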

String Ranges

You can slice and dice strings too. Quickly retrieve parts of your string at any offset counting from either the beginning or end of the string. You can also append to existing strings and overwrite parts of existing strings for quick updates (avoiding a read/update/write cycle).

String range operators allow you to easily create position indexable array-like data structures too. You end up with a network accessible shared array with constant time access to any arbitrary element for reading and updating (note: explore all native Redis data types first before over-optimizing for performance).
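The range operations above can be sketched with a Python `bytearray`. This is a simplified local model, not the Redis API: `get_range`, `set_range`, `array_set`, and `array_get` are hypothetical helpers loosely modeled on GETRANGE, SETRANGE, and APPEND, and edge-case behavior may differ from the real commands. The second half shows the array idea: fixed-width 4-byte slots, so element i lives at byte offset 4 * i.

```python
import struct


def get_range(buf, start, end):
    """Return bytes from start to end, both inclusive; negative offsets count from the end."""
    n = len(buf)
    if start < 0:
        start = max(start + n, 0)
    if end < 0:
        end = end + n
    end = min(end, n - 1)
    if start > end:
        return b""
    return bytes(buf[start:end + 1])


def set_range(buf, offset, data):
    """Overwrite bytes in place starting at offset, zero-padding if offset is past the end."""
    if offset > len(buf):
        buf.extend(b"\x00" * (offset - len(buf)))
    buf[offset:offset + len(data)] = data
    return len(buf)


value = bytearray(b"Hello World")
first_word = get_range(value, 0, 4)     # b"Hello"
last_word = get_range(value, -5, -1)    # b"World"
set_range(value, 6, b"Redis")           # value is now b"Hello Redis"
value.extend(b"!")                      # append: b"Hello Redis!"

# Array-like usage: each element is a fixed-width 4-byte big-endian integer.
WIDTH = 4
numbers = bytearray()


def array_set(buf, index, number):
    set_range(buf, index * WIDTH, struct.pack(">I", number))


def array_get(buf, index):
    raw = get_range(buf, index * WIDTH, index * WIDTH + WIDTH - 1)
    return struct.unpack(">I", raw)[0]


array_set(numbers, 2, 99)   # slots 0 and 1 are zero-filled automatically
```

Because every slot has the same width, reading or updating slot i only touches the bytes at a computed offset, which is where the constant time access comes from.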

Bit Strings

Redis has no concept of what your string represents. It's just binary data. Since strings are opaque binary sequences, Redis lets you operate on strings bit-by-bit too. You don't need any bitshifting or masking magic in your code since Redis handles it all for you. Creating extremely space efficient bit based data structures becomes easy (almost too easy).

Everybody's favorite "use the bits" data structure is a bloom filter, and you can throw one together using Redis in about 30 minutes. You can also use bit operations to quickly count all the set bits ("the ones") in a string enabling you to use bit strings as activity/usage/population counters with position based membership indexes. Redis bit operations give you the power to use all your ones and zeroes compactly and directly.

For example, if you want to count daily active users, you can set a bit in position uid to 1 (e.g. User ID 500 logs in, so you set bit position 500 to 1). If you do this on a new string for each day, you can ask Redis to tell you how many (and which) users were active on any given day directly without needing any complex (or, Dean-forbid, map reduce) queries.
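The daily active user idea can be sketched locally with a `bytearray` acting as one day's bit string. This is a model, not a Redis client: `set_bit`, `get_bit`, `bit_count`, and `active_users` are hypothetical helpers in the spirit of SETBIT, GETBIT, and BITCOUNT, and the user IDs are made up. The sketch assumes bit 0 is the most significant bit of the first byte.

```python
def set_bit(buf, position, bit):
    """Set or clear one bit, growing the buffer with zero bytes when needed."""
    byte_index, offset = divmod(position, 8)
    if byte_index >= len(buf):
        buf.extend(b"\x00" * (byte_index + 1 - len(buf)))
    mask = 0x80 >> offset  # bit 0 is the leftmost bit of the first byte
    if bit:
        buf[byte_index] |= mask
    else:
        buf[byte_index] &= ~mask & 0xFF


def get_bit(buf, position):
    byte_index, offset = divmod(position, 8)
    if byte_index >= len(buf):
        return 0  # bits past the end of the string read as zero
    return (buf[byte_index] >> (7 - offset)) & 1


def bit_count(buf):
    """How many bits are set, i.e. how many users were active."""
    return sum(bin(byte).count("1") for byte in buf)


def active_users(buf):
    """Which bit positions are set, i.e. which user IDs were active."""
    return [pos for pos in range(len(buf) * 8) if get_bit(buf, pos)]


# One bit string per day; user 500 logs in twice but only occupies one bit.
today = bytearray()
for uid in (500, 7, 500, 1024):
    set_bit(today, uid, 1)

how_many = bit_count(today)
which = active_users(today)
```

Note that the string only grows as far as the highest user ID seen that day, so sparse, very large IDs cost more space than dense ones.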

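If you have ten million users, how much space will storing ten million daily visits use? Assuming user IDs run from 0 to roughly ten million, you can store ten million unique data points every day in a little over 1 MB per day. The math works out as: 10,000,000 bits / (8 bits / byte) = 1,250,000 bytes, which is about 1.19 megabytes at 1024 * 1024 bytes per megabyte. Checking from the other direction: 1.2 megabytes * (1024 kilobytes / megabyte) * (1024 bytes / kilobyte) * (8 bits / byte) = 10,066,329 bits, so 1.2 megabytes is enough room for ten million one-bit flags.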
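The storage arithmetic above can be double-checked in a few lines of Python. The only assumption is the one already implied by the text: one bit per user, with user IDs packed densely from 0 up to ten million.

```python
USERS = 10_000_000
BITS_PER_BYTE = 8
BYTES_PER_MB = 1024 * 1024

# Round up to whole bytes: one bit per user.
bytes_needed = (USERS + BITS_PER_BYTE - 1) // BITS_PER_BYTE
megabytes_needed = bytes_needed / BYTES_PER_MB

# The other direction: how many bits fit in 1.2 MB?
bits_in_1_2_mb = int(1.2 * BYTES_PER_MB * BITS_PER_BYTE)

print(bytes_needed)                 # 1250000
print(round(megabytes_needed, 2))   # 1.19
print(bits_in_1_2_mb)               # 10066329
```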

Hashes

A Redis hash is a collection of key value pairs.

A Redis hash holds many key value pairs, where each key and value is a string. Redis hashes do not support complex values directly (meaning, you can't have a hash field have a value of a list or set or another hash), but you can use hash fields to point to other top level complex values. The only special operation you can perform on hash field values is atomic increment/decrement of numeric contents.
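To make the "flat fields, string values" shape concrete, here is a small Python model of one hash, using a few fields borrowed from the accountInfo example in this post. It is a sketch, not a Redis client: `hash_incr_by` is a hypothetical helper in the spirit of HINCRBY, and the dict-of-dicts layout is just a convenient local stand-in.

```python
# Each top-level key maps to a flat dict of string fields to string values.
hashes = {
    "accountInfo:5531": {
        "name": "bob smith",
        "loginCount": "3691",
        "isAdmin": "false",
    }
}


def hash_incr_by(key, field, amount=1):
    """Hypothetical helper: add to a numeric field, treating a missing field as "0"."""
    fields = hashes.setdefault(key, {})
    updated = int(fields.get(field, "0")) + amount
    fields[field] = str(updated)  # field values are always stored as strings
    return updated


new_count = hash_incr_by("accountInfo:5531", "loginCount")

# Everything is a string, so booleans and numbers must be parsed by the caller.
is_admin = hashes["accountInfo:5531"]["isAdmin"] == "true"
```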

You can think of Redis hashes in two ways: as a direct object representation and as a way to store many small values compactly.

Direct object representations are simple to understand. Objects have a name (the key of the hash) and a collection of internal keys with values. See the example below for, well, an example.

Storing many small values using a hash is a clever Redis massive data storage technique. When a hash has a small number of fields (~100), Redis optimizes the storage and access efficiency of the entire hash. Redis's small hash storage optimization leads to an interesting behavior: it's more efficient to have 100 hashes each with 100 internal keys and values than to have 10,000 top level keys pointing to string values. Using Redis hashes to optimize your data storage this way does require additional programming overhead for tracking where data ends up, but if your data storage is primarily string based, you can save a lot of memory overhead using this one weird trick.
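The "tracking where data ends up" part is mostly a matter of computing a bucket. Below is a minimal sketch, assuming numeric IDs and a bucket size of 100 to match the 100 x 100 example above. The key format `user:bucket:N` and the helpers `bucket_for`, `put`, and `get` are hypothetical, and the dict of dicts is only a local stand-in for Redis hashes.

```python
BUCKET_SIZE = 100  # chosen to match the ~100 fields per hash mentioned above


def bucket_for(uid):
    """Map a numeric ID to (hash key, field name)."""
    bucket, field = divmod(uid, BUCKET_SIZE)
    return f"user:bucket:{bucket}", str(field)


hashes = {}  # local stand-in: hash key -> {field: value}


def put(uid, value):
    key, field = bucket_for(uid)
    hashes.setdefault(key, {})[field] = value


def get(uid):
    key, field = bucket_for(uid)
    return hashes.get(key, {}).get(field)


# 10,000 values end up in 100 hashes of 100 fields each.
for uid in range(10_000):
    put(uid, f"name-{uid}")
```

With this layout, ID 5531 lands in hash `user:bucket:55` under field `31`, so a lookup is still a single key plus a single field.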

For all possible operations on hashes, see the hash docs.

Hash Data Example

Key: "accountInfo:5531"
Value: "name" => "bob smith"
       "username" => "ultimate bob"
       "country" => "UK"
       "lastLogin" => "1383147407"
       "loginCount" => "3691"
       "lastPaymentSync" => "1383256813"
       "signup" => "1381254812"
       "isAdmin" => "false"

JSON

You can also think of a Redis hash as a JSON object (with non-nestable objects). In later posts, we'll talk about how to deal with JSON directly in Redis.

Hash Multi-Value Container Example

Key: "usernameToUidMapping:a"
Value: "alpha" => "1"
       "adam" => "312"
       "acrobat" => "333"
       "aromatic" => "664"


Key: "usernameToUidMapping:b"
Value: "beta" => "2"
       "betamax" => "32"
       "billith" => "443"

(Note: For a large user base, such a simple mapping scheme would quickly overrun the ~100 hash efficiency limit.)

Lists

Redis lists act like linked lists.

You can insert to, delete from, and traverse lists from either the head or tail of a list.

Use lists when you need to maintain values in the order they were inserted. (Redis does give you the option to insert into any arbitrary list position if you need to, but insertion gets slower the more elements Redis has to traverse to reach the insertion point.)

Redis lists are often used as producer/consumer queues. Insert items into a list then pop items from the list. What happens if your consumers try to pop from a list with no elements? You can ask Redis to wait for an element to appear and return it to you immediately when it gets added. This turns Redis into a real time message queue/event/job/task/notification system.
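The blocking behavior described above can be imitated locally with Python's standard `queue.Queue`, whose `get()` waits until an item arrives. This is only an analogy for a blocking pop (the kind of thing BLPOP and BRPOP provide in Redis); nothing here talks to a Redis server, and the names are made up for the example.

```python
import queue
import threading

jobs = queue.Queue()   # stands in for a Redis list used as a job queue
results = []


def consumer():
    # get() blocks until an item is available, like a blocking pop on an empty list.
    # The timeout is a safety net so the thread cannot wait forever.
    url = jobs.get(timeout=5)
    results.append(url)


worker = threading.Thread(target=consumer)
worker.start()                 # the queue is empty, so the consumer waits
jobs.put("http://redis.io")    # the producer adds an item and the consumer wakes up
worker.join()
```

The consumer never polls; it is handed the item as soon as the producer adds it, which is the property that makes the list usable as a real time queue.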

You can atomically remove elements off either end of a list, enabling any list to be treated as a stack or a queue.

You can also maintain fixed-length lists (capped collections) by trimming your list to a specific size after every insertion.
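Queue, stack, and capped-list usage can all be sketched with `collections.deque`. This is a local model: `lpush`, `lpop`, `rpop`, and `trim_to_newest` are hypothetical helpers named after LPUSH, LPOP, RPOP, and the trimming that LTRIM provides, and the entries are the timeline statuses from the list data example in this post.

```python
from collections import deque


def lpush(lst, item):
    """Insert at the head of the list."""
    lst.appendleft(item)
    return len(lst)


def lpop(lst):
    """Remove from the head (None if empty)."""
    return lst.popleft() if lst else None


def rpop(lst):
    """Remove from the tail (None if empty)."""
    return lst.pop() if lst else None


def trim_to_newest(lst, max_len):
    """Cap the list by dropping the oldest entries from the tail."""
    while len(lst) > max_len:
        lst.pop()


timeline = deque()
statuses = [
    "At park (it's cold!)",
    "Toasting marshmallows",
    "Whoops, park on fire",
    "swimming away from park",
]
for status in statuses:        # chronological order, newest pushed last
    lpush(timeline, status)
    trim_to_newest(timeline, 3)  # keep only the three newest entries

capped = list(timeline)
oldest = rpop(timeline)   # push at head + pop at tail = queue (first in, first out)
newest = lpop(timeline)   # push at head + pop at head = stack (last in, first out)
```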

For all possible operations on lists, see the lists docs.

List Data Example

Key: "processURLsNext"
Value: ["http://google.com/", "http://theonion.com", "http://redis.io"]

Key: "userTimeline:userId:312"
Value: ["swimming away from park", "Whoops, park on fire", "Toasting marshmallows", "At park (it's cold!)"]

Sets

Redis sets are, well, sets.

A Redis set contains unique unordered Redis strings where each string only exists once per set. If you add the same element ten times to a set, it will only show up once. Sets are great for lazily ensuring something exists at least once without worrying about duplicate elements accumulating and wasting space. You can add the same string as many times as you like without needing to check if it already exists.

Sets are fast for membership checking, insertion, and deletion of members in the set.

Sets have efficient set operations, as you would expect. You can take the union, intersection, and difference of multiple sets at once. Results can either be returned to the caller or results can be stored in a new set for later usage.
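Python's built-in `set` behaves closely enough to illustrate these operations. The sketch below is local only; in Redis the equivalent work is done server-side by commands such as SADD, SUNION, SINTER, and SDIFF. The `online_users` set is invented for the example, while the other members come from the set data examples in this post.

```python
admin_users = {"bob", "ralph", "jose", "loa"}
forbidden_usernames = {"boogie", "bogus", "fake", "cybercommander",
                       "improv", "gandalf", "oberoth"}
online_users = {"bob", "gandalf", "maria"}   # hypothetical data

# Adding an existing member is a no-op; no existence check is needed first.
admin_users.add("bob")

online_admins = admin_users & online_users            # intersection
online_non_admins = online_users - admin_users        # difference
everyone = admin_users | online_users                 # union

# Operations can combine more than two sets at once.
allowed_online = online_users.difference(admin_users, forbidden_usernames)
```

Storing a result for later use, as the text describes, would correspond to keeping the computed set under its own key instead of returning it to the caller.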

Sets have constant time access for membership checks (unlike lists). Redis also has convenient random member operations: you can remove and return a random member ("pop a random element from the set"), return random members without replacement ("give me 30 random-ish unique users"), or return random members with replacement ("give me 7 cards, but after each selection, put the card back so it can potentially be sampled again").
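The three kinds of random access can be sketched with Python's `random` module. This is a local illustration rather than Redis itself (the corresponding Redis commands are SPOP and SRANDMEMBER); the user names, the card list, and the `pop_random` helper are made up for the example.

```python
import random

rng = random.Random(42)   # seeded only so repeated runs behave the same

users = {"bob", "ralph", "jose", "loa", "maria", "chen"}
cards = ["ace", "king", "queen", "jack", "ten"]


def pop_random(members):
    """Remove and return one random member of the set."""
    member = rng.choice(sorted(members))
    members.remove(member)
    return member


# 1. Pop: the chosen member leaves the set.
popped = pop_random(users)

# 2. Without replacement: every returned member is distinct.
unique_sample = rng.sample(sorted(users), 3)

# 3. With replacement: the same member may be returned more than once.
hand = rng.choices(cards, k=7)
```

Seven draws from five cards are only possible with replacement, which is why the third call can return more items than the collection holds.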

For all possible operations on sets, see the sets docs.

Set Data Example

Key: "adminUsers"
Value: ("bob", "ralph", "jose", "loa")

Key: "bannedUids"
Value: ("332", "4096", "271", "737199492")

Key: "forbiddenUsernames"
Value: ("boogie", "bogus", "fake", "cybercommander", "improv", "gandalf", "oberoth")

Key: "uniqueVisitors"
Value: ("127.0.0.1", "63.23.221.7", "4.2.2.2", "8.8.8.8")

Sorted Sets

Redis sorted sets are sets with a user-defined ordering.

For simplicity, you can think of a sorted set as a binary tree with unique elements. (Redis sorted sets are actually skip lists.) The sort order of elements is defined by each element's score.

Sorted sets are still sets. Elements may only appear once in a set. An element, for uniqueness purposes, is defined by its string contents. Inserting element "apple" with sorting score 3, then inserting element "apple" with sorting score 500 results in one element "apple" with sorting score 500 in your sorted set. Sorted sets are unique based on Data only, not on (Score, Data) pairs.

Make sure your data model relies on the string contents and not the element's score for uniqueness. Scores are allowed to be repeated (or even zero), but, one last time, set elements can only exist once per sorted set. For example, if you try to store the history of every user login as a sorted set by making the score the epoch of the login and the value the user ID, you will end up storing only the last login epoch for each of your users. Your set would grow to the size of your user base and not your desired size of user base * logins.
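The uniqueness rule is easy to see in a small model where a sorted set is a dict from member to score. This is a local sketch, not a Redis client: `zadd` is a hypothetical helper named after ZADD, and the login events are invented. The last part shows one possible way around the login history pitfall (making each member unique per event); that workaround is an assumption for illustration, not something prescribed by this post.

```python
def zadd(zset, member, score):
    """Add a member, or overwrite its score if the member already exists."""
    zset[member] = float(score)


fruit = {}
zadd(fruit, "apple", 3)
zadd(fruit, "apple", 500)   # same member, so only the score changes

# Pitfall: user ID as the member keeps just one entry per user.
login_events = [("332", 1383256813), ("4096", 1383256916), ("332", 1383257212)]

last_login = {}
for uid, epoch in login_events:
    zadd(last_login, uid, epoch)

# Possible workaround (an assumption): put the epoch into the member string too.
login_history = {}
for uid, epoch in login_events:
    zadd(login_history, f"{uid}:{epoch}", epoch)
```

Three login events produce two entries in `last_login` (one per user) but three entries in `login_history` (one per event).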

Elements are added to your set with scores. You can update the score of any element at any time; just add the element again with a new score. Scores are double-precision floating point numbers, so you can store fractional values such as high precision timestamps if needed. Multiple elements may have the same score.

You can retrieve elements in a few different ways. Since everything is sorted, you can ask for elements starting at the lowest scores. You can ask for elements starting at the highest scores ("in reverse"). You can also ask for the elements whose scores fall within a given range, in either natural or reverse order.
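These retrieval styles can be sketched over the same member-to-score dict model, using the leaderBoard data from this post. The helpers `ascending`, `descending`, and `by_score` are hypothetical names; in Redis the corresponding work is done by commands such as ZRANGE, ZREVRANGE, and ZRANGEBYSCORE. Sorting on every call is a simplification of this model, since Redis keeps the set ordered as elements are added.

```python
leader_board = {"smashmaster": 300.0, "pilot": 600.0, "oigo": 720.0}


def ascending(zset):
    """Members from lowest to highest score; ties are broken by member string."""
    return sorted(zset, key=lambda member: (zset[member], member))


def descending(zset):
    """Members from highest to lowest score."""
    return list(reversed(ascending(zset)))


def by_score(zset, low, high, reverse=False):
    """Members whose score lies in [low, high], in natural or reverse order."""
    ordered = descending(zset) if reverse else ascending(zset)
    return [member for member in ordered if low <= zset[member] <= high]


lowest_first = ascending(leader_board)
highest_first = descending(leader_board)
mid_range = by_score(leader_board, 300, 600)
mid_range_reversed = by_score(leader_board, 300, 600, reverse=True)
top_two = highest_first[:2]
```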

For all possible operations on sorted sets, see the sorted sets docs.

Sorted Set Data Example

Key: "leaderBoard"
Value: ("smashmaster" [300], "pilot" [600], "oigo" [720])

Key: "lastLoggedInUids"
Value: ("332" [1383256813], "4096" [1383256916], "271" [1383257212], "737199492" [1383259419])