I just came across a question about how to create highscore lists or leaderboards in ArangoDB, and how they would work when compared to Redis sorted sets.
This blog post tries to give an answer on the topic and also detailed instructions and queries for setting up highscore lists with ArangoDB.
A highscore list in Redis
Highscore lists are normally used to quickly determine who’s currently at the top, so we obviously need some sorted data structure.
Redis has a specialized datatype named sorted set which can be used for exactly this purpose. A sorted set in Redis is a value consisting of multiple key/value pairs, and that is automatically sorted by values. The sorted set is stored under a key so it can be accessed as a whole.
Here’s how one would create a sorted set named
highscores and populate
it with 5 key/value pairs in Redis (using redis-cli):
Adding a new entry to a sorted set is done using
Inserting into a Redis sorted set has logarithmic complexity.
Updating a score in the sorted set is done using
ZINCRBY. This command works
regardless of whether the to-be-updated key already exists in the sorted set.
If it exists, its score will be increased by the specified value, and if it does
not exist, it will be created with the specified value:
In this case the return value
1 indicates that a new key was added to the set
and that it didn’t update an existing one.
Querying the entries with the lowest scores from a Redis sorted set is trivial.
ZRANGE command will query the entries in the sorted set from lowest to
highest score. As the entries are already stored in sorted order, this is very
The following command queries the bottom 3 keys from the sorted set:
1 2 3 4
For querying in reverse order, there is
ZREVRANGE. Both commands can be
accompanied by the
WITHSCORES flag to also return the associated values (i.e.
the scores). Here are the top 3 key/value pairs in the sorted set:
1 2 3 4 5 6 7
For removing an entry from a sorted set there is
There are many more specialized Redis commands for working with sorted sets. The
Redis commands prefixed with a
Z are sorted set
A highscore list in ArangoDB
Now let’s try to mimic that with ArangoDB.
In ArangoDB, there is no such thing as a sorted set and no direct equivalent. Instead, data in ArangoDB are stored in collections. Collections are a general-purpose storage mechanism and they are not limited to storing just scores.
We also need a mechanism for keeping highscores sorted. By default, no
specific sort order is maintained for data in a collection. To have the
collection entries sorted by highscore values, we have to explicitly create
a (sorted) skiplist index on some attribute. We’ll use an attribute named
score for this.
The following shell commands create the collection and the index on
Once the collection is set up, we can switch to AQL for the following operations (though we could achieve the same with Shell commands).
To insert the same initial data as in the Redis case, we can run the following five AQL queries:
1 2 3 4 5
Note that I have been using the
_key attribute for saving the user id. Using the
_key attribute is normally beneficial because it is the collection’s primary key.
It is always present and automatically unique, so exactly what we need for maintaining
a highscore list. Note that there are some restrictions for what can be stored inside
_key attribute, but as long as values are only ASCII letters or digits, there
is nothing to worry about.
Inserting into the collection will also automatically populate the indexes. Inserting into a skiplist should have about logarithmic complexity on average (though this is not guaranteed – this is because the skiplist is a probabilistic data structure and internally it will be flipping coins. In theory there is a chance that it becomes badly balanced. But in practice it should be quite close to an average logarithmic complexity).
As we have some initial documents, we can now query the lowest and highest scores.
This will also be efficient as the queries will use the sorted index on
1 2 3 4
1 2 3 4
To store a highscore for a user without knowing in advance whether a value has already
been stored before for this user, one can use
UPSERT will either insert
a new highscore entry, or update an existing one if already present:
1 2 3 4
If there is already an entry with a key
max, its scores will be increased by 80.
If such entry does not exist, it will be created. In both cases, the new score will
UPSERT command has been added in ArangoDB version 2.6.
Finally, removing an entry from a highscore list is a straight-forward remove operation:
We’ll now build on this simple example and create slightly more advanced highscore list use cases. The following topics will be covered:
- multi-game highscore lists
- joining data
- maintaining a “last updated” date
Multi-game highscore lists
We’ll start with generalizing the single-game highscore list into a multi-game highscore list.
In Redis, one would create multiple sorted sets for handling the highscore lists of multiple games. Multiple Redis sorted sets are stored under different keys, so they are isolated from each other.
Though Redis provides a few commands to aggregate data from multiple sorted sets
ZINTERSTORE) into a new sorted set, other cross-set operations are
not supported. This is not a problem if the client application does not have to
perform cross-set queries or cleanup tasks.
In ArangoDB, multi-game highscore lists can be implemented in two variants.
In order to decide which variant is better suited, we need to be clear about whether all highscores should be stored in the same collection or if we prefer using multiple collections (e.g. one per game).
Storing highscores for different games in separate collections has the advantage that they’re really isolated. It is easy to get rid of a specific highscore list by simply dropping its collection. It is also easy to get right query-wise.
All that needs to be changed to turn the above examples into a multi-game highscore
list solution is to change the hard-coded collection name
highscores and make it a
bind parameter, so the right collection name can be injected by the client application
On the downside, the multi-collection solution will make cross-game operations difficult.
Additionally, having one collection per game may get out of hand when there are many,
many highscore lists to maintain. In case there are many but small highscore lists to
maintain, it might be better to put them into a single collection and add a
attribute to tell the individual lists apart in it.
Let’s focus on this and put all highscores of all games into a single collection.
The first adjustment that needs to be made is that we cannot use
_key for user ids
anymore. This is because user ids may repeat now (a user may be contained in more than
one list). So we will change the design and make the combination of
a new unique key:
1 2 3 4
We can use the unique hash index on
game to ensure there is at most one entry
for per user per game. It also allows use to find out quickly whether we already have
an entry for that particular combination of game and user. Because we are not using
_key we could now also switch to numeric ids if we preferred that.
The other index on
score is sorted. It can be used to quickly retrieve the
leaderboard for a given game. As it is primarily sorted by
game, it can also be used
to enumerate all entries for a given game.
The following Shell command populates the multi-game highscores collection with 55,000 highscores:
1 2 3 4 5 6 7 8 9
The game ids used above are between 0 and 9, though any other game ids would work, too. User ids are stringified numbers.
We can now find out the leaderboard for game 2 with the following adjusted AQL query. The query will use the (sorted) skiplist index:
1 2 3 4 5
Removing all scores for a specific game is also efficient due to the the same index:
1 2 3
On a side note: when storing all highscores in the same collection, we could also
run cross-game queries if we wanted to. All that needs to be done for this is adjusting
FILTER conditions in the queries.
Inserting or updating a user score can be achieved using an
Here’s a query to increase the score of user
"1571" in game
2 by a value of 5:
1 2 3 4
The same index on
[ "user", "game" ] is used in the following query that will
delete the highscore of a given user in a specific game:
1 2 3 4
Querying the leaderboard for a specific game was easy. However, so far we have only queried user ids and associated scores in games. In reality, we probably want to display some more user information in a leaderboard, for example their screen names.
In Redis, no extra information can be stored in sorted sets. So extra user information must be stored under separate keys. There is no concept of joins in Redis. The scores contained in the sorted set need to be queried by the client application, and extra user information have to be queried by the client application separately.
In ArangoDB, we could store the screen names in the highscores collection along with the highscores so we can easily query them with the leaderboard query. This is also how it would be done in MongoDB due to the absence of joins there.
While this would work, it will create lots of redundant data if the screen names are also used and stored elsewhere.
So let’s pick the option that stores highscores and screen names in separate places, and brings them together only when needed in a leaderboard query.
Let’s store screen names in a collection named
users. The following Shell commands
will create the collection and set up 100K users with dummy screen names:
1 2 3 4 5 6 7
We can now query the highscores plus the screen name in one go:
1 2 3 4 5 6 7
Maintaining a “last updated” date
Finally, let’s try to keep track of when a highscore was last updated. There are a few use cases for this, for example displaying the date and time of when a highscore was achieved or for revmoing older highscores.
In Redis, the sorted set values are just the numeric scores, so we cannot store anything else (such as a date) inside the sorted sets. We would really need to store the update date for each highscore entry outside the sorted set, either under a separate key, or using a Redis hash. However, this is complex to manage and keep consistent so we won’t do it.
For implementing the automatic expiration, it would be good if we could use the built-in automatic key expiration of Redis. Each key can optionally be given a time-to-live or an expiration date, and it will automatically expire and vanish then without further ado. This may be exactly what we need to remove older highscore entries, but we cannot use it. The reason is that expiration only works for keys at the top level, but not for individual keys inside a sorted set. So we cannot really implement this sanely.
Let’s switch to ArangoDB now. Here we work with arbitrarily structured documents.
That means we can store any other attributes along with a highscore. We can store the
timestamp of when a highscore was last set or updated in an attribute named
1 2 3 4 5
date attribute can now be used for display purposes already.
We can also use the
date attribute for identifying the oldest entries in the
highscore list in case we want the list to be periodically cleaned up.
Obviously we will be indexing
date for this, but we need to decide whether we want to use
the same expiration periods for all games, or if we want to use game-specific expirations.
If the expiration date is the same for all games, then we can index just
If we now want to remove entries older than roughly 2 days, regardless of the associated game, the removal query looks like this:
1 2 3 4 5
If we instead want to find (and remove) the oldest entries for individual games,
we need to create the index on
This index allows to efficiently get rid of the oldest entries per game:
1 2 3 4 5 6
On a side note: the
REMOVE was limited to the oldest 1000 entries. This
was done to make the query return fast. The removal query can be repeated while
there are still entries to remove.