J@ArangoDB

{ "subject" : "ArangoDB", "tags": [ "multi-model", "nosql", "database" ] }

Returning Results From AQL INSERT/REMOVE/REPLACE Operations

ArangoDB provides many options for finding and modifying data. Though there are several more specialized operation, data-modification AQL queries are the most general solution in ArangoDB. They allow to find documents using arbitrary filter criteria, and to modify or remove the documents once found.

Such find-and-modify operations can be executed with multiple queries (one for the find step, one for the modification step), or with a single query. Putting both steps into a single query will often save roundtrips between the application and the database and thus may be preferred over executing the steps separately. Putting both the find and the modify step into the same query also prevents other operations from interfering in between and tampering with the underlying data.

Now what if the application not only requires the data to be updated, but also needs to keep track of which documents were found and modified by a find-and-modify query? This is often required when an application needs to keep database data in sync with data in some other datastore (e.g. the filesystem or a remote service).

The pattern I would dub find-modify-return would be useful for this.

Unfortunately it hasn’t been supported in AQL until very recently. We have been asked for this so many times that I stopped counting.

I am glad that this got better with ArangoDB 2.4. Finding, modifying and returning documents is now possible from the same AQL query. The solution is not yet perfect, but at least it provides basic find-modify-return functionality for a lot of use cases, including multi-document, multi-collection queries.

INSERT

Let’s start with an INSERT operation as it is the most simple to explain:

insert query, not returning anything
1
2
FOR i IN 1..10
  INSERT { name: CONCAT("user", i), someValue: RAND() } IN testAccounts

The above query does not explicitly specify the _key attribute for the inserted documents, meaning the database will create the documents keys automatically. Additionally, the query assigns a random value to an attribute someValue. Getting to know the created keys or random values required an extra lookup query before 2.4. Extra queries are something one wants to avoid for efficiency reasons.

With 2.4, adding two lines at the end of the query will solve the problem so that everything can be done in one query:

insert query, returning the inserted documents
1
2
3
4
FOR i IN 1..10
  INSERT { name: CONCAT("user", i), someValue: RAND() } IN testAccounts
  LET inserted = NEW
  RETURN inserted

Using a LET with the pseudo-value of NEW after the INSERT together with a final RETURN statement will make the inserted documents appear in the query result. Note that the full documents will be returned and not just the specified attributes. This allows the application to track of all document attributes, even the auto-generated ones.

REMOVE

The mechanism works for REMOVE statements, too.

Time to use a different example for this. Let’s assume session metadata are stored in the database, and some bigger session files are stored somewhere in the filesystem. If the application needs to remove expired sessions, it will need to clean up in both places. It will first query the database to find the expired sessions, only to remove them from the filesystem (if the filesystem sessions were organized by session id from the database) and to finally remove the sessions from database itself.

The following query could be used to find removal candidates:

query for finding removal candidates
1
2
3
4
FOR session IN sessions
  FILTER session.dateExpires < DATE_NOW()
  LIMIT 1000
  RETURN session._key

The results of this query can be used for cleaning up sessions in the filesystem, and for finally removing the sessions from the database. However, this would require an extra REMOVE query. And didn’t we say we would like to avoid extra queries?

The good news is that in 2.4 we can by putting at least the database part into a single query:

query for removing and returning expired sessions
1
2
3
4
5
6
FOR session IN sessions
  FILTER session.dateExpires < DATE_NOW()
  LIMIT 1000
  REMOVE session IN sessions 
  LET removed = OLD
  RETURN removed

The last two lines of the above query make sure the removed sessions are returned to the application, so the application can perform any filesystem cleanup using the session ids from the database.

Note that in case of REMOVE one has to use the pseudo-value OLD because REMOVE can only return documents before removal. In the case of INSERT we can only refer to the pseudo-value NEW.

UPDATE and REPLACE

So far we saw INSERT and REMOVE, but there are also UPDATE and REPLACE. I will handle UPDATE and REPLACE in one go. All the following refers to UPDATE, but does apply to REPLACE as well.

The mechanism to return documents from a query is the same as already demonstrated: simply append the LET ... RETURN sequence to the end of the original query.

In addition, UPDATE allows to return either the old document revisions (before modification) or the new document revisions (after modification). This can be expressed by using either OLD or NEW in the final LET statement.

Following is a more complex example for an UPDATE query that aggregates data from one collection (phraseOccurrences) in order to find and modify matching documents in another collection (phrases). It will return the documents from phrases before they got modified:

update query returning “old” documents
1
2
3
4
5
6
7
8
9
FOR po IN phraseOccurrences
  FILTER ! po.isSuspicious
  FILTER po.dateMentioned >= '2015-01-01' && po.dateMentioned <= '2015-01-09'
  COLLECT phrase = po.phrase WITH COUNT INTO occurrences
  FILTER occurrences % 42 != 23
  LIMIT 13
  UPDATE phrase WITH { isSuspicious: true } IN phrases
  LET previous = OLD  /* returns document revisions before UPDATE */
  RETURN previous

If we are interested in what the documents in phrases look like with the UPDATE applied, we can use NEW instead:

update query returning “new” documents
1
2
3
4
5
6
7
8
9
FOR po IN phraseOccurrences
  FILTER ! po.isSuspicious
  FILTER po.dateMentioned >= '2015-01-01' && po.dateMentioned <= '2015-01-09'
  COLLECT phrase = po.phrase WITH COUNT INTO occurrences
  FILTER occurrences % 42 != 23
  LIMIT 13
  UPDATE phrase WITH { isSuspicious: true } IN phrases
  LET modified = NEW  /* returns document revisions after UPDATE */
  RETURN modified

Note that the full documents will be returned here, and not just the attributes specified in or modified by the UPDATE operation.