ArangoDB provides many options for finding and modifying data. Though there are several more specialized operation, data-modification AQL queries are the most general solution in ArangoDB. They allow to find documents using arbitrary filter criteria, and to modify or remove the documents once found.
Such find-and-modify operations can be executed with multiple queries (one for the find step, one for the modification step), or with a single query. Putting both steps into a single query will often save roundtrips between the application and the database and thus may be preferred over executing the steps separately. Putting both the find and the modify step into the same query also prevents other operations from interfering in between and tampering with the underlying data.
Now what if the application not only requires the data to be updated, but also needs to keep track of which documents were found and modified by a find-and-modify query? This is often required when an application needs to keep database data in sync with data in some other datastore (e.g. the filesystem or a remote service).
The pattern I would dub find-modify-return would be useful for this.
Unfortunately it hasn’t been supported in AQL until very recently. We have been asked for this so many times that I stopped counting.
I am glad that this got better with ArangoDB 2.4. Finding, modifying and returning documents is now possible from the same AQL query. The solution is not yet perfect, but at least it provides basic find-modify-return functionality for a lot of use cases, including multi-document, multi-collection queries.
INSERT
Let’s start with an INSERT
operation as it is the most simple to explain:
1 2 |
|
The above query does not explicitly specify the _key
attribute for the inserted
documents, meaning the database will create the documents keys automatically.
Additionally, the query assigns a random value to an attribute someValue
.
Getting to know the created keys or random values required an extra lookup query
before 2.4. Extra queries are something one wants to avoid for efficiency
reasons.
With 2.4, adding two lines at the end of the query will solve the problem so that everything can be done in one query:
1 2 3 4 |
|
Using a LET
with the pseudo-value of NEW
after the INSERT
together with
a final RETURN
statement will make the inserted documents appear in the query
result. Note that the full documents will be returned and not just the specified
attributes. This allows the application to track of all document attributes,
even the auto-generated ones.
REMOVE
The mechanism works for REMOVE
statements, too.
Time to use a different example for this. Let’s assume session metadata are stored in the database, and some bigger session files are stored somewhere in the filesystem. If the application needs to remove expired sessions, it will need to clean up in both places. It will first query the database to find the expired sessions, only to remove them from the filesystem (if the filesystem sessions were organized by session id from the database) and to finally remove the sessions from database itself.
The following query could be used to find removal candidates:
1 2 3 4 |
|
The results of this query can be used for cleaning up sessions in the filesystem,
and for finally removing the sessions from the database. However, this would require
an extra REMOVE
query. And didn’t we say we would like to avoid extra queries?
The good news is that in 2.4 we can by putting at least the database part into a single query:
1 2 3 4 5 6 |
|
The last two lines of the above query make sure the removed sessions are returned to the application, so the application can perform any filesystem cleanup using the session ids from the database.
Note that in case of REMOVE
one has to use the pseudo-value OLD
because
REMOVE
can only return documents before removal. In the case of INSERT
we can only refer to the pseudo-value NEW
.
UPDATE and REPLACE
So far we saw INSERT
and REMOVE
, but there are also UPDATE
and REPLACE
.
I will handle UPDATE
and REPLACE
in one go. All the following refers to
UPDATE
, but does apply to REPLACE
as well.
The mechanism to return documents from a query is the same as
already demonstrated: simply append the LET ... RETURN
sequence to the end
of the original query.
In addition, UPDATE
allows to return either the old document revisions
(before modification) or the new document revisions (after modification).
This can be expressed by using either OLD
or NEW
in the final LET
statement.
Following is a more complex example for an UPDATE
query that aggregates data
from one collection (phraseOccurrences
) in order to find and modify matching
documents in another collection (phrases
). It will return the documents from
phrases
before they got modified:
1 2 3 4 5 6 7 8 9 |
|
If we are interested in what the documents in phrases
look like with the
UPDATE
applied, we can use NEW
instead:
1 2 3 4 5 6 7 8 9 |
|
Note that the full documents will be returned here, and not just the attributes
specified in or modified by the UPDATE
operation.