This week saw the completion of the AQL UPSERT command.
This command will be very helpful in a lot of use cases, including the following:
- ensure that a document exists
- update a document if it exists, otherwise create it
- replace a document if it exists, otherwise create it
The UPSERT command is executed on the server side and so delivers client
applications from issuing a fetch command followed by a separate, conditional UPDATE
or INSERT command.
The general format of an UPSERT statement is:
1 2 3 4 | |
Following are a few example invocations of UPSERT.
Ensure a document exists
A simple use case of UPSERT is to ensure that a specific document exists.
For example, the following query will ensure that that there will be a document
with attribute ip equal to 192.168.173.13 in collection hosts:
1 2 3 4 | |
If the document does not yet exist, the INSERT part will be carried out. If the
document is already there, the empty UPDATE part will be run, which will not
modify the document.
After the query has finished, there will be a document with the specified ip value.
There is no need for client applications to check for document existence first, and then to conditionally insert or update, or to try insert first and catch errors.
Note: this is the same as ActiveRecord’s find_or_create.
Update a document if it exists, otherwise create it
Another common use case is to check whether a specific document exists, and then update it. If it does not yet exist, the document shall be created.
Counters are a good example for this, so let’s demo this pattern with a counter, too:
1 2 3 4 | |
The above query will again look for a document with the specified ip attribute. If the
document exists, its counter attribute will be increased by one. This is achieved by
referring to the pseudo-value OLD, which in the UPDATE case contains the previous revision
of the document.
If the search document does not yet exist, the INSERTpart will be carried out, inserting
the document and setting the initial value of counter to 1.
Assuming the collection was empty before, running the above query once will make the collection contain this data:
1 2 3 4 5 6 7 | |
When running the UPSERT statement again, the collection will contain the updated document:
1 2 3 4 5 6 7 | |
Now let’s run the query with adjusted ip and name values:
1 2 3 4 | |
After that, the collection will contain two documents:
1 2 3 4 5 6 7 8 9 10 11 12 | |
Replace a document if it exists, otherwise create it
We’ve seen UPSERT with an INSERT and an UPDATE clause so far.
UPDATE will partially update the previous revision of the document if present.
Only those attributes specified in the update-expression will be updated, and
all non-specified attributes of the original document revision will remain
unchanged.
If instead a full replacement of the original document is required, the REPLACE
clause should be used instead of UPDATE. REPLACE will overwrite the previous
revision completely with what’s in update-expression.
1 2 3 4 | |
note: an older version of this blog post contained a wrong example here. Thanks Andy for pointing this out!
Returning documents
UPSERT can be combined with a RETURN statement to return either the previous
document revision (in case of UPDATE or REPLACE) or the new version of the
document.
Client applications can use this to check whether the UPSERT statement has
inserted or updated the document. In case no previous revision was present, the
pseudo-value OLD will be null.
UPSERT also provides a pseudo-value named NEW containing the insert, updated or
replaced version of the document:
1 2 3 4 5 | |
In the INSERT case, we’ll get:
1 2 3 4 5 6 7 8 9 10 | |
When running the query again, we’ll get into the UPDATE case, and the same query
will now return:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | |
Complex updates
Updating and returning OLD and NEW will work with arbitrary calculations.
For example, the following query adds a value development to the tag attribute
only if not yet present in the search document:
1 2 3 4 5 | |
Running the query multiple times will ensure that tags will contain the value
development only once.
Note: PUSH is a regular AQL array function.
Restrictions
The search-value needs to be an object literal, with attribute names being inspectable
at query compile time. This means that neither variables nor bind parameters can be used
for search-value.
However, bind parameters and variables can be used inside search-value.
Dynamic attribute names cannot be used for specifying attribute names in search-value`.
UPSERT does not require an index to be present on the attributes of search-value,
but in reality queries will benefit from indexes to find matching documents.
When more than one document in the collection matches search-value, one arbitrary match
will be used for executing the UPDATE clause. It is therefore recommended to use
UPSERT commands together with a unique index or to make sure from the client application
that at most one document will match the search-value. Ironically, one way to achieve this is
to use the UPSERT command for inserts…
Availability
UPSERT is currently available in the devel branch of ArangoDB. This branch
will eventually become release 2.6. Until then, everyone is welcome to try it out and
provide feedback.