J@ArangoDB

{ "subject" : "ArangoDB", "tags": [ "multi-model", "nosql", "database" ] }

Preview of the UPSERT Command

This week saw the completion of the AQL UPSERT command.

This command will be very helpful in a lot of use cases, including the following:

  • ensure that a document exists
  • update a document if it exists, otherwise create it
  • replace a document if it exists, otherwise create it

The UPSERT command is executed on the server side and so delivers client applications from issuing a fetch command followed by a separate, conditional UPDATE or INSERT command.

The general format of an UPSERT statement is:

UPSERT format
1
2
3
4
UPSERT search-document
INSERT insert-expression
UPDATE update-expression
IN collection-name

Following are a few example invocations of UPSERT.

Ensure a document exists

A simple use case of UPSERT is to ensure that a specific document exists. For example, the following query will ensure that that there will be a document with attribute ip equal to 192.168.173.13 in collection hosts:

ensuring a document exists
1
2
3
4
UPSERT { ip: '192.168.173.13' }
INSERT { ip: '192.168.173.13', name: 'flittard' }
UPDATE { }
IN hosts

If the document does not yet exist, the INSERT part will be carried out. If the document is already there, the empty UPDATE part will be run, which will not modify the document.

After the query has finished, there will be a document with the specified ip value.

There is no need for client applications to check for document existence first, and then to conditionally insert or update, or to try insert first and catch errors.

Note: this is the same as ActiveRecord’s find_or_create.

Update a document if it exists, otherwise create it

Another common use case is to check whether a specific document exists, and then update it. If it does not yet exist, the document shall be created.

Counters are a good example for this, so let’s demo this pattern with a counter, too:

UPSERT example
1
2
3
4
UPSERT { ip: '192.168.173.13' }
INSERT { ip: '192.168.173.13', name: 'flittard', counter: 1 }
UPDATE { counter : OLD.counter + 1 } 
IN hosts

The above query will again look for a document with the specified ip attribute. If the document exists, its counter attribute will be increased by one. This is achieved by referring to the pseudo-value OLD, which in the UPDATE case contains the previous revision of the document.

If the search document does not yet exist, the INSERTpart will be carried out, inserting the document and setting the initial value of counter to 1.

Assuming the collection was empty before, running the above query once will make the collection contain this data:

collection contents after running query once
1
2
3
4
5
6
7
[
  {
    "counter": 1,
    "ip": "192.168.173.13",
    "name": "flittard"
  }
]

When running the UPSERT statement again, the collection will contain the updated document:

collection contents after running the UPSERT command again:
1
2
3
4
5
6
7
[
  {
    "counter": 2,
    "ip": "192.168.173.13",
    "name": "flittard"
  }
]

Now let’s run the query with adjusted ip and name values:

UPSERT with different ip and name
1
2
3
4
UPSERT { ip: '192.168.173.73' }
INSERT { ip: '192.168.173.73', name: 'brueck', counter: 1 }
UPDATE { counter : OLD.counter + 1 } 
IN hosts

After that, the collection will contain two documents:

collection contents
1
2
3
4
5
6
7
8
9
10
11
12
[
  {
    "counter": 2,
    "ip": "192.168.173.13",
    "name": "flittard"
  },
  {
    "counter": 1,
    "name": "brueck",
    "ip": "192.168.173.73"
  }
]

Replace a document if it exists, otherwise create it

We’ve seen UPSERT with an INSERT and an UPDATE clause so far.

UPDATE will partially update the previous revision of the document if present. Only those attributes specified in the update-expression will be updated, and all non-specified attributes of the original document revision will remain unchanged.

If instead a full replacement of the original document is required, the REPLACE clause should be used instead of UPDATE. REPLACE will overwrite the previous revision completely with what’s in update-expression.

UPSERT replacing a document entirely
1
2
3
4
UPSERT { ip: '192.168.173.73' }
INSERT { ip: '192.168.173.73', name: 'brueck', counter: 1 }
REPLACE { location: 'dc1' } 
IN hosts

note: an older version of this blog post contained a wrong example here. Thanks Andy for pointing this out!

Returning documents

UPSERT can be combined with a RETURN statement to return either the previous document revision (in case of UPDATE or REPLACE) or the new version of the document.

Client applications can use this to check whether the UPSERT statement has inserted or updated the document. In case no previous revision was present, the pseudo-value OLD will be null.

UPSERT also provides a pseudo-value named NEW containing the insert, updated or replaced version of the document:

UPSERT with a RETURN value
1
2
3
4
5
UPSERT { ip: '192.168.173.187' }
INSERT { ip: '192.168.173.187', name: 'kalk', counter: 1 }
UPDATE { counter : OLD.counter + 1 } 
IN hosts
RETURN { old: OLD, new: NEW }

In the INSERT case, we’ll get:

query return value for INSERT case
1
2
3
4
5
6
7
8
9
10
[
  {
    "old": null,
    "new": {
      "counter": 1,
      "name": "kalk",
      "ip": "192.168.173.187"
    }
  }
]

When running the query again, we’ll get into the UPDATE case, and the same query will now return:

query return value for the UPDATE case
1
2
3
4
5
6
7
8
9
10
11
12
13
14
[
  {
    "old": {
      "counter": 1,
      "name": "kalk",
      "ip": "192.168.173.187"
    },
    "new": {
      "counter": 2,
      "name": "kalk",
      "ip": "192.168.173.187"
    }
  }
]

Complex updates

Updating and returning OLD and NEW will work with arbitrary calculations. For example, the following query adds a value development to the tag attribute only if not yet present in the search document:

adding a value to an array if not yet present
1
2
3
4
5
UPSERT { ip: '192.168.173.94' }                                                                              
INSERT { ip: '192.168.173.94', name: 'chorweiler', tags: [ "development" ] }                                                    
UPDATE { tags: PUSH(OLD.tags, "development", true) }                                                                          
IN hosts                                                                                                      
RETURN { old: OLD, new: NEW }

Running the query multiple times will ensure that tags will contain the value development only once.

Note: PUSH is a regular AQL array function.

Restrictions

The search-value needs to be an object literal, with attribute names being inspectable at query compile time. This means that neither variables nor bind parameters can be used for search-value.

However, bind parameters and variables can be used inside search-value. Dynamic attribute names cannot be used for specifying attribute names in search-value`.

UPSERT does not require an index to be present on the attributes of search-value, but in reality queries will benefit from indexes to find matching documents.

When more than one document in the collection matches search-value, one arbitrary match will be used for executing the UPDATE clause. It is therefore recommended to use UPSERT commands together with a unique index or to make sure from the client application that at most one document will match the search-value. Ironically, one way to achieve this is to use the UPSERT command for inserts…

Availability

UPSERT is currently available in the devel branch of ArangoDB. This branch will eventually become release 2.6. Until then, everyone is welcome to try it out and provide feedback.