J@ArangoDB

{ "subject" : "ArangoDB", "tags": [ "multi-model", "nosql", "database" ] }

Diffing Two Documents in AQL

I just stumbled upon a comment in the ArangoDB blog asking how to create a diff of two documents with AQL.

Though there is no built-in AQL function to diff two documents, it is easily possible to build your own like in the following query.

AQL code for diffing two documents
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
/* input document 1*/
LET doc1 = { 
  "foo" : "bar", 
  "a" : 1, 
  "b" : 2 
}

/* input document 2 */
LET doc2 = { 
  "foo" : "baz", 
  "a" : 2, 
  "c" : 3 
}

/* collect attributes present in doc1, but missing in doc2 */
LET missing = (
  FOR key IN ATTRIBUTES(doc1)
    FILTER ! HAS(doc2, key)
    RETURN {
      [ key ]: doc1[key]
    }
)

/* collect attributes present in both docs, but that have different values */
LET changed = (
  FOR key IN ATTRIBUTES(doc1)
    FILTER HAS(doc2, key) && doc1[key] != doc2[key]
    RETURN {
      [ key ] : {
        old: doc1[key],
        new: doc2[key]
      }
    }
)

/* collect attributes present in doc2, but missing in doc1 */
LET added = (
  FOR key IN ATTRIBUTES(doc2)
    FILTER ! HAS(doc1, key)
    RETURN {
      [ key ] : doc2[key]
    }
)

/* return final result */
RETURN {
  "missing" : missing,
  "changed" : changed,
  "added" : added
}

Note: the query may look a bit lengthy, but much of that is due to formatting. A more terse version can be found below.

The above query will return a document with three attributes:

  • missing: contains all attributes only present in first document (i.e. missing in second document)
  • changed: contains all attributes present in both documents that have different values
  • added: contains all attributes only present in second document (i.e. missing in first document)

For the two example documents it will return:

diff query result
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
[
  {
    "missing" : [
      {
        "b" : 2
      }
    ],
    "changed" : [
      {
        "foo" : {
          "old" : "bar",
          "new" : "baz"
        }
      },
      {
        "a" : {
          "old" : 1,
          "new" : 2
        }
      }
    ],
    "added" : [
      {
        "c" : 3
      }
    ]
  }
]

That output format was the first that came to my mind. It is of course possible to adjust the query so it produces a different output format.

Following is a version of the same query that can be invoked from JavaScript easily. It passes the two documents as bind parameters and calls db._query. The query is now a one-liner (less readable but easier to copy&paste):

1
2
3
4
5
6
7
8
bindVariables = {
  doc1 : { "foo" : "bar", "a" : 1, "b" : 2 },
  doc2 : { "foo" : "baz", "a" : 2, "c" : 3 }
};

query = "LET doc1 = @doc1, doc2 = @doc2, missing = (FOR key IN ATTRIBUTES(doc1) FILTER ! HAS(doc2, key) RETURN { [ key ]: doc1[key] }), changed = (FOR key IN ATTRIBUTES(doc1) FILTER HAS(doc2, key) && doc1[key] != doc2[key] RETURN { [ key ] : { old: doc1[key], new: doc2[key] } }), added = (FOR key IN ATTRIBUTES(doc2) FILTER ! HAS(doc1, key) RETURN { [ key ] : doc2[key] }) RETURN { missing : missing, changed : changed, added : added }";

result = db._query(query, bindVariables).toArray();