J@ArangoDB

{ "subject" : "ArangoDB", "tags": [ "multi-model", "nosql", "database" ] }

Fulltext Index Enhancements

This post is about improvements to the fulltext index in ArangoDB 2.6. The improvements address the problem that non-string attributes were ignored by the fulltext index.

Effectively, this prevented string values inside arrays or objects from being indexed. Though this behavior was documented, it limited the usefulness of the fulltext index considerably. Several users requested that the fulltext index also be able to index array and object attributes.

Finally this has been accomplished, so the fulltext index in 2.6 supports indexing arrays and objects!

Some history

Up to and including ArangoDB 2.5, fulltext indexing in ArangoDB only supported string attributes. Documents in which the index attribute was an array of strings or an object with string member values were excluded from indexing.

This turned out to be limiting, because more complex documents effectively couldn’t be fulltext-indexed.

Here’s an example… Let’s say we had a collection named example with a fulltext index defined on attribute value, set up as follows:

setting up a collection with a fulltext index
var c = db._create("example");
c.ensureFulltextIndex("value");

Adding a document with a value attribute containing a string value adds all words contained in the string attribute to the fulltext index:

adding a document that is fulltext-indexed
c.insert({ value: "Fox is the English translation of the German word Fuchs" });

Now the index can be queried using any of the words:

querying the fulltext index
c.fulltext("value", "fox").toArray();
[
  {
    ...
    "value" : "Fox is the English translation of the German word Fuchs"
  }
]
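Conceptually, a fulltext index maps each word to the documents that contain it, which is why any single word of the sentence finds the document. The following is a toy inverted index in plain JavaScript (runnable in Node.js), not ArangoDB's implementation. The tokenizer (lowercase, split on anything that is not a letter or digit), the class name `ToyFulltextIndex` and the key `doc1` are assumptions for illustration; ArangoDB's actual word normalization and storage may differ.

```javascript
// Toy model of a fulltext index: an inverted index from normalized words
// to document keys. NOT ArangoDB's implementation; the tokenizer below
// (lowercase, split on non-letters/digits) is an assumption.
function tokenize(text) {
  return text
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u)
    .filter((word) => word.length > 0);
}

class ToyFulltextIndex {
  constructor() {
    this.words = new Map(); // word -> Set of document keys
  }

  // Add every word of a string attribute value under the given key.
  insert(key, text) {
    for (const word of tokenize(text)) {
      if (!this.words.has(word)) {
        this.words.set(word, new Set());
      }
      this.words.get(word).add(key);
    }
  }

  // Case-insensitive exact word lookup, returning the matching keys.
  lookup(word) {
    const hits = this.words.get(word.toLowerCase());
    return hits ? [...hits] : [];
  }
}

const index = new ToyFulltextIndex();
index.insert("doc1", "Fox is the English translation of the German word Fuchs");
console.log(index.lookup("fox"));   // [ 'doc1' ]
console.log(index.lookup("Fuchs")); // [ 'doc1' ]
console.log(index.lookup("wolf"));  // []
```

Because every word of the string becomes its own index entry, a lookup only has to find one map entry instead of scanning the text of every document.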

So far, so good. Now let’s try it with more complex document structures. We’re now using arrays and objects inside the value attribute instead of simple string values:

adding documents that are not fulltext-indexed
c.insert({ value: { en: "fox", de: "Fuchs", fr: "renard", ru: "лиса" } });
c.insert({ value: [ "ArangoDB", "document", "database", "Foxx" ] });
c.insert({ value: [ { name: "ArangoDB", type: "database" }, { name: "Fox", type: "animal" } ] });

c.fulltext("value", "renard").toArray();
[ ]

c.fulltext("value", "ArangoDB").toArray();
[ ]

c.fulltext("value", "database").toArray();
[ ]

Bad luck!

None of the above documents made it into the fulltext index because the index attribute did not contain a string value. Though this was documented, it was not the desired behavior.
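The 2.5 rule can be modeled as a function that only produces words when the index attribute is a plain string. This is a hypothetical sketch of the documented behavior, not ArangoDB source code; the function name `wordsFor25` and the tokenizer are assumptions.

```javascript
// Hypothetical model of the 2.5 rule: only a plain string in the index
// attribute contributes words; arrays and objects are skipped entirely.
// The tokenizer (lowercase, split on non-letters/digits) is an assumption.
function wordsFor25(value) {
  if (typeof value !== "string") {
    return []; // arrays, objects, numbers, null: nothing gets indexed
  }
  return value
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u)
    .filter((word) => word.length > 0);
}

// The "value" attributes of the three example documents.
const values = [
  { en: "fox", de: "Fuchs", fr: "renard", ru: "лиса" },
  ["ArangoDB", "document", "database", "Foxx"],
  [{ name: "ArangoDB", type: "database" }, { name: "Fox", type: "animal" }],
];

for (const value of values) {
  console.log(wordsFor25(value)); // [] for each of the three documents
}
```

Since `typeof` reports both arrays and objects as `"object"`, all three documents fall through to the empty result, which matches the empty query results above.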

2.6

Retrying the same operations in ArangoDB 2.6 changes the picture.

All of the above example documents are included in the fulltext index in 2.6. The fulltext index in 2.6 can index string values, object values (it indexes the object’s members if they are strings) and array values (it indexes the array members if they are strings or objects). Indexing is still limited to one level of sub-attributes, so in deeply nested structures only the values at the top levels are indexed and anything nested more deeply is ignored.
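The extraction rules above can be sketched as a small function that collects the strings a 2.6-style index would consider. This is a hypothetical model written from the rules described in this post, not ArangoDB's implementation; the names `wordsFor26`, `stringMembers` and `isPlainObject` and the tokenizer are assumptions.

```javascript
// Hypothetical model of the 2.6 rules (not ArangoDB source):
// - a string is tokenized directly
// - an object contributes its string member values
// - an array contributes its string members, plus the string members
//   of any objects it contains
// Anything nested deeper, or of another type, is ignored.
function tokenize(text) {
  return text
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u)
    .filter((word) => word.length > 0);
}

function isPlainObject(value) {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

// String member values of an object; nested objects/arrays are skipped.
function stringMembers(obj) {
  return Object.values(obj).filter((v) => typeof v === "string");
}

function wordsFor26(value) {
  let strings = [];
  if (typeof value === "string") {
    strings = [value];
  } else if (Array.isArray(value)) {
    for (const member of value) {
      if (typeof member === "string") {
        strings.push(member);
      } else if (isPlainObject(member)) {
        strings.push(...stringMembers(member));
      }
    }
  } else if (isPlainObject(value)) {
    strings = stringMembers(value);
  }
  return strings.flatMap((s) => tokenize(s));
}

console.log(wordsFor26({ en: "fox", de: "Fuchs", fr: "renard", ru: "лиса" }));
// [ 'fox', 'fuchs', 'renard', 'лиса' ]
console.log(wordsFor26([{ name: "ArangoDB", type: "database" }, { name: "Fox", type: "animal" }]));
// [ 'arangodb', 'database', 'fox', 'animal' ]
console.log(wordsFor26({ nested: { deeper: "ignored" } }));
// []
```

The last call shows the depth limit in this model: an object member that is itself an object contributes nothing.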

A few example queries on the index in 2.6 show that all the data from the more complex documents can now be queried:

querying the fulltext index in 2.6
c.fulltext("value", "renard").toArray();
[
  {
    ...
    "value" : {
      "en" : "fox",
      "de" : "Fuchs",
      "fr" : "renard",
      "ru" : "лиса"
    }
  }
]

c.fulltext("value", "ArangoDB").toArray();
[
  {
    ...
    "value" : [
      "ArangoDB",
      "document",
      "database",
      "Foxx"
    ]
  },
  {
    ...
    "value" : [
      {
        "name" : "ArangoDB",
        "type" : "database"
      },
      {
        "name" : "Fox",
        "type" : "animal"
      }
    ]
  }
]

c.fulltext("value", "database").toArray();
[
  {
    ...
    "value" : [
      "ArangoDB",
      "document",
      "database",
      "Foxx"
    ]
  },
  {
    ...
    "value" : [
      {
        "name" : "ArangoDB",
        "type" : "database"
      },
      {
        "name" : "Fox",
        "type" : "animal"
      }
    ]
  }
]

c.fulltext("value", "лиса").toArray();
[
  {
    ...
    "value" : {
      "en" : "fox",
      "de" : "Fuchs",
      "fr" : "renard",
      "ru" : "лиса"
    }
  }
]

c.fulltext("value", "prefix:Fox").toArray();
[
  {
    ...
    "value" : "Fox is the English translation of the German word Fuchs"
  },
  {
    ...
    "value" : {
      "en" : "fox",
      "de" : "Fuchs",
      "fr" : "renard",
      "ru" : "лиса"
    }
  },
  {
    ...
    "value" : [
      "ArangoDB",
      "document",
      "database",
      "Foxx"
    ]
  },
  {
    ...
    "value" : [
      {
        "name" : "ArangoDB",
        "type" : "database"
      },
      {
        "name" : "Fox",
        "type" : "animal"
      }
    ]
  }
]
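The last query uses the prefix: modifier, which matches every indexed word that starts with the given string, so "Fox", "fox" and "Foxx" all qualify. A minimal sketch of a prefix lookup over a word-to-documents map follows; the document keys (`string`, `object`, `arrayOfStrings`, `arrayOfObjects`), the sample entries and the function name `prefixLookup` are hypothetical, and the linear scan is a simplification rather than how ArangoDB stores its index.

```javascript
// Toy model of a "prefix:" query over a word -> document-keys map.
// Case-insensitive matching and the sample data are assumptions.
function prefixLookup(wordIndex, prefix) {
  const needle = prefix.toLowerCase();
  const keys = new Set();
  for (const [word, docKeys] of wordIndex) {
    if (word.startsWith(needle)) {
      docKeys.forEach((key) => keys.add(key));
    }
  }
  return [...keys].sort(); // sorted only to make the output deterministic
}

// Hypothetical keys for the four example documents and a few of their words.
const wordIndex = new Map([
  ["fox", new Set(["string", "object", "arrayOfObjects"])],
  ["foxx", new Set(["arrayOfStrings"])],
  ["fuchs", new Set(["string", "object"])],
  ["renard", new Set(["object"])],
]);

console.log(prefixLookup(wordIndex, "Fox"));
// [ 'arrayOfObjects', 'arrayOfStrings', 'object', 'string' ]
console.log(prefixLookup(wordIndex, "ren"));
// [ 'object' ]
```

Collecting keys in a `Set` ensures a document is returned once even when several of its words share the prefix.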