This post is about improvements to the fulltext index in ArangoDB 2.6. The improvements address the problem that non-string attributes were ignored when fulltext-indexing.
Effectively this prevented string values inside arrays or objects from being indexed. Though this behavior was documented, it considerably limited the usefulness of the fulltext index. Several users asked for the fulltext index to be able to index array and object attributes, too.
Finally this has been accomplished, so the fulltext index in 2.6 supports indexing arrays and objects!
Some history
So far (up to and including ArangoDB 2.5), fulltext indexing in ArangoDB only supported string attributes. Documents in which the index attribute was an array of strings or an object with string member values were excluded from indexing.
This turned out to be limiting, because more complex documents effectively couldn’t be fulltext-indexed.
Here’s an example. Let’s say we had a collection named example with a fulltext index defined on the attribute value.
Adding a document with a value attribute containing a string value adds all words contained in the string attribute to the fulltext index.
Now the index can be queried using any of the words.
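To make the idea of "adding all words to the index and querying by any of them" concrete, here is a minimal in-memory sketch in plain JavaScript. It is not ArangoDB code: the class name TinyFulltextIndex, its methods, and the simplified tokenizer (lowercase, split on non-alphanumeric characters) are assumptions made purely for illustration.

```javascript
// Hypothetical in-memory model of a fulltext index; NOT ArangoDB's implementation.
class TinyFulltextIndex {
  constructor() {
    this.postings = new Map(); // word -> Set of document keys
  }

  // Simplified tokenizer (an assumption): lowercase, split on non-alphanumerics.
  static words(text) {
    return text
      .toLowerCase()
      .split(/[^\p{L}\p{N}]+/u)
      .filter((w) => w.length > 0);
  }

  // Register every word of a string value under the given document key.
  add(key, text) {
    for (const word of TinyFulltextIndex.words(text)) {
      if (!this.postings.has(word)) {
        this.postings.set(word, new Set());
      }
      this.postings.get(word).add(key);
    }
  }

  // Return the keys of all documents that contain the word.
  query(word) {
    return [...(this.postings.get(word.toLowerCase()) || [])];
  }
}

const index = new TinyFulltextIndex();
index.add("doc1", "The quick brown fox");
console.log(index.query("fox"));  // one match: doc1
console.log(index.query("wolf")); // no matches
```

Each word of the string becomes a lookup key, which is why any single word is enough to find the document again.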
So far, so good. Now let’s try it with more complex document structures.
We’re now using arrays and objects inside the value attribute instead of simple string values.
Bad luck!
None of the above documents made it into the fulltext index because the index attribute did not contain string values. Although this was documented, it was not the desired behavior.
2.6
Retrying the same operations in ArangoDB 2.6 changes the picture.
All the above example documents are included in the fulltext index in 2.6. The fulltext index in 2.6 can index string values, object values (it will index the object’s members if they are strings) and array values (it will index the array members if they are strings or objects). Indexing is still limited to one sub-attribute level, so in deeply nested structures only the top-level values will be indexed.
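The rules above can be sketched as a small plain-JavaScript function that decides which words an attribute value contributes. This is a model for illustration only, not ArangoDB's code: the function names and the tokenizer are hypothetical. It also assumes that an object found inside an array contributes its direct string members, and that anything nested deeper is skipped.

```javascript
// Hypothetical model of the 2.6 indexing rules; NOT ArangoDB's implementation.

// Simplified tokenizer (an assumption): lowercase, split on non-alphanumerics.
function words(text) {
  return text
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u)
    .filter((w) => w.length > 0);
}

function isPlainObject(v) {
  return v !== null && typeof v === "object" && !Array.isArray(v);
}

// Direct string members of an object; nested objects and arrays are ignored.
function stringMembers(obj) {
  return Object.values(obj).filter((v) => typeof v === "string");
}

// Words contributed by one indexed attribute value.
function indexableWords(value) {
  let strings = [];
  if (typeof value === "string") {
    strings = [value];
  } else if (Array.isArray(value)) {
    for (const member of value) {
      if (typeof member === "string") {
        strings.push(member);
      } else if (isPlainObject(member)) {
        // Assumption: objects inside arrays contribute their string members.
        strings.push(...stringMembers(member));
      }
    }
  } else if (isPlainObject(value)) {
    strings = stringMembers(value);
  }
  // Numbers, booleans, null and deeper nesting contribute nothing.
  return strings.flatMap(words);
}

console.log(indexableWords("The quick brown fox"));
console.log(indexableWords(["quick", "fox"]));
console.log(indexableWords({ animal: "fox", speed: "quick" }));
console.log(indexableWords({ outer: { inner: "too deep" } })); // empty: nested too deeply
```

Under the 2.5 behavior described earlier, only the first branch (plain strings) would have applied, which is why the array and object documents were left out of the index.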
A few example queries on the index in 2.6 show that all the data from the more complex documents can now be queried.