This post tries to explain that in detail.
The C++ parts
The I/O handling in arangod is written in C++ and uses libev (written in C) for the low-level event handling. All the socket I/O, work scheduling and queueing is written in C++, too. These are parts that require high parallelism, so we want them to run in multiple threads.
All the indexes, the persistence layer and many of the fundamental operations, such as those for document inserts, updates, deletes and imports, are written in C++ for effective control of memory usage and parallelism. AQL's query parser is written using the usual combination of Flex and Bison, which generate C files that are compiled to native code. The AQL optimizer, AQL executor and many AQL functions are written in C++ as well.
How ArangoDB uses V8
On the server side, things are a bit different. In arangod, there are multiple V8 isolates. The number of isolates to create is a startup configuration option. Additionally, arangod has multiple I/O threads (the --scheduler.threads configuration option) for handling the communication with client applications.
Inside arangod, each V8 isolate is equipped with a global variable named db. When the db object is created, we tell V8 that its methods are C++ callbacks. Whenever JavaScript code invokes one of them, V8 will call C++ methods. These provide full access to the server internals, can do whatever is required and return data in the format that V8 requires. V8 then makes the results available to the calling JavaScript code.

A command such as db.myCollection.save(...) is effectively two operations: accessing the property myCollection on the object db, and then calling the function save on that property. For the first operation, V8 will invoke the object's named property interceptor, which is a C++ function that is responsible for returning the value for the property with the given name (myCollection). In the case of db, we have a C++ function that looks up the collection with that name and returns the collection object if it exists, or undefined if not.
The collection object again has C++ bindings in the background, so calling the function save on it will call another C++ function. The collection object also has a (hidden) pointer to the C++ collection. When save is called, we will extract that pointer from this object so we know which C++ data structures to work on. The function will also get the to-be-inserted document data as its payload. V8 will pass this to the C++ function as well, so we can validate it and convert it into our internal data format.
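The mechanism can be sketched in plain JavaScript, using a Proxy in the role of V8's named property interceptor (in arangod these are C++ callbacks, not JS code; the collection name and the _internal field here are assumptions for illustration):

```javascript
// The internal collection registry; _internal plays the role of the
// hidden pointer to the C++ collection object.
const internalCollections = {
  myCollection: {
    _internal: { name: 'myCollection', documents: [] },
    save(doc) {
      // validate the payload before converting/storing it internally
      if (typeof doc !== 'object' || doc === null) {
        throw new TypeError('document must be an object');
      }
      this._internal.documents.push(doc);
      return { _key: String(this._internal.documents.length) };
    }
  }
};

// The "named property interceptor": return the collection object for
// a given name if it exists, or undefined if not.
const db = new Proxy({}, {
  get(target, name) {
    return internalCollections[name];
  }
});

// Two operations: 1) property access 'myCollection' runs the
// interceptor, 2) save() runs the bound function on the result.
const result = db.myCollection.save({ value: 42 });
```
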
Accessing server internals from ArangoShell
When running the same command in arangosh, things will be completely different. The ArangoShell may run on the same host as the arangod server process, but it may also run on a completely different one. Providing arangosh access to server internals such as pointers will therefore not work in general. Even if arangosh and arangod do run on the same host, they are independent processes with no access to each other's data. The latter problem could be solved by having a shared memory segment that both arangosh and arangod can use, but why bother with a special case that provides no help in the general case, when the shell can be located on any host?
To make the shell work in all these situations, it uses the HTTP REST API provided by the ArangoDB server to talk to it. To arangod, any ArangoShell client is just another client, with no special treatment or protocols.
Recalling the command example again (db.myCollection.save(...)), the shell will first access the property myCollection of the object db. In the shell, db is a regular JavaScript object. At startup, arangosh makes an HTTP call to arangod to retrieve a list of all available collections, and registers them as properties in its db object. Calling the save method on one of these objects will trigger an HTTP POST request to the server's document API, with the to-be-inserted data in its request body. Eventually the server will respond, and the command will return with the data retrieved from the server.
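As a concrete example, consider a loop that saves 1,000 documents (a sketch: the collection name and payload are assumptions, and db is stubbed with a counter here to make the number of HTTP requests visible):

```javascript
// Stand-in for the ArangoShell's db object: in arangosh, each save()
// triggers one HTTP POST to the server; a counter simulates that here.
let httpRequests = 0;
const db = {
  myCollection: {
    save(doc) {
      httpRequests += 1; // one HTTP round-trip per document
      return { saved: true };
    }
  }
};

for (let i = 0; i < 1000; ++i) {
  db.myCollection.save({ value: i });
}
// httpRequests is now 1000: one round-trip per call to save()
```
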
When run from inside the ArangoShell, the code will be executed there. The shell will perform an HTTP request to arangod for each call to save, so we'll end up with 1,000 HTTP requests for this loop.
Running the same code inside arangod will trigger no HTTP requests, as the server-side functions are backed with C++ internals and can access the database data directly. It will be a lot faster to run this loop on the server than in arangosh. A while ago I wrote another article about this.
When replacing the ArangoShell with another client application, things are no different. A client application will not have access to the server internals, so all it can do is make requests to the server (by the way, the principle would be no different if we used MySQL or another database server; only the protocols would vary).
Fortunately, there is a fix for this: making the code run server-side. For example, the above code can be put into a Foxx route. This way the code not only runs fast, but is also made accessible via an HTTP REST API, so client applications can call it with a single HTTP request.
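A minimal sketch of such a Foxx route, using ArangoDB's Foxx router API (the route path and collection name are assumptions; this runs only inside arangod, not in a standalone node.js process):

```js
'use strict';
const createRouter = require('@arangodb/foxx/router');
const db = require('@arangodb').db;

const router = createRouter();
module.context.use(router);

router.post('/bulk-insert', function (req, res) {
  // Runs inside arangod: save() is backed by C++ internals here,
  // so the loop triggers no HTTP requests at all.
  for (let i = 0; i < 1000; ++i) {
    db.myCollection.save({ value: i });
  }
  res.json({ inserted: 1000 });
});
```

A client then needs just one HTTP POST to this route instead of 1,000 requests.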
The differences to node.js
To start with: ArangoDB is not node.js, and vice versa. ArangoDB is not a node.js module either. ArangoDB and node.js are completely independent.
As far as I know, standard node.js has only a single V8 isolate to run all code in. While that makes the implementation easier (no hassle with multi-threading), it also limits a node.js process to using only a single CPU.
And because a node.js instance is single-threaded, it is also important that code written for node.js is non-blocking. Code that blocks while waiting for some I/O operation would block the only available CPU. Using non-blocking I/O operations allows node.js to queue the operation, and execute other code in the meantime, allowing overall progress. This also makes it look like it would be executing multiple actions in parallel, while it is actually executing them sequentially.
arangosh, the ArangoShell, is single-threaded and provides only a single V8 isolate.
Usage of modules
Both node.js and ArangoDB can load code at runtime, so it can be organized into modules.
There is often confusion about whether node.js modules can be used in ArangoDB. This is probably because the answer is “it depends!”.
ArangoDB also uses several other modules that are maintained on npm. An example is AQB, a query builder for AQL, which can be used from within ArangoDB. If there is an updated version of this module, we use npm to install it in a subdirectory of ArangoDB. As per npm convention, the node.js modules shipped with ArangoDB reside in a directory named node_modules. This is probably what caused some of the confusion.
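For illustration, AQB can be used like this from arangosh or from server-side code (a sketch; the collection name and filter are assumptions):

```js
// AQB builds AQL query strings programmatically instead of by
// concatenating strings by hand.
const qb = require('aqb');

const query = qb.for('doc').in('myCollection')
  .filter(qb.eq('doc.value', qb.num(42)))
  .return('doc');

// query.toAQL() yields the AQL text for the query, which can then
// be handed to the server for execution.
```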