J@ArangoDB

{ "subject" : "ArangoDB", "tags": [ "multi-model", "nosql", "database" ] }

Using ArangoDB as a Logstash Output

Inspired by a question on StackOverflow, I investigated how to make Logstash send log events to ArangoDB.

There is no dedicated Logstash output plugin for ArangoDB on the Logstash plugins page, so I had already resigned myself to writing one myself.

Browsing the plugins page for inspiration, I found an HTTP output plugin for Logstash. It seems to be general enough that it can send the log event in JSON format to any HTTP-speaking backend.

ArangoDB’s API is JSON over HTTP, so it sounded like a perfect match. I briefly tried it out and it seemed to work fine.

Here are the steps I carried out to connect the two:

Prepare an ArangoDB server

I started an ArangoDB server with default configuration (binding to IP address 127.0.0.1 and port 8529). I then used the ArangoShell to create a collection named logstash:

db._create("logstash");

This collection will be used for storing the log events sent by Logstash.

Download Logstash

In order to run Logstash, you must have Java installed, which I assume you already have.

Now it’s time to download Logstash. You can download and unpack it with the following commands. The current version is 1.5.0 beta1 (warning: 100 MB download!):

wget "http://download.elasticsearch.org/logstash/logstash/logstash-1.5.0.beta1.tar.gz"
tar xvfz logstash-1.5.0.beta1.tar.gz
cd logstash-1.5.0.beta1

Connecting Logstash with ArangoDB

We are now ready to start Logstash. I’ll start it in a mode that will send all input from stdin as log events to ArangoDB. I am using the stdin input plugin, and the http output plugin for this. The http output plugin needs to know the URL to send the log events to.

The URL is ArangoDB’s base URL plus the REST API method for storing a single document, with the name of the target collection (logstash) appended.

Here is the full command:

bin/logstash -e 'input { stdin { } } output { http { http_method => "post" url => "http://127.0.0.1:8529/_api/document?collection=logstash" format => "json" } }'

Logstash may need a few seconds to start. The HTTP plugin will print a message about itself being a milestone 1 release only, but it works. Anything entered in the terminal should now be sent as a log event to ArangoDB.
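
To make explicit what the http output does with each line, here is a minimal JavaScript sketch (Node.js 18 or later, for the built-in fetch) that builds the same URL and posts one event by hand. The function names buildDocumentUrl, buildEvent and postEvent are made up for this illustration, the event attributes are the ones a stdin log event carries, and Logstash itself may format its requests differently.

```javascript
// Sketch only: mimics what the http output sends, not Logstash's actual code.

// Build the document API URL: base URL + REST method + target collection.
function buildDocumentUrl(baseUrl, collection) {
  const url = new URL("/_api/document", baseUrl);
  url.searchParams.set("collection", collection);
  return url.toString();
}

// Build an event with the attributes a stdin log event carries.
function buildEvent(message, host, date = new Date()) {
  return {
    message: message,
    "@version": "1",
    "@timestamp": date.toISOString(),
    host: host
  };
}

// POST one event as JSON; resolves to the HTTP status code.
// Defined but not called here, so loading this file sends nothing.
async function postEvent(url, event) {
  const response = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(event)
  });
  return response.status;
}
```

Calling postEvent(buildDocumentUrl("http://127.0.0.1:8529", "logstash"), buildEvent("fingers crossed!", "kalk")) against the server started above should store one document in the logstash collection.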

For example, type fingers crossed! and hit enter:

Using milestone 1 output plugin 'http'. This plugin should work, but would benefit from use by folks like you. Please let us know if you find bugs or have suggestions on how to improve this plugin.  For more information on plugin milestones, see http://logstash.net/docs/1.5.0.beta1/plugin-milestones {:level=>:warn}
fingers crossed!

Let’s check if the log event made it into ArangoDB. I used the ArangoShell for this:

db.logstash.toArray()
[
  {
    "_id" : "logstash/3507690866496",
    "_key" : "3507690866496",
    "_rev" : "3507690866496",
    "@version" : "1",
    "host" : "kalk",
    "message" : "fingers crossed!",
    "@timestamp" : "2015-02-05T23:17:39.982Z"
  }
]

Querying log events

So we’re getting log events in from Logstash.

We can use AQL to query the received log events in ArangoDB. But before we run a query, we probably want to index the @timestamp attribute of the events, so we can efficiently find and filter them by date and time:

db.logstash.ensureSkiplist("@timestamp");

Now we can run the following AQL query to find the latest 5 log events:

FOR l IN logstash 
  FILTER l.`@timestamp` <= '2099' /* arbitrary max value */ 
  SORT l.`@timestamp` DESC 
  LIMIT 5 
  RETURN l

Note: the @timestamp attribute name needs to be enclosed in backticks because a @ prefix is used to designate bind parameters in AQL. Enclosing the names in backticks will make AQL treat them as attribute name literals.
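
To see both uses of @ side by side, here is a hedged JavaScript sketch that builds the request body for ArangoDB's HTTP cursor API (POST /_api/cursor), which accepts a query string plus bind parameter values. The helper name latestEventsRequest and the bind parameter name @count are my own choices for this illustration.

```javascript
// Sketch: request body for POST /_api/cursor.
// `@timestamp` in backticks is an attribute name; @count is a bind parameter.
function latestEventsRequest(count) {
  return {
    query: [
      "FOR l IN logstash",
      "  FILTER l.`@timestamp` <= '2099'",
      "  SORT l.`@timestamp` DESC",
      "  LIMIT @count",
      "  RETURN l"
    ].join("\n"),
    // Keys in bindVars are given without the leading @.
    bindVars: { count: count }
  };
}

// Print the body that would be sent for the latest 5 events.
console.log(JSON.stringify(latestEventsRequest(5), null, 2));
```

Without the backticks, AQL would read @timestamp as a bind parameter and complain that no value was supplied for it.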

For the simple types of events triggered by the stdin input plugin, this is already sufficient. However, log events may look different, depending on the type of input plugins that are used. For other inputs, other attributes may need to be indexed, too.

Adjusting IP, port and authentication

Above, I used the default configuration of ArangoDB, that is, IP 127.0.0.1, port 8529, and no authentication. You probably want to change this.

To make ArangoDB listen on any other IP address or port, change the endpoint setting in its configuration file /etc/arangod.conf. You may also want to set the disable-authentication flag to false, meaning authentication is turned on.

[server]
endpoint = tcp://192.168.173.13:9999
disable-authentication = false

Before activating the new configuration, let’s create a dedicated ArangoDB user logstash. I will also change the default password of the root user. The following ArangoShell commands do this:

require("org/arangodb/users").save("logstash", "secret-logging", true);
require("org/arangodb/users").update("root", "nobody-will-ever-guess", true);

To make Logstash use the above settings, we have to adjust the command line:

bin/logstash -e 'input { stdin { } } output { http { http_method => "post" url => "http://logstash:secret-logging@192.168.173.13:9999/_api/document?collection=logstash" format => "json" } }'
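
The credentials in this URL are ordinary URL userinfo. As a sketch of how an HTTP client typically handles them, the following JavaScript splits the URL and derives an HTTP Basic Authorization header. The function name basicAuthFromUrl is hypothetical, and whether the http output plugin does exactly this internally is an assumption on my part.

```javascript
// Sketch: derive a Basic auth header from a URL with embedded credentials.
function basicAuthFromUrl(urlString) {
  const url = new URL(urlString);
  // The URL class keeps userinfo percent-encoded, so decode it first.
  const user = decodeURIComponent(url.username);
  const password = decodeURIComponent(url.password);
  // Basic auth is "user:password", base64-encoded.
  const token = Buffer.from(user + ":" + password).toString("base64");
  // Strip the credentials so they are not sent as part of the URL itself.
  url.username = "";
  url.password = "";
  return { url: url.toString(), authorization: "Basic " + token };
}
```

Note that a password containing characters such as @ or / would have to be percent-encoded before it can be placed in the URL.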

Pitfalls

Though Logstash itself can write a logfile (--log option) and can provide debug information (--debug), I could not get it to log or print errors when the HTTP output plugin was misconfigured. For example, specifying a wrong target URL will make all HTTP requests from Logstash to ArangoDB fail silently, with the log events being lost unless they are stored elsewhere.

Maybe this is configurable somewhere, but if so, I didn’t find it. It is also possible that this will be fixed in some future release.
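
In the meantime, one way to catch a wrong URL early is to send a test document by hand and look at the HTTP status code. The JavaScript sketch below (Node.js 18 or later) does that. The function names are made up, and the status codes reflect my understanding of ArangoDB's document API: 201 or 202 on success, 404 for an unknown collection, 401 when authentication fails.

```javascript
// Sketch: interpret the status code of a POST to /_api/document.
function explainStatus(status) {
  if (status === 201 || status === 202) return "stored";
  if (status === 401) return "authentication failed";
  if (status === 404) return "collection or URL not found";
  return "unexpected status " + status;
}

// Send one test document and report what happened (not called here).
async function checkTarget(url) {
  try {
    const response = await fetch(url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ message: "connectivity test" })
    });
    return explainStatus(response.status);
  } catch (err) {
    // Network-level failure: wrong host or port, server not running, ...
    return "request failed: " + err.message;
  }
}
```

The test document stays in the collection, so you may want to remove it afterwards.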

Disclaimer

Please feel free to use this blog post as a starting point, but not as an endorsement.

Though I think it will work perfectly, I am not at all an expert on Logstash or its plugins. I haven’t spent much time with it yet, and I may have overlooked important things. So should you be interested in using it, please conduct your own tests first.