Last Updated: February 25, 2016 · banjer

Multi-level terms aggregation in Elasticsearch

If you're looking to generate a cross-tabulation (a frequency table) of terms in Elasticsearch, you can nest terms aggregations inside one another as sub-aggregations.

Here's an example of a three-level aggregation that will produce a "table" of
hostname x login error code x username. This is a query I used to generate a daily report of OpenLDAP login failures.

curl -XGET 'http://localhost:9200/logstash-*/_search?pretty=true' -d '
{
    "aggs" : {
       "hostname_by_login_result": {
          "terms": {
             "field": "hostname.raw"
          },
          "aggs": {
             "result_by_user": {
                "terms": {
                   "field": "login_code",
                   "size": 0,
                   "order": { "_term" : "desc"   }
                },
                "aggs": {
                   "username": {
                      "terms": {
                         "field": "username.raw",
                         "size": 0
                      }
                   }
                }
             }
          }
       }

    }
}
'
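The response nests the buckets the same way the query nests the aggregations: each bucket carries a `key`, a `doc_count`, and its sub-aggregation under the name given in the query. To turn that into the flat "table" described above, something like the following sketch could work. The `flatten_buckets` helper and the sample response are made up for illustration; only the aggregation names come from the query.

```python
def flatten_buckets(response):
    """Yield (hostname, login_code, username, count) rows from the nested buckets."""
    aggs = response["aggregations"]
    for host in aggs["hostname_by_login_result"]["buckets"]:
        for code in host["result_by_user"]["buckets"]:
            for user in code["username"]["buckets"]:
                yield host["key"], code["key"], user["key"], user["doc_count"]


# Made-up sample response, trimmed to the fields the function reads.
sample_response = {
    "aggregations": {
        "hostname_by_login_result": {
            "buckets": [
                {
                    "key": "ldap01",
                    "doc_count": 5,
                    "result_by_user": {
                        "buckets": [
                            {
                                "key": 49,
                                "doc_count": 4,
                                "username": {
                                    "buckets": [
                                        {"key": "alice", "doc_count": 3},
                                        {"key": "bob", "doc_count": 1},
                                    ]
                                },
                            },
                            {
                                "key": 32,
                                "doc_count": 1,
                                "username": {
                                    "buckets": [{"key": "carol", "doc_count": 1}]
                                },
                            },
                        ]
                    },
                }
            ]
        }
    }
}

rows = list(flatten_buckets(sample_response))
for row in rows:
    # One tab-separated line per hostname x login code x username combination.
    print("\t".join(str(value) for value in row))
```

In a real report you would load the curl output with `json.load` instead of using the hard-coded sample.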

By querying the .raw version of a field, you get the "not analyzed" version: each value is indexed as a single term, so your data will not be split into tokens on delimiters.
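To see why this matters for a terms aggregation, here is a rough simulation. The `crude_analyze` function is only a stand-in (lowercase, then split on non-alphanumeric characters), not Elasticsearch's actual analyzer, and the hostnames are invented.

```python
import re
from collections import Counter

# Invented hostnames, one per log document.
hostnames = ["ldap-east-01", "ldap-east-02", "ldap-east-01"]


def crude_analyze(value):
    # Rough stand-in for an analyzer: lowercase, then split on non-alphanumerics.
    return [token for token in re.split(r"[^0-9a-z]+", value.lower()) if token]


# Analyzed field: buckets are built from tokens, counted once per document.
analyzed_counts = Counter(
    token for host in hostnames for token in set(crude_analyze(host))
)

# Not-analyzed field: the whole value is the single term for each document.
raw_counts = Counter(hostnames)

print(analyzed_counts)
print(raw_counts)
```

With the tokenized version, the buckets are fragments such as `ldap` and `east` rather than hostnames, which is rarely what you want in a report.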

I also want the output sorted by login error code in descending order, hence the order option:

...
                "terms": {
                   "field": "login_code",
                   "size": 0,
                   "order": { "_term" : "desc"   }
                },
...

By default, buckets are sorted by the number of documents they contain (_count), highest first. There are a couple of built-in sort options available, depending on what type of aggregation you're running.
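If you build the request body in a script rather than pasting it into curl, the sort can become a parameter. This is a minimal sketch: `build_query` is a hypothetical helper, and it keeps the query syntax exactly as written in this post, which newer Elasticsearch versions may not accept unchanged.

```python
import json


def build_query(order_key="_count", direction="desc"):
    """Return the three-level query body with a configurable sort on the middle level."""
    return {
        "aggs": {
            "hostname_by_login_result": {
                "terms": {"field": "hostname.raw"},
                "aggs": {
                    "result_by_user": {
                        "terms": {
                            "field": "login_code",
                            "size": 0,
                            # "_count" sorts by document count, "_term" by the term itself.
                            "order": {order_key: direction},
                        },
                        "aggs": {
                            "username": {
                                "terms": {"field": "username.raw", "size": 0}
                            }
                        },
                    }
                },
            }
        }
    }


by_count = build_query()               # the default: most frequent codes first
by_term = build_query("_term", "asc")  # lowest login code first
print(json.dumps(by_term, indent=2))
```

The printed JSON can be passed to curl with `-d` in place of the inline body.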