Using the Elasticsearch Profile API


Some time ago I found an issue for Elasticsearch describing a Profile API that they were working on. The recently released 2.2 version of Elasticsearch contains the first version of this API. It is still experimental, meaning that it could change or be removed completely in the future.

Having given you the warning, let us have a look at what we can do with the Profile API. In this blog post I give you an overview of its capabilities using a very basic example. After reading it you should have a good idea of what you can do with the Profile API.

The sample documents

Before we can start, we need some sample documents. Below you can find the mapping and the documents that I used. To keep the output easy to read, I configure just one shard with no replicas. Each document has a name field that is indexed as a standard analysed string, together with two alternatives: asone, which uses the keyword tokenizer with a lowercase filter, and raw, which is not analysed at all.

PUT /companies
{
  "settings": {
    "number_of_replicas": 0,
    "number_of_shards": 1,
    "analysis": {
      "analyzer": {
        "askeyword": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "company": {
      "properties": {
        "name": {
          "type": "string", 
          "index": "analyzed",
          "fields": {
            "asone": {
              "type": "string", 
              "analyzer": "askeyword"
            },
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}

Below are the documents that we insert.

PUT /companies/company/1
{
  "name": "Luminis Amsterdam"
}
PUT /companies/company/2
{
  "name": "Elastic"
}
PUT /companies/company/3
{
  "name": "ServiceHouse"
}
PUT /companies/company/4
{
  "name": "Bol.com"
}
PUT /companies/company/5
{
  "name": "ANWB"
}

Introduction

You enable profiling by setting the top-level property profile to true in the search request. The profile results are calculated per shard, so a search that hits multiple shards, for example because it spans multiple indices, returns one profile entry per shard. In the results a shard is identified by the unique identifier of the node, the name of the index and the number of the shard. For an example, check the id field in the second code block below, which shows part of the response. But first the request with a very basic query.

GET /companies/_search
{
  "profile": true,
  "query": {
    "match": {
      "name": "Amsterdam Luminis"
    }
  }
}

"profile": {
  "shards": [
    {
      "id": "[FZPSGvI8RYOGf4pzEcfU5A][companies][0]",
      "searches": [
        {
          "query": [],
          "rewrite_time": 2914,
          "collector": []
        }
      ]
    }
  ]
}

At the moment the result consists of a shards array, with each shard consisting of an id in the format mentioned above and an array of searches. This searches array usually contains one item. The item contains three elements: the query results, the rewrite time and the collector. The collector gives more information about how Lucene handled the query, including the time the collection took. In the next sections we take a deeper dive into the different elements of the response.
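If you want to process the profile output in code, the shard id can be split back into its three parts. The following is a minimal Python sketch, not part of the Profile API itself: the helper names parse_shard_id and list_shards are my own, and the regular expression assumes that neither the node id nor the index name contains square brackets.

```python
import re

# Shard ids look like "[node_id][index_name][shard_number]".
SHARD_ID_PATTERN = re.compile(r"^\[([^\]]+)\]\[([^\]]+)\]\[(\d+)\]$")


def parse_shard_id(shard_id):
    """Split a profile shard id into (node_id, index_name, shard_number)."""
    match = SHARD_ID_PATTERN.match(shard_id)
    if match is None:
        raise ValueError("unexpected shard id format: %r" % shard_id)
    node_id, index_name, shard_number = match.groups()
    return node_id, index_name, int(shard_number)


def list_shards(profile):
    """Return one (node_id, index_name, shard_number, search_count) tuple per shard."""
    rows = []
    for shard in profile["shards"]:
        node_id, index_name, shard_number = parse_shard_id(shard["id"])
        rows.append((node_id, index_name, shard_number, len(shard["searches"])))
    return rows


# The "profile" part of the response shown above, as a Python dict.
sample_profile = {
    "shards": [
        {
            "id": "[FZPSGvI8RYOGf4pzEcfU5A][companies][0]",
            "searches": [{"query": [], "rewrite_time": 2914, "collector": []}],
        }
    ]
}

for row in list_shards(sample_profile):
    print(row)
```

With more shards or indices involved in the search, the same loop gives you one row per shard, which is a handy starting point for comparing shards.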

Query Section

Queries can have a pretty complicated structure. But just like with the explain API, if you break the output down it is a repetition of the same elements. The basic building blocks are:

  • query_type: for instance a TermQuery
  • lucene: The query part in Lucene style, for example message:lucene
  • time: the time in milliseconds it took Lucene to execute the query, children included
  • breakdown: Has more detailed information about the query
  • children: This is the repetition, more of the same kind of blocks

The query_type is the type of the Lucene query that was created from the original query, which is often not the same as the type of the original query. For example, a match query with multiple terms becomes a BooleanQuery consisting of a number of TermQuery clauses. Check the following example:

query
  query_type: BooleanQuery
  lucene: name:luminis name:amsterdam
  time: 0.3146980000ms
  breakdown: 
  children:
    query_type: TermQuery
    lucene: name:luminis
    time: 0.1017570000ms
    breakdown:
 
    query_type: TermQuery
    lucene: name:amsterdam
    time: 0.02729200000ms
    breakdown:

The example shows the query_type of each node, here BooleanQuery and TermQuery, the Lucene query that is executed and the time it took in milliseconds. The final piece is the breakdown (left empty above). It contains more detailed information about where the time was spent. Check the docs in the reference section to learn more about this very advanced statistic.
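Because every node has the same shape, a small recursive function is enough to turn the query part of the response into a readable tree. The Python below is only a sketch under a few assumptions: the function names are mine, the time field is taken to be a string ending in "ms" as in the example, and the breakdown is left empty just as it is in the example above.

```python
def parse_ms(time_value):
    """Convert a time value such as "0.3146980000ms" to a float in milliseconds."""
    text = str(time_value).strip()
    if text.endswith("ms"):
        text = text[:-2]
    return float(text)


def format_query_tree(nodes, depth=0):
    """Return one indented line per query node, visiting children recursively."""
    lines = []
    for node in nodes:
        lines.append(
            "%s%s  %s  %.4f ms"
            % ("  " * depth, node["query_type"], node["lucene"], parse_ms(node["time"]))
        )
        # Leaf nodes such as TermQuery may have no "children" key at all.
        lines.extend(format_query_tree(node.get("children", []), depth + 1))
    return lines


# The example tree from the text; breakdown contents are omitted here as well.
sample_query = [
    {
        "query_type": "BooleanQuery",
        "lucene": "name:luminis name:amsterdam",
        "time": "0.3146980000ms",
        "breakdown": {},
        "children": [
            {"query_type": "TermQuery", "lucene": "name:luminis",
             "time": "0.1017570000ms", "breakdown": {}},
            {"query_type": "TermQuery", "lucene": "name:amsterdam",
             "time": "0.02729200000ms", "breakdown": {}},
        ],
    }
]

print("\n".join(format_query_tree(sample_query)))
```

In a real response you would pass in the query array of each item in searches instead of sample_query.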

Rewrite section

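As described in the query section, the query is rewritten. In the example a match query is rewritten as a BooleanQuery containing two TermQuery clauses. This section just returns the total time it took to rewrite the query. According to the reference documentation this value is in nanoseconds, so the 2914 in the response above is roughly 0.003 ms.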
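If you want a single number for a whole request, you can add up the rewrite_time of every search on every shard. The Python below is a small sketch of that idea. The function name is my own, and it assumes that rewrite_time is a plain number of nanoseconds, which is how I read the reference documentation.

```python
def total_rewrite_time_ms(profile):
    """Sum rewrite_time over all shards and searches and return it in milliseconds."""
    total_ns = 0
    for shard in profile["shards"]:
        for search in shard["searches"]:
            total_ns += search["rewrite_time"]
    # Assumption: rewrite_time is reported in nanoseconds.
    return total_ns / 1000000.0


# The "profile" part of the response from the introduction.
sample_profile = {
    "shards": [
        {
            "id": "[FZPSGvI8RYOGf4pzEcfU5A][companies][0]",
            "searches": [{"query": [], "rewrite_time": 2914, "collector": []}],
        }
    ]
}

print("total rewrite time: %.6f ms" % total_rewrite_time_ms(sample_profile))
```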

Collectors section

Collectors are Lucene’s mechanism for coordinating the traversal, scoring and collection of matching documents; other activities such as aggregations are recorded through collectors as well. Each collector has a name and a keyword describing why that specific collector was used. Finally, the collector shows the time the collection took. A few example collectors are:

  • SimpleTopScoreDocCollector: Used when asking for the top documents sorted by score
  • SimpleFieldCollector: Used when returning documents sorted by a field, for instance their name
  • TotalHitCountCollector: Used when no documents are requested, but only the count (when size:0 is used)
  • MultiCollector: Used when a query is combined with something else that needs its own collector, an aggregation for instance
  • GlobalOrdinalsStringTermsAggregator: Used when a terms aggregation is requested

There are more types than mentioned here; you can find them in the documentation. The next code block shows a sample output extracted from the Profile API result.

"collector": [
  {
    "name": "SimpleTopScoreDocCollector",
    "reason": "search_top_hits",
    "time": "0.02907200000ms"
  }
]
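To get a quick overview of which collectors were used, you can flatten the collector part of the response into a simple list. The Python below is a sketch with my own function name. It assumes that a wrapping collector such as MultiCollector lists the collectors it wraps under a children key, in the same way query nodes do; for the single collector in the sample above that key is simply absent.

```python
def flatten_collectors(collectors, depth=0):
    """Return (depth, name, reason, time) tuples for all collectors, nested ones included."""
    rows = []
    for collector in collectors:
        rows.append((depth, collector["name"], collector["reason"], collector["time"]))
        # Assumption: wrapped collectors, if any, are listed under "children".
        rows.extend(flatten_collectors(collector.get("children", []), depth + 1))
    return rows


# The collector output shown above.
sample_collector = [
    {
        "name": "SimpleTopScoreDocCollector",
        "reason": "search_top_hits",
        "time": "0.02907200000ms",
    }
]

for depth, name, reason, time in flatten_collectors(sample_collector):
    print("%s%s (%s): %s" % ("  " * depth, name, reason, time))
```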

Concluding

In my opinion the Profile API is a very nice addition to the available APIs. It enables you to learn more about what is actually happening when executing searches in Elasticsearch. It is easy to use and not too hard to understand. As a warning though, using the Profile API is not cheap, so it should not be enabled by default in production. Also, not everything is supported by the Profile API yet; suggestions, for instance, are not. At the moment there is no GUI for the Profile API, but Elastic is working on a Kibana plugin to help you visualise the profile results. Maybe I’ll introduce it in my own GUI project as well; I need to think about that.

Reference

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-profile.html