The new Elasticsearch 7: Why I think you should upgrade


Elasticsearch 7 has been out for almost two months now, and Elastic claims it’s the fastest, safest, most resilient and easiest-to-use version of Elasticsearch ever. After attending Elastic’s Elastic Stack 7 Highlights meetup at their HQ three weeks ago, I got excited about the improvements in this version and wanted to dive a bit deeper into them. In this blog post I explain why I think you should upgrade to version 7.

Elasticsearch 7.0 Release Notes
Elasticsearch 7.1 Release Notes

Some security features moved to basic license in 7.1

While I was writing this post, Elastic released version 7.1. In this version they moved some security features to the basic license. TLS (Transport Layer Security), which encrypts communication between Elasticsearch nodes, and RBAC (Role Based Access Control), which lets you configure users, groups, roles and permissions (at index level), are now free to use.

Scalability and resiliency

I’ll start with the scalability and resiliency improvements in this version, which I think make the upgrade worth it on their own.

New cluster coordination system

The most important improvement (according to the Elastic crew) is the new cluster coordination system. In previous versions Elasticsearch relied on Zen Discovery for scaling and resiliency to failures. The minimum_master_nodes setting was an important piece in preventing split brains and data loss. The best practice was to set it to N/2+1 (with N/2 rounded down), where N is the number of master-eligible nodes. Unfortunately this setting was often misconfigured, and keeping it correct across large and dynamically resizing clusters was difficult. In Elasticsearch 7 the system has been completely rewritten and is now called Zen2. The minimum_master_nodes setting has been removed, and Elasticsearch now decides by itself which nodes can form a quorum. With Zen2, master elections take milliseconds, whereas with the old Zen they could take several seconds. Growing and shrinking clusters becomes safer and easier, with much less room to misconfigure the system. Elastic has written a nice blog about this new system if you want to read more about it.
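To make the old N/2+1 rule concrete, here is a minimal Python sketch of the value you had to calculate and maintain by hand before version 7. The function name is my own and is not part of any Elasticsearch API; it only illustrates the arithmetic.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum size for the pre-7 minimum_master_nodes setting.

    Implements N/2+1 with integer (floor) division, where N is the
    number of master-eligible nodes. Hypothetical helper, for illustration.
    """
    if master_eligible < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible // 2 + 1


# Every time the number of master-eligible nodes changed,
# this value had to be recalculated and applied to the cluster.
for nodes in (1, 2, 3, 5):
    print(f"{nodes} master-eligible nodes -> quorum of {minimum_master_nodes(nodes)}")
```

Note that a cluster with two master-eligible nodes needs both of them for a quorum, which is one of the reasons this setting was easy to get wrong.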

Primary shard default

Previously the default number of primary shards per index was five. This caused performance problems for a lot of users due to oversharding. Because shards still perform well at sizes of up to 30-40 GB, the five-shard default is overkill in a lot of use-cases. In version 7 the default is one primary shard per index. The shard count can of course still be set explicitly in the index settings when you create an index.
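As a rough illustration of the sizing argument above, the sketch below estimates a primary shard count from an expected index size and builds a matching settings body for creating the index. The helper names and the 30 GB target are my own assumptions, based on the 30-40 GB range mentioned above; number_of_shards and number_of_replicas are the regular index settings.

```python
import math


def suggested_primary_shards(expected_index_gb: float, target_shard_gb: float = 30.0) -> int:
    """Rough estimate: enough shards to keep each one at or below the target size.

    Hypothetical helper; the 30 GB default is an assumption, not an official rule.
    """
    return max(1, math.ceil(expected_index_gb / target_shard_gb))


def index_settings(primary_shards: int, replicas: int = 1) -> dict:
    """Request body for creating an index with an explicit shard count."""
    return {
        "settings": {
            "index": {
                "number_of_shards": primary_shards,
                "number_of_replicas": replicas,
            }
        }
    }


# A small 10 GB index fits in the new default of one primary shard,
# while a 100 GB index would get four shards with a 30 GB target.
print(index_settings(suggested_primary_shards(10)))
print(index_settings(suggested_primary_shards(100)))
```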

Less heap

This version brings several improvements to memory usage and to protection against out-of-memory errors. The first is a new circuit breaker that keeps track of the total memory used by the JVM. It uses functionality in the JVM to measure the current memory usage, instead of only accounting for the memory that Elasticsearch tracks itself. A node will now reject requests that would push memory usage above a threshold (95% of the heap allocated to the process by default), which prevents OOM errors. The node responds with an HTTP 429 error that includes the current memory usage and the amount of memory needed by the request. The .NET, Ruby, Python and Java clients already implement retry policies to deal with this response.

Another change is the default maximum number of buckets returned by an aggregation, which was unbounded in previous versions. It is now set to 10,000. This prevents Elasticsearch from calculating a very large number of buckets, which could cost a lot of memory.

Another way to reduce memory usage is to make an index “frozen”. Frozen indices were already implemented in 6.6, but were also explained in detail at the Elastic Stack 7 Highlights meetup. In short: open indices always keep some data in memory so that they can be searched and indexed into efficiently. If you have indices which you access very rarely, but don’t want to close them, it might be handy to freeze them. This frees up the resources used to keep them open. The trade-off is that this data has to be rebuilt each time you search these indices, which makes searches on them slower.
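If you talk to Elasticsearch without one of the official clients, you need to handle the HTTP 429 response from the circuit breaker yourself. The following is a minimal sketch of a retry with exponential backoff. It is not how any of the official clients implement it; the send callable, the delays and the number of attempts are all assumptions made for the example.

```python
import time
from typing import Callable, Tuple

Response = Tuple[int, str]  # (HTTP status code, response body)


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() and retry with exponential backoff while it returns HTTP 429.

    send is a hypothetical callable that performs one request. Any status other
    than 429 is returned immediately; after the last attempt the 429 is returned.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        last_attempt = attempt == max_attempts - 1
        if status != 429 or last_attempt:
            return status, body
        # Wait 0.5s, 1s, 2s, ... to give the node time to free up heap.
        sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Demo with a fake node that rejects the first two requests.
fake_responses = [(429, "circuit breaker tripped"), (429, "circuit breaker tripped"), (200, "ok")]
waited = []
print(call_with_retry(lambda: fake_responses.pop(0), sleep=waited.append), waited)
```

The sleep function is injectable so the backoff can be tested without actually waiting.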

Optimisations for performance

Next up are some important optimisations that improve search performance.

Faster retrieval of top hits

A breaking change has been made to the way Elasticsearch calculates the total hits. By default, the total number of hits is now only counted accurately up to 10,000. When a query matches more than 10,000 documents, the response says that the total is greater than or equal to 10,000 instead of giving an exact number. This means Elasticsearch no longer has to go through all hits to calculate an exact total, which makes the retrieval of top hits faster. It’s possible to change the threshold to a different number, or to still calculate an exact total, with the “track_total_hits” property in a search request. Read more about it in the documentation.
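The sketch below shows how this looks from the client side, assuming the version 7 response format in which hits.total is an object with a value and a relation ("eq" for an exact count, "gte" for a lower bound). The helper names and the sample responses are made up for the example.

```python
from typing import Optional, Union


def search_body(query: dict, track_total_hits: Optional[Union[bool, int]] = None) -> dict:
    """Build a search request body, optionally overriding track_total_hits.

    True asks for an exact total; an integer sets a different threshold.
    """
    body = {"query": query}
    if track_total_hits is not None:
        body["track_total_hits"] = track_total_hits
    return body


def describe_total(response: dict) -> str:
    """Turn the hits.total object of a search response into readable text."""
    total = response["hits"]["total"]
    if total["relation"] == "gte":
        return f"at least {total['value']} hits"
    return f"exactly {total['value']} hits"


# Hand-written sample responses, not real Elasticsearch output.
capped = {"hits": {"total": {"value": 10000, "relation": "gte"}}}
exact = {"hits": {"total": {"value": 42, "relation": "eq"}}}

print(search_body({"match": {"title": "elasticsearch"}}, track_total_hits=True))
print(describe_total(capped))
print(describe_total(exact))
```

Code that used to read hits.total as a plain number needs to be updated for this new object format, which is why this is a breaking change.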

Easier relevance tuning with rank feature

To make relevance tuning easier, Elastic has implemented the rank_feature and rank_features datatypes and the rank_feature query. A rank_feature field holds a numeric value which can be used in a rank_feature query to boost documents. This is useful when you want to, for example, boost documents on popularity or importance. A rank_features field holds numeric feature vectors, which works better for a set of weighted tags or categories. These fields can only be used by the rank_feature query, and this type of query only works on these fields. The query is typically put in a should clause of a bool query so that its score is added to the score of the main query. The benefit of this query compared to the function_score query is that it is compatible with the new way of retrieving top hits, which makes it faster.
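To show how these pieces fit together, here is a small sketch of a mapping and a query written as Python dictionaries. The field names (title, popularity, topics and the sports feature) are hypothetical and only serve the example.

```python
# Mapping with one rank_feature field and one rank_features field.
# All field names are made up for this example.
mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "popularity": {"type": "rank_feature"},
            "topics": {"type": "rank_features"},
        }
    }
}


def boosted_query(text: str) -> dict:
    """Main query in must, rank_feature queries in should.

    The should clauses add their score to the score of the main query,
    so popular or on-topic documents move up in the results.
    """
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": text}}],
                "should": [
                    {"rank_feature": {"field": "popularity"}},
                    # A single feature inside a rank_features field
                    # is addressed as <field>.<feature>.
                    {"rank_feature": {"field": "topics.sports", "boost": 0.5}},
                ],
            }
        }
    }


print(boosted_query("champions league"))
```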

Adaptive replica selection

Adaptive replica selection was already introduced in version 6.1 as an experimental feature. With this feature, each node tracks and compares how long search requests to other nodes take, and uses this information to adjust how frequently it sends requests to shards on particular nodes. As a result, nodes send requests to the least busy nodes and avoid the slowest ones. Previous Elasticsearch versions simply picked shard copies in round-robin fashion, without using such information. In version 7.0 this feature is enabled by default.
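To illustrate the idea, and only the idea, here is a strongly simplified sketch that keeps a moving average of response times per node and prefers the fastest one. This is not the formula Elasticsearch uses, which takes more signals into account; the class, its parameters and the node names are all my own invention.

```python
from typing import Dict, Iterable


class ReplicaSelector:
    """Toy replica selection based on an exponentially weighted moving average.

    Simplified illustration only; not Elasticsearch's actual algorithm.
    """

    def __init__(self, alpha: float = 0.3) -> None:
        self.alpha = alpha  # weight of the most recent measurement
        self.avg_ms: Dict[str, float] = {}

    def record(self, node: str, took_ms: float) -> None:
        """Update the moving average for a node after a search request."""
        previous = self.avg_ms.get(node)
        if previous is None:
            self.avg_ms[node] = took_ms
        else:
            self.avg_ms[node] = self.alpha * took_ms + (1 - self.alpha) * previous

    def pick(self, candidates: Iterable[str]) -> str:
        """Pick the node with the lowest average; unknown nodes are tried first."""
        return min(candidates, key=lambda node: self.avg_ms.get(node, 0.0))


selector = ReplicaSelector()
selector.record("node-a", 100)
selector.record("node-b", 20)
print(selector.pick(["node-a", "node-b"]))  # node-b has been faster so far

selector.record("node-b", 500)  # node-b becomes slow, e.g. due to a long GC pause
print(selector.pick(["node-a", "node-b"]))  # requests shift to node-a
```

With round robin, the slow node in this example would keep receiving half of the requests.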

Other improvements

Script Score Query

Version 7 introduces the Script Score Query, which is designed to replace the Function Score Query. This new type of query allows you to write a script that computes a new score for each document returned by the query. This is useful when a score function is computationally expensive and you only want to compute the score on a filtered set of documents. Note that this query is still marked as experimental.
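A minimal sketch of such a request, built as a Python dictionary, could look like this. The inner query selects the documents and the script computes their score. The likes field, the divisor parameter and the helper name are assumptions for the example.

```python
from typing import Optional


def script_score_query(inner_query: dict, source: str, params: Optional[dict] = None) -> dict:
    """Wrap a query in script_score so the script only runs on matching documents."""
    script = {"source": source}
    if params:
        script["params"] = params
    return {"query": {"script_score": {"query": inner_query, "script": script}}}


# Score matching documents by a hypothetical numeric "likes" field.
body = script_score_query(
    {"match": {"message": "elasticsearch"}},
    "doc['likes'].value / params.divisor",
    {"divisor": 10},
)
print(body)
```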

Interval queries

In some use-cases you would like to find records in which words or phrases are within a certain distance of each other. Until now the only way to do this was to use span queries. This could be quite difficult, since span queries don’t run the query text through the analyzer. Hence the new intervals query, which does analyze its input and gives you more control over the order and proximity of matching terms.
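As an example, the sketch below builds an intervals query that matches documents in which one phrase is followed by another within a maximum number of positions. The helper name and the field name are made up; all_of, match, ordered and max_gaps are options of the intervals query.

```python
def near_query(field: str, first: str, second: str, max_gaps: int) -> dict:
    """Match documents where `first` is followed by `second` with at most
    `max_gaps` positions in between. Both phrases must match exactly."""
    return {
        "query": {
            "intervals": {
                field: {
                    "all_of": {
                        "ordered": True,  # first must come before second
                        "max_gaps": max_gaps,
                        "intervals": [
                            {"match": {"query": first, "ordered": True, "max_gaps": 0}},
                            {"match": {"query": second, "ordered": True, "max_gaps": 0}},
                        ],
                    }
                }
            }
        }
    }


# "cluster coordination" followed by "master election" within five positions,
# searched in a hypothetical "body" field.
print(near_query("body", "cluster coordination", "master election", 5))
```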

Feature-complete Java HLRC

The high-level REST client for Java is now marked as feature-complete and covers all APIs. Since I’m using this client a lot myself, I’m glad to see this.