Set Up and Configure an ElasticSearch Cluster for a Production/Test Environment

A post from dbpandit

This blog isn't about ElasticSearch basics, so if you're looking for an introduction, kindly refer to the following: https://www.elastic.co/webinars/getting-started-elasticsearch?elektra=home&iesrc=ctr

As described, we'll see how to set up ElasticSearch and get it up and running under production load. This tutorial covers ElasticSearch version 2.3.4.
The whole process is divided into two parts:
1) Download and Install ElasticSearch.
2) ElasticSearch Configuration.

Before moving forward to set up an ElasticSearch cluster in production, you must have answers to the following questions.
Q1) Why do you need it? There are scenarios where ElasticSearch is the wrong fit, so be sure of your use case before moving forward.
Q2) How much data will it hold over the next few months? This will help you decide how many data nodes to provision and how much disk capacity each one needs.
Q3) Will this cluster hold time-series data? If yes, you can plan for timed index rotation. A single index shouldn't grow too large, as that causes performance issues and makes scaling a challenge.
Q4) Will there be many aggregation queries? If you're setting up a cluster for analytics purposes, you should consider giving the fielddata cache a good portion of the heap.
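The rotation idea from Q3 can be sketched quickly: instead of one ever-growing index, write each day's documents to a dated index (the `logs-` prefix below is illustrative), so old data can be dropped index-by-index rather than deleted document-by-document.

```shell
# Sketch of time-based index rotation (the "logs-" name is illustrative):
# each day's documents go to a dated index, so expiring old data is just a
# cheap index deletion instead of a costly delete-by-query.
TODAY_INDEX="logs-$(date -u +%Y.%m.%d)"
echo "Indexing today's documents into: $TODAY_INDEX"
```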

Part I - Download and Install ElasticSearch

Download ElasticSearch for Ubuntu:

wget https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-2.3.4.deb -P /tmp  

Install ElasticSearch:

dpkg -i /tmp/elasticsearch-2.3.4.deb  

You can also install a few plugins for monitoring your production ES cluster. I personally like the Paramedic and Marvel monitoring solutions:
Install Paramedic Plugin:

/usr/share/elasticsearch/bin/plugin install karmi/elasticsearch-paramedic/2.0

Install Marvel Agent:

/usr/share/elasticsearch/bin/plugin install marvel-agent

Part II - ElasticSearch Configuration

Set ES_HEAP_SIZE. An ideal heap size is 50% of available memory, and no more than 32G on a single node.

To set ES_HEAP_SIZE, edit the file /etc/default/elasticsearch with the following variables:

ES_HEAP_SIZE=10G # If you have 20G of available Memory  
MAX_LOCKED_MEMORY=unlimited # Required if you use the 'bootstrap.mlockall: true' option in elasticsearch.yml  
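The 50% rule can be sketched as a tiny helper. The 31G cap below is an assumption on my part to stay safely under the 32G ceiling mentioned above:

```shell
# Hedged sketch: pick ES_HEAP_SIZE as half of total RAM (given in MB),
# capped at 31 * 1024 MB as a safety margin below the 32G limit.
es_heap_size() {
  local total_mb=$1
  local half=$(( total_mb / 2 ))
  local cap=$(( 31 * 1024 ))
  [ "$half" -gt "$cap" ] && half=$cap
  echo "${half}m"
}

es_heap_size 20480    # 20G box -> 10240m (i.e. 10G, matching the example above)
es_heap_size 131072   # 128G box -> capped at 31744m
```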

ElasticSearch configuration (/etc/elasticsearch/elasticsearch.yml):

# Cluster name
cluster.name: es_cluster
path.data: /esdata

# Network settings
network.host: [_site_, localhost]

node.name: "node01"  
node.master: false # Set true on master nodes, false on data and client nodes.  
node.data: true # Set true on data nodes, false on master and client nodes.

# Run only one copy of ES on this machine
node.max_local_storage_nodes: 1

# Lock the process memory into RAM so the heap is never swapped out
bootstrap.mlockall: true

# Minimum number of master-eligible nodes (quorum) required to elect a master
discovery.zen.minimum_master_nodes: 2
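The value 2 above assumes 3 master-eligible nodes. The usual rule for this setting is a quorum: the number of master-eligible nodes divided by 2 (rounded down), plus 1, which prevents split-brain when the cluster partitions:

```shell
# Quorum of master-eligible nodes: (masters / 2) + 1, integer division.
# A node that cannot see this many masters will not elect one, so two
# halves of a partitioned cluster can't both elect their own master.
quorum() { echo $(( $1 / 2 + 1 )); }

quorum 3   # -> 2 (the value used in this config)
quorum 5   # -> 3
```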

# SEARCH - ThreadPool management for Searching
threadpool.search.type: fixed  
threadpool.search.size: 12  
threadpool.search.queue_size: 500

# INDEX - ThreadPool management for indexing
threadpool.index.type: fixed  
threadpool.index.size: 2  
threadpool.index.queue_size: 1000


# BULK - Threadpool management for Bulk Indexing
threadpool.bulk.type: fixed  
threadpool.bulk.size: 4  
threadpool.bulk.queue_size: 1000
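The fixed sizes above are examples for one particular box, not universal values. For reference, the 2.x documentation gives the search pool's default size as ((cores * 3) / 2) + 1 threads, so size the pools relative to the CPUs you actually have (worth double-checking against the docs for your exact version):

```shell
# Documented 2.x default for the search thread pool, as a function of the
# node's CPU core count: ((cores * 3) / 2) + 1, integer division.
search_pool_default() { echo $(( $1 * 3 / 2 + 1 )); }

search_pool_default 8   # -> 13 threads on an 8-core node
search_pool_default 4   # -> 7 threads on a 4-core node
```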

# other nodes
discovery.zen.ping.multicast.enabled: false # Disable multicast discovery. It's enabled by default and must be disabled in production.  
discovery.zen.ping.unicast.hosts: [IP1, IP2, IP3] # List of IPs of the other nodes to connect to.

# If the cluster is going to serve a good amount of aggregation queries, the fielddata cache should get 30-40% of the heap
indices.fielddata.cache.size: 35%  
# Filter cache is 10% by default.
indices.cache.filter.size: 10%  
# The default is 20 MB/s, which is a good setting for spinning disks. If you have SSDs, you might consider increasing this to 100–200 MB/s
indices.store.throttle.max_bytes_per_sec: 100mb

# Disallow deleting all indices in a single request
action.disable_delete_all_indices: true

# Circuit breakers - These are very important to keep memory-hungry queries from crashing your cluster.
indices.breaker.fielddata.limit: 50%  
indices.breaker.total.limit: 60%
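In 2.x these breaker limits are dynamic, so they can also be adjusted at runtime through the cluster settings API without restarting nodes. A hedged sketch (the host and port below are assumptions; point the request at one of your own nodes):

```shell
# Hedged sketch: adjust breaker limits at runtime via the cluster settings
# API (localhost:9200 is an assumption; substitute one of your nodes).
BODY='{"transient":{"indices.breaker.fielddata.limit":"50%","indices.breaker.total.limit":"60%"}}'
#   curl -XPUT 'http://localhost:9200/_cluster/settings' -d "$BODY"

# Sanity-check that the payload parses as JSON before sending it:
echo "$BODY" | python3 -c 'import json, sys; json.load(sys.stdin); print("payload-ok")'
```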

cluster.routing.allocation.node_concurrent_recoveries: 4

marvel.agent.exporters:  
  id1:
    type: http
    host: ["http://marvelhost1:9200", "http://marvelhost2:9200"]

After these configurations and settings are in place, you're just left with starting the ElasticSearch service on each of your nodes, one by one. If you're an Ubuntu user, just do:

service elasticsearch start  
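Once all nodes are started, you can confirm the cluster actually formed by querying the health API on any node (localhost:9200 is an assumption; use one of your own hosts):

```shell
# Check that the cluster formed, e.g.:
#   curl -s 'http://localhost:9200/_cluster/health?pretty'
# A healthy response looks roughly like the sample below: "green" means all
# primary and replica shards are allocated ("yellow" = replicas unassigned).
SAMPLE='{"cluster_name":"es_cluster","status":"green","number_of_nodes":3}'
STATUS=$(echo "$SAMPLE" | grep -o '"status":"[a-z]*"' | cut -d'"' -f4)
echo "cluster status: $STATUS"   # cluster status: green
```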

That's it! After completing these steps, you'll have an ElasticSearch cluster up and running :)

Please leave your comments below if you have any suggestions or doubts. I'll be happy to help.