This blog isn't about ElasticSearch basics, so if you're looking for those, kindly refer to the following: https://www.elastic.co/webinars/getting-started-elasticsearch?elektra=home&iesrc=ctr
As described, we'll see how to set up ElasticSearch and get it up and running for production load. This tutorial covers ElasticSearch version 2.3.4.
The whole process is divided into 2 parts:
1) Download and Install ElasticSearch.
2) ElasticSearch Configuration.
Before moving forward with setting up an ElasticSearch cluster in production, you should have answers to the following questions.
Q1) Why do you need it? There are a few scenarios where ElasticSearch can be the wrong fit. Be sure of your use case before moving forward.
Q2) How much data will it hold in the next few months? This helps you decide how many data nodes to provision, and with what disk volume capacity.
Q3) Will this cluster hold time-series data? If yes, plan for periodic index rotation. A single index shouldn't grow too large, as that causes performance issues and makes scaling a challenge.
Q4) Will there be many aggregation queries? If you're setting up a cluster for analytics purposes, you should consider giving the fielddata cache a good portion of the heap.
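To answer Q2 concretely, a rough back-of-envelope calculation helps: daily volume times retention, multiplied by the number of copies (primary plus replicas), plus some overhead for index structures. The function name, the 15% overhead figure, and the example numbers below are illustrative assumptions, not ElasticSearch-provided values:

```shell
# Hypothetical sizing sketch: raw daily volume * retention * copies,
# plus ~15% overhead (assumed) for index structures.
estimate_disk_gb() {
  daily_gb=$1
  retention_days=$2
  replicas=$3
  # one primary copy + N replica copies
  raw=$(( daily_gb * retention_days * (1 + replicas) ))
  # add 15% overhead using integer arithmetic
  echo $(( raw * 115 / 100 ))
}

# e.g. 10 GB/day, kept 30 days, 1 replica
estimate_disk_gb 10 30 1   # -> 690 (GB)
```

Divide the result by your per-node disk capacity (leaving headroom for ElasticSearch's disk watermarks) to get a first guess at the data node count.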
Part I - Download and Install ElasticSearch
Download ElasticSearch for Ubuntu:
wget https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-2.3.4.deb -P /tmp
dpkg -i /tmp/elasticsearch-2.3.4.deb
You can also install a few plugins to monitor your production ES cluster. I personally like the Paramedic and Marvel monitoring solutions:
Install Paramedic Plugin:
/usr/share/elasticsearch/bin/plugin install karmi/elasticsearch-paramedic/2.0
Install Marvel Agent:
/usr/share/elasticsearch/bin/plugin install marvel-agent
Part II - ElasticSearch Configuration
Set ES_HEAP_SIZE. An ideal heap size is 50% of available memory, and no more than 32 GB on a single node.
To set ES_HEAP_SIZE, edit the file /etc/default/elasticsearch with the following variables:
ES_HEAP_SIZE=10g  # If you have 20 GB of available memory
MAX_LOCKED_MEMORY=unlimited  # Required if you use the 'bootstrap.mlockall: true' option in elasticsearch.yml
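The "50% of RAM, capped below 32 GB" rule above can be expressed as a tiny helper. The function and the 31g cap (chosen to stay safely under the ~32 GB compressed-oops threshold) are a sketch of the rule, not part of ElasticSearch itself:

```shell
# Sketch of the heap-sizing rule: half of total RAM, capped at 31g
# to stay below the ~32 GB compressed-oops cutoff. Illustrative only.
pick_heap_size() {
  total_gb=$1
  half=$(( total_gb / 2 ))
  if [ "$half" -gt 31 ]; then
    half=31
  fi
  echo "${half}g"
}

pick_heap_size 20    # -> 10g
pick_heap_size 128   # -> 31g
```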
Next, edit /etc/elasticsearch/elasticsearch.yml:

# Cluster name
cluster.name: es_cluster

# path.data: /esdata

# Network settings
network.host: [_site_, localhost]
node.name: "node01"
node.master: false  # Set to true on master nodes, false on data and client nodes.
node.data: true  # Set to true on data nodes, false on master and client nodes.

# Run only 1 copy of ES on this machine
node.max_local_storage_nodes: 1

# Lock all allocated memory, even if not yet used
bootstrap.mlockall: true

# Minimum number of master-eligible nodes
discovery.zen.minimum_master_nodes: 2

# SEARCH - thread pool for search requests
threadpool.search.type: fixed
threadpool.search.size: 12
threadpool.search.queue_size: 500

# INDEX - thread pool for indexing
threadpool.index.type: fixed
threadpool.index.size: 2
threadpool.index.queue_size: 1000

# BULK - thread pool for bulk indexing
threadpool.bulk.type: fixed
threadpool.bulk.size: 4
threadpool.bulk.queue_size: 1000

# Discovery of other nodes
discovery.zen.ping.multicast.enabled: false  # Multicast discovery is enabled by default and must be disabled in production.
discovery.zen.ping.unicast.hosts: [IP1, IP2, IP3]  # List of IPs of the other nodes to connect to.

# If the cluster will serve a good amount of aggregation queries, the fielddata cache should be 30-40% of the heap.
indices.fielddata.cache.size: 35%

# Filter cache is 10% by default.
indices.cache.filter.size: 10%

# The default is 20 MB/s, which is a good setting for spinning disks. If you have SSDs, consider increasing this to 100-200 MB/s.
indices.store.throttle.max_bytes_per_sec: 100mb

# Disallow deleting all indices with a single request
action.disable_delete_all_indices: true

# Circuit breakers - very important to keep memory-hungry queries from crashing your cluster.
indices.breaker.fielddata.limit: 50%
indices.breaker.total.limit: 60%

cluster.routing.allocation.node_concurrent_recoveries: 4

# Marvel exporters
marvel.agent.exporters:
  id1:
    type: http
    host: ["http://marvelhost1:9200", "http://marvelhost2:9200"]
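The discovery.zen.minimum_master_nodes setting deserves a note: to avoid split-brain it should follow the quorum formula (master-eligible nodes / 2) + 1, which gives 2 for a cluster with 3 master-eligible nodes. The helper function below is just an illustration of that arithmetic:

```shell
# Quorum sketch: (master-eligible nodes / 2) + 1, integer division.
# Illustrative helper, not an ElasticSearch command.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

quorum 3   # -> 2, the value used in the config above
quorum 5   # -> 3
```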
With these configurations and settings in place, all that's left is to start the ElasticSearch service on each of your nodes, one by one. If you're an Ubuntu user, just run:
service elasticsearch start
That's it! After completing these steps, you'll have an ElasticSearch cluster up and running :)
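To confirm the cluster really is healthy, query the cluster health API (e.g. `curl -s localhost:9200/_cluster/health`) and look at the "status" field; "green" means all shards are allocated. A small sketch of extracting that field from the JSON response, shown here against a hardcoded sample body rather than a live cluster:

```shell
# Sketch: pull the "status" field out of a cluster health response.
# Assumes the compact JSON shape returned by /_cluster/health.
get_cluster_status() {
  # $1: JSON body, e.g. "$(curl -s localhost:9200/_cluster/health)"
  echo "$1" | grep -o '"status":"[^"]*"' | cut -d'"' -f4
}

# Example with a sample response body:
sample='{"cluster_name":"es_cluster","status":"green","number_of_nodes":3}'
get_cluster_status "$sample"   # -> green
```

In real use you would pipe the live curl output in; "yellow" (replicas unassigned) or "red" (primaries unassigned) means the cluster needs attention.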
Please leave your comments below if you have any suggestions or doubts. I'll be happy to help.