Let's set up Cassandra 3.0 - Installation & Production Configuration

A post from dbpandit

If you're new to Apache Cassandra and curious to know more about it, this blog is for you. I'll cover some Cassandra basics and go through its installation process using the latest stable release. I'll also share important parameters and tips for configuring Cassandra in a production environment.

OK, so what is Cassandra, and why should one even consider it?
Cassandra, or Apache Cassandra, is a free and open-source DBMS: a highly scalable, high-performance distributed database designed to handle large amounts of data across many commodity servers.

Compared with an RDBMS, Cassandra differs in the following ways:
a) It handles high incoming data velocity.
b) It handles data arriving from many locations.
c) It manages all types of data.
d) It supports simple transactions.
e) It has no single point of failure, giving constant uptime.
f) It supports very high data volumes.
g) It supports decentralized deployments.
h) It scales both reads and writes.
i) It is deployed in a horizontal, scale-out fashion.
For more, please refer to: https://academy.datastax.com/resources/brief-introduction-apache-cassandra

Now, let's see how to set up a 3-node Cassandra cluster on AWS, using Ubuntu 14.04 on m3.medium instances. A different OS or instance type can change the installation and configuration process a little, but most of the steps below remain the same.

First of all, install Java 8, as Cassandra 3.0 and later versions require it.

Add the Java 8 repository:

sudo add-apt-repository ppa:webupd8team/java  

Update apt-get:

sudo apt-get update  

Automatically accept Oracle's license:

echo debconf shared/accepted-oracle-license-v1-1 select true | sudo debconf-set-selections  

Install Java 8:

sudo apt-get install oracle-java8-installer  
sudo apt-get install oracle-java8-set-default  
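Before moving on, it is worth confirming that Java 8 really is the default JVM. The sketch below is one way to do it, assuming the usual java -version output, where the version is the first double-quoted token (for example "1.8.0_171"). The helper name java_major_version is hypothetical, not part of any package.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the major Java version (8 for Java 8).
# With no argument it runs `java -version`; pass a string to parse that instead.
java_major_version() {
  # `java -version` writes to stderr, hence the 2>&1.
  local out="${1:-$(java -version 2>&1)}"
  local ver
  # The version is the first double-quoted token, e.g. "1.8.0_171" or "11.0.2".
  ver=$(printf '%s\n' "$out" | awk -F'"' '/version/ { print $2; exit }')
  case "$ver" in
    1.*) printf '%s\n' "$ver" | cut -d. -f2 ;;  # old scheme: 1.8.x -> 8
    *)   printf '%s\n' "$ver" | cut -d. -f1 ;;  # new scheme: 11.0.x -> 11
  esac
}
```

Once oracle-java8-set-default has done its job, running java_major_version with no argument on the node should print 8.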

Now, let's install Cassandra 3.0:

Add Cassandra's repository and its signing keys:

echo "deb http://www.apache.org/dist/cassandra/debian 30x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list  
curl https://www.apache.org/dist/cassandra/KEYS | sudo apt-key add -  
sudo apt-key adv --keyserver pool.sks-keyservers.net --recv-key A278B781FE4B2BDA  

Update apt-get:

sudo apt-get update  

Install Apache Cassandra:

sudo apt-get install cassandra  

After the above step, Cassandra starts automatically. Stop the service, remove its system data, and start it again:

sudo service cassandra stop  
sudo rm -rf /var/lib/cassandra/data/system/*  
sudo service cassandra start  
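Cassandra can take a little while to accept connections after the start command returns, so scripted setups usually wait for it. Below is a minimal sketch, assuming nodetool is on the PATH and that nodetool statusbinary prints "running" once the CQL native transport is up. The helpers retry and cassandra_ready are hypothetical names, and the 30 attempts / 5 seconds in the usage line are arbitrary example values.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command until it succeeds, up to <attempts>
# times, sleeping <delay> seconds between tries.
# Usage: retry <attempts> <delay> <command> [args...]
retry() {
  local attempts=$1 delay=$2 i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # Don't sleep after the last failed attempt.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Hypothetical check: succeeds once the local node reports its CQL native
# transport as "running" (assumed output of `nodetool statusbinary`).
cassandra_ready() {
  nodetool statusbinary 2>/dev/null | grep -qx 'running'
}

# Usage after starting the service (example values: 30 tries, 5 s apart):
#   retry 30 5 cassandra_ready && echo "Cassandra is accepting CQL clients"
```

Keeping the retry loop generic means the same helper can wait on any other readiness check, for example a cqlsh query.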

Check Cassandra's release version by connecting to it with cqlsh: the banner cqlsh prints on connecting shows the Cassandra version.
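To read the version non-interactively, cqlsh's -e option runs a single statement and exits, and the system.local table has a release_version column. The awk filter below is a hypothetical helper, first_cql_value, and it assumes cqlsh's default tabular output: a header line, a row of dashes, then the values.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first value from cqlsh's tabular output,
# i.e. the line right after the dashed separator under the column header.
first_cql_value() {
  awk '/^[-+]+$/ {
    if ((getline line) > 0) {
      gsub(/^ +| +$/, "", line)   # trim the padding spaces around the value
      print line
    }
    exit
  }'
}

# Usage on a running node (cqlsh connects to 127.0.0.1:9042 by default):
#   cqlsh -e "SELECT release_version FROM system.local;" | first_cql_value
```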

Configure Cassandra for Production

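Stop Cassandra's service and replace its configuration file, /etc/cassandra/cassandra.yaml, with the following material. Because the node has already run once under the default cluster name (Test Cluster), clear /var/lib/cassandra/data/system/* again before starting it; otherwise Cassandra refuses to start with the new cluster_name. You may want to change some of the settings below to suit your needs.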

## Cassandra Configuration ##
---
authenticator: org.apache.cassandra.auth.AllowAllAuthenticator  
authorizer: org.apache.cassandra.auth.AllowAllAuthorizer  
auto_bootstrap: true  
auto_snapshot: false  
batch_size_warn_threshold_in_kb: 5  
batchlog_replay_throttle_in_kb: 1024  
broadcast_address: localhost_ip  
broadcast_rpc_address: localhost_ip  
cas_contention_timeout_in_ms: 1000  
client_encryption_options:  
  enabled: false
  keystore: conf/.keystore
  keystore_password: cassandra
  require_client_auth: false
cluster_name: prod-cassandra  
column_index_size_in_kb: 64  
commit_failure_policy: stop  
commitlog_directory: "/var/lib/cassandra/commitlog"  
# Size of each commit log segment:
commitlog_segment_size_in_mb: 64  
commitlog_sync: periodic  
commitlog_sync_period_in_ms: 10000  
commitlog_total_space_in_mb: 8192  
compaction_throughput_mb_per_sec: 16  
concurrent_compactors: 4  
concurrent_counter_writes: 32  
concurrent_reads: 16  
concurrent_writes: 8  
counter_cache_save_period: 7200  
counter_write_request_timeout_in_ms: 5000  
cross_node_timeout: false  
data_file_directories:  
- "/var/lib/cassandra/data"
disk_failure_policy: stop_paranoid  
dynamic_snitch_badness_threshold: 0.1  
dynamic_snitch_reset_interval_in_ms: 600000  
dynamic_snitch_update_interval_in_ms: 100  
endpoint_snitch: GossipingPropertyFileSnitch  
hinted_handoff_enabled: true  
hinted_handoff_throttle_in_kb: 1024  
hints_directory: "/var/lib/cassandra/hints"  
incremental_backups: false  
index_interval: 128  
index_summary_resize_interval_in_minutes: 60  
inter_dc_tcp_nodelay: true  
internode_compression: none  
key_cache_save_period: 14400  
listen_address: localhost_ip  
max_hint_window_in_ms: 10800000  
max_hints_delivery_threads: 2  
memtable_allocation_type: heap_buffers  
native_transport_port: '9042'  
num_tokens: 256  
partitioner: org.apache.cassandra.dht.Murmur3Partitioner  
permissions_validity_in_ms: 2000  
phi_convict_threshold: 8  
range_request_timeout_in_ms: 10000  
read_request_timeout_in_ms: 5000  
request_scheduler: org.apache.cassandra.scheduler.NoScheduler  
request_timeout_in_ms: 10000  
row_cache_save_period: 0  
row_cache_size_in_mb: 2048  
rpc_address: 0.0.0.0  
rpc_keepalive: 'true'  
rpc_max_threads: 2048  
rpc_min_threads: 16  
rpc_port: '9160'  
rpc_server_type: sync  
saved_caches_directory: "/var/lib/cassandra/saved_caches"  
seed_provider:  
- class_name: org.apache.cassandra.locator.SimpleSeedProvider
  parameters:
  - seeds: localhost_ip, secondnode_ip, thirdnode_ip
server_encryption_options:  
  internode_encryption: none
  keystore: conf/.keystore
  keystore_password: cassandra
  truststore: conf/.truststore
  truststore_password: cassandra
snapshot_before_compaction: false  
ssl_storage_port: 7001  
sstable_preemptive_open_interval_in_mb: 50  
start_native_transport: true  
start_rpc: 'true'  
storage_port: 7000  
stream_throughput_outbound_megabits_per_sec: 400  
streaming_socket_timeout_in_ms: 0  
thrift_framed_transport_size_in_mb: 15  
thrift_max_message_length_in_mb: 16  
tombstone_failure_threshold: 100000  
tombstone_warn_threshold: 1000  
trickle_fsync: false  
trickle_fsync_interval_in_kb: 10240  
truncate_request_timeout_in_ms: 60000  
write_request_timeout_in_ms: 20000  

Please refer to the following link for more details about any of the parameters above: http://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html
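The configuration above uses literal placeholders (localhost_ip, secondnode_ip, thirdnode_ip) that must be replaced with real addresses on every node. Here is a minimal sed-based sketch. The function name render_cassandra_yaml and the 10.0.1.x addresses are hypothetical, and it assumes you pass the current node's private IP first and the other two nodes' IPs after it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the placeholders in a cassandra.yaml in place,
# keeping the previous version as <file>.bak.
#   localhost_ip  -> this node's private IP
#   secondnode_ip -> second node's private IP
#   thirdnode_ip  -> third node's private IP
render_cassandra_yaml() {
  if [ "$#" -ne 4 ]; then
    echo "usage: render_cassandra_yaml <file> <self_ip> <second_ip> <third_ip>" >&2
    return 1
  fi
  local file=$1 self_ip=$2 second_ip=$3 third_ip=$4
  sed -i.bak \
    -e "s/localhost_ip/${self_ip}/g" \
    -e "s/secondnode_ip/${second_ip}/g" \
    -e "s/thirdnode_ip/${third_ip}/g" \
    "$file"
}

# Example (as root, on the first node; addresses are placeholders):
#   render_cassandra_yaml /etc/cassandra/cassandra.yaml 10.0.1.10 10.0.1.11 10.0.1.12
```

On EC2, a node's private IP can be read from the instance metadata service at http://169.254.169.254/latest/meta-data/local-ipv4, which makes the first argument easy to fill in automatically.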

After updating Cassandra's configuration, start the Cassandra service again. If you're on AWS, make sure to allow the port range 7000-7001 (7001 is the SSL port) for inter-node communication, and also open the ports your application uses to communicate with Cassandra.

In total, allow the following ports:

7000 for cluster communication  
7001 for SSL-enabled cluster communication  
9042 for native protocol clients  
7199 for JMX  
9160 for the Thrift interface  
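One way to open these ports on AWS is the AWS CLI's aws ec2 authorize-security-group-ingress command. The sketch below assumes all three Cassandra nodes share one security group and the application servers use another; both group IDs are hypothetical, and by default it only prints the commands so you can review them before running anything.

```shell
#!/usr/bin/env bash
# Hypothetical security-group IDs: replace them with your own.
CASSANDRA_SG=sg-0123456789abcdef0   # attached to the three Cassandra nodes
APP_SG=sg-0fedcba9876543210         # attached to the application servers

# Print the AWS CLI calls that open the ports listed above (default),
# or run them when called as: open_cassandra_ports run
open_cassandra_ports() {
  local mode=${1:-print}
  local -a rules cmd
  local rule ports source
  # "<port or range> <source security group>" pairs
  rules=(
    "7000-7001 $CASSANDRA_SG"   # inter-node traffic, plain and SSL
    "7199 $CASSANDRA_SG"        # JMX (nodetool, monitoring agents)
    "9042 $APP_SG"              # CQL native protocol clients
    "9160 $APP_SG"              # Thrift clients
  )
  for rule in "${rules[@]}"; do
    read -r ports source <<< "$rule"
    cmd=(aws ec2 authorize-security-group-ingress --group-id "$CASSANDRA_SG"
         --protocol tcp --port "$ports" --source-group "$source")
    if [ "$mode" = run ]; then
      "${cmd[@]}"
    else
      echo "${cmd[*]}"
    fi
  done
}
```

Using source security groups rather than CIDR ranges keeps the rules valid when instances are replaced and their IPs change.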

Finally, check that all 3 nodes are connected, up and running with nodetool status; nodetool is the administration utility that ships with Cassandra.
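The cluster check is done with nodetool status, and reading its output by eye is fine for three nodes. For scripts, the sketch below counts the nodes reported as UN. It assumes the usual layout where each node line starts with a two-letter status/state code followed by the node's address; both helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that read `nodetool status` output from stdin.
# Node lines start with a two-letter code: U/D (Up/Down) followed by
# N/L/J/M (Normal/Leaving/Joining/Moving); the second column is the address.

# Print how many nodes are Up and Normal.
count_up_normal() {
  grep -c '^UN '
}

# Print "<address> <code>" for every node that is not UN.
nodes_not_up_normal() {
  awk '/^[UD][NLJM] / && $1 != "UN" { print $2, $1 }'
}

# Usage on any node of the cluster:
#   nodetool status | count_up_normal        # expect 3 for this setup
#   nodetool status | nodes_not_up_normal    # expect no output
```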

In the nodetool status output, UN means Up and Normal. You can also set up a monitoring tool such as DataStax OpsCenter or Datadog for monitoring and administration.

If you have any doubts or suggestions, please leave a comment below. I'll be happy to reply.

Thank You!