I don't know why exactly, but when it comes to high-performance NoSQL databases with clustering capabilities, many people seem to exclude Apache Cassandra from their list of candidates. This might be a mistake. Cassandra has been around for quite a while now, so it is safe to say that it is a mature product fit for production. One good piece of evidence is that Apple is running 25,000+ Cassandra nodes.
So, time to at least take a look. The following script expects to be run in a Linux environment with pre-installed Docker. It sets up a cluster with three nodes (granted, on the same machine, but hey, we're just getting started here).
#!/usr/bin/env bash
# cassandra/run.sh
#
# setup script for a docker based cassandra cluster
#
# settings
CASSANDRA_VERSION="3.11.6"
NAME_TEMPLATE="cassandra_node"
NUMBER_OF_NODES=3
MEMORY_PER_CONTAINER="4g"
CLUSTER_NAME="plexx"
VOLUME_BASE="/opt/cassandra/node"
STARTUP_DELAY=60
# network: create own network if it doesn't exist already
# -x matches the whole line, so a network like "my_cassandra_test" does not count
CASSANDRA_NETWORK=$(docker network ls --format="{{ .Name }}" | grep -x cassandra)
if [ -z "${CASSANDRA_NETWORK}" ]
then
echo "cassandra network doesn't exist yet, so create it"
docker network create cassandra
else
echo "cassandra network already present"
fi
# stop and remove any existing containers (data will NOT be deleted)
for node in $(seq $NUMBER_OF_NODES)
do
CONTAINER_NAME="${NAME_TEMPLATE}${node}"
echo "checking container $CONTAINER_NAME ..."
NODE_IS_RUNNING=$(docker ps --format="{{ .Names }}" | grep -x "$CONTAINER_NAME")
if [ -n "$NODE_IS_RUNNING" ]
then
echo "container ${CONTAINER_NAME} is running, stop it"
docker stop "$CONTAINER_NAME"
else
echo "container $CONTAINER_NAME is not running"
fi
NODE_EXISTS=$(docker ps -a --format="{{ .Names }}" | grep -x "$CONTAINER_NAME")
if [ -n "$NODE_EXISTS" ]
then
echo "remove pre-existing container $CONTAINER_NAME"
docker rm "$CONTAINER_NAME"
fi
done
# make sure host volumes exist
for node in $(seq $NUMBER_OF_NODES)
do
MOUNT_POINT="${VOLUME_BASE}${node}"
if [ -d "$MOUNT_POINT" ]
then
echo "mount point at $MOUNT_POINT exists"
else
echo "mount point at $MOUNT_POINT does not exist yet, so create it"
sudo mkdir -p "$MOUNT_POINT"
fi
done
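# create and run the MASTER container (Cassandra itself is masterless;
# "master" here simply means the first node, which the others use as seed)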
echo "create and start master"
docker run -d \
--name "${NAME_TEMPLATE}1" \
--network=cassandra \
--memory $MEMORY_PER_CONTAINER \
-e CASSANDRA_CLUSTER_NAME="$CLUSTER_NAME" \
-v "${VOLUME_BASE}1:/var/lib/cassandra" \
cassandra:"$CASSANDRA_VERSION"
# give the master some time to start up
echo "waiting $STARTUP_DELAY seconds..."
sleep "$STARTUP_DELAY"
# determine IP address of master
MASTER_IP="$(docker inspect --format='{{ .NetworkSettings.Networks.cassandra.IPAddress }}' "${NAME_TEMPLATE}1")"
echo "master node IP address is $MASTER_IP (will be used as seed)"
# all the other nodes
for node in $(seq 2 $NUMBER_OF_NODES)
do
echo "create and start node #$node"
docker run -d \
--name "${NAME_TEMPLATE}${node}" \
--network=cassandra \
--memory $MEMORY_PER_CONTAINER \
-e CASSANDRA_CLUSTER_NAME="$CLUSTER_NAME" \
-e CASSANDRA_SEEDS="$MASTER_IP" \
-v "${VOLUME_BASE}${node}:/var/lib/cassandra" \
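cassandra:"$CASSANDRA_VERSION"
# Cassandra generally expects new nodes to join one at a time,
# so pause before starting the next one
echo "waiting $STARTUP_DELAY seconds..."
sleep "$STARTUP_DELAY"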
done
echo "done."
To verify that everything went well, issue the following command from your terminal:
docker exec -it cassandra_node1 bash -c 'nodetool status'
This should generate output similar to this:
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens  Owns (effective)  Host ID       Rack
UN  172.18.0.2  326.58 KiB  256     100.0%            668b272b9fea  rack1
UN  172.18.0.3  319.04 KiB  256     100.0%            1bb5e1c153b7  rack1
A status of "UN" stands for Up and Normal. This is what you want.
Note: Cassandra needs memory. Anything below the 4g that I configured in my script will most likely make it refuse to start up. Swap is fine, though.
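Since "UN" is the state to look for, the check can be scripted instead of eyeballed. What follows is a minimal sketch and not part of the original script: the function names count_un_nodes and wait_for_un_nodes are made up for illustration, and the default container name cassandra_node1 is taken from the script above. It counts the lines of the nodetool status output that start with UN and polls until the expected number is reached.

```shell
#!/usr/bin/env bash

# count_un_nodes (hypothetical helper): reads `nodetool status` output on
# stdin and prints how many node lines start with "UN" (Up and Normal).
count_un_nodes() {
    awk '$1 == "UN" { n++ } END { print n + 0 }'
}

# wait_for_un_nodes EXPECTED [TIMEOUT_SECONDS] [CONTAINER] (hypothetical helper)
# Polls nodetool inside the container every 5 seconds until at least
# EXPECTED nodes report UN, or gives up after TIMEOUT_SECONDS.
wait_for_un_nodes() {
    local expected="$1" timeout="${2:-300}" container="${3:-cassandra_node1}"
    local waited=0 up=0
    while [ "$waited" -lt "$timeout" ]; do
        # if nodetool is not ready yet, the pipeline simply yields 0
        up=$(docker exec "$container" nodetool status 2>/dev/null | count_un_nodes)
        if [ "$up" -ge "$expected" ]; then
            echo "$up node(s) are UN"
            return 0
        fi
        sleep 5
        waited=$((waited + 5))
    done
    echo "timed out after ${timeout}s: only $up node(s) are UN" >&2
    return 1
}
```

For example, `wait_for_un_nodes 3 600` would block until all three nodes report UN or ten minutes have passed. A loop like this could stand in for the fixed STARTUP_DELAY sleep if you prefer not to guess the startup time.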
Finally, here are the ports that Cassandra uses and that you may need to configure:
port number | category | description |
---|---|---|
7000 | inter-node ports | inter-node cluster communication |
7001 | inter-node ports | SSL inter-node cluster communication |
7199 | inter-node ports | JMX monitoring port |
9042 | client ports | client connect port |
9160 | client ports | client connect port (Thrift) |
9142 | client ports | default for native_transport_port_ssl |
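To check whether one of these ports is actually reachable, a small helper can try to open a TCP connection. This is only a sketch under assumptions: port_open is a hypothetical name, it relies on bash's /dev/tcp feature and the coreutils timeout command, and because the script above does not publish any ports to the host, the address to test would be a container IP on the Docker network (such as the ones shown by nodetool status).

```shell
#!/usr/bin/env bash

# port_open HOST PORT (hypothetical helper)
# Returns 0 if a TCP connection to HOST:PORT can be opened within 3 seconds,
# non-zero otherwise. The inner shell is bash because /dev/tcp is a bash feature.
port_open() {
    local host="$1" port="$2"
    timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}
```

With the sample addresses from above, `port_open 172.18.0.2 9042 && echo "client port reachable"` would confirm that the first node accepts client connections.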
I will follow up with a second post showing how to connect and talk to your new cluster using Go.