I don't know exactly why, but when it comes to high-performance NoSQL databases with clustering capabilities, many people seem to leave Apache Cassandra off their list of candidates. That might be a mistake. Cassandra has been around for quite a while now, so it is safe to call it a mature product fit for production. Good evidence of this is the fact that Apple is running 25,000+ Cassandra nodes.

So, time to at least take a look. The following script expects to run in a Linux environment with Docker already installed. It sets up a cluster with three nodes (granted, all on the same machine, but hey, we're just getting started here).

#!/bin/bash
# cassandra/run.sh
#
# setup script for a docker based cassandra cluster
#

# settings
CASSANDRA_VERSION="3.11.6"
NAME_TEMPLATE="cassandra_node"
NUMBER_OF_NODES=3
MEMORY_PER_CONTAINER="4g"
CLUSTER_NAME="plexx"
VOLUME_BASE="/opt/cassandra/node"
STARTUP_DELAY=60

# network: create own network if it doesn't exist already
CASSANDRA_NETWORK=$(docker network ls --format="{{ .Name }}" | grep -x cassandra)
if [ -z "${CASSANDRA_NETWORK}" ]
then
	echo "cassandra network doesn't exist yet, so create it"
	docker network create cassandra
else
	echo "cassandra network already present"
fi

# stop and remove any existing containers (data will NOT be deleted)
for node in $(seq $NUMBER_OF_NODES)
do
	CONTAINER_NAME="${NAME_TEMPLATE}${node}"
	echo "checking container $CONTAINER_NAME ..."
	NODE_IS_RUNNING=$(docker ps --format="{{ .Names }}" | grep -x "$CONTAINER_NAME")
	if [ -n "$NODE_IS_RUNNING" ]
	then
		echo "container ${CONTAINER_NAME} is running, stop it"
		docker stop "$CONTAINER_NAME"
	else
		echo "container $CONTAINER_NAME is not running"
	fi
	NODE_EXISTS=$(docker ps -a --format="{{ .Names }}" | grep -x "$CONTAINER_NAME")
	if [ -n "$NODE_EXISTS" ]
	then
		echo "remove pre-existing container $CONTAINER_NAME"
		docker rm "$CONTAINER_NAME"
	fi
done

# make sure host volumes exist
for node in $(seq $NUMBER_OF_NODES)
do
	MOUNT_POINT="${VOLUME_BASE}${node}"
	if [ -d "$MOUNT_POINT" ]
	then
		echo "mount point at $MOUNT_POINT exists"
	else
		echo "mount point at $MOUNT_POINT does not exist yet, so create it"
		sudo mkdir -p "$MOUNT_POINT"
	fi
done

# create and run the MASTER container (Cassandra itself is masterless; "master" here means the first node, which the others use as seed)
echo "create and start master"
docker run -d \
--name "${NAME_TEMPLATE}1" \
--network=cassandra \
--memory $MEMORY_PER_CONTAINER \
-e CASSANDRA_CLUSTER_NAME="$CLUSTER_NAME" \
-v "${VOLUME_BASE}1:/var/lib/cassandra" \
cassandra:"$CASSANDRA_VERSION"

# give the master some time to start up
echo "waiting $STARTUP_DELAY seconds..."
sleep "$STARTUP_DELAY"

# determine IP address of master
MASTER_IP="$(docker inspect --format='{{ .NetworkSettings.Networks.cassandra.IPAddress }}' "${NAME_TEMPLATE}1")"
echo "master node IP address is $MASTER_IP (will be used as seed)"

# all the other nodes, started one at a time
for node in $(seq 2 $NUMBER_OF_NODES)
do
	echo "create and start node #$node"
	docker run -d \
	--name "${NAME_TEMPLATE}${node}" \
	--network=cassandra \
	--memory $MEMORY_PER_CONTAINER \
	-e CASSANDRA_CLUSTER_NAME="$CLUSTER_NAME" \
	-e CASSANDRA_SEEDS="$MASTER_IP" \
	-v "${VOLUME_BASE}${node}:/var/lib/cassandra" \
	cassandra:"$CASSANDRA_VERSION"
	# new nodes should not bootstrap simultaneously, so let each one join first
	echo "waiting $STARTUP_DELAY seconds..."
	sleep "$STARTUP_DELAY"
done

echo "done."
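
The fixed sleep is the simplest way to wait for a node, but depending on the machine it is either too long or too short. As a rough sketch (not part of the script above), one could poll until the node reports itself as up instead. The function names wait_until and node_is_up are made up for this example, and the check assumes that nodetool status prints a line starting with UN once the node is ready.

```shell
#!/bin/bash
# Hypothetical helper: retry a command until it succeeds or until
# the given timeout (in seconds) has passed.
# usage: wait_until <timeout> <interval> <command> [args...]
wait_until() {
	local timeout="$1" interval="$2"
	shift 2
	local waited=0
	until "$@" >/dev/null 2>&1
	do
		if [ "$waited" -ge "$timeout" ]
		then
			return 1
		fi
		sleep "$interval"
		waited=$((waited + interval))
	done
	return 0
}

# Hypothetical check: succeeds once nodetool inside the given container
# lists at least one node as Up and Normal (UN).
node_is_up() {
	docker exec "$1" nodetool status 2>/dev/null | grep -q '^UN'
}
```

In the script, a call like wait_until 300 5 node_is_up "${NAME_TEMPLATE}1" could then take the place of the sleep. The 300 and 5 are arbitrary values, pick whatever suits your machine.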

To verify that everything went well, run the following command from your terminal:

docker exec -it cassandra_node1 bash -c 'nodetool status'

This should produce output similar to this:

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens  Owns (effective)  Host ID       Rack
UN  172.18.0.2  326.58 KiB  256     100.0%            668b272b9fea  rack1
UN  172.18.0.3  319.04 KiB  256     100.0%            1bb5e1c153b7  rack1

A status of "UN" stands for Up and Normal. This is what you want. Note: Cassandra needs memory. Anything below the 4g that I configured in my script will most likely keep it from starting up. Swap is fine, though.
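
If you would rather check this from a script than by eye, counting the lines that start with UN is usually enough. Here is a minimal sketch that assumes the output format shown above; count_up_nodes is a hypothetical helper name, not something that ships with Cassandra.

```shell
#!/bin/bash
# Hypothetical helper: reads "nodetool status" output on stdin and
# prints the number of nodes reported as Up and Normal (UN).
count_up_nodes() {
	# grep -c exits non-zero when the count is 0, so mask that
	grep -c '^UN' || true
}
```

It would be used like this: docker exec cassandra_node1 nodetool status | count_up_nodes. Once all nodes have joined, the result should equal NUMBER_OF_NODES from the setup script.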

Finally, here are the ports that Cassandra uses and that you may need to configure:

port number  category          description
7000         inter-node ports  inter-node cluster communication
7001         inter-node ports  SSL inter-node cluster communication
7199         inter-node ports  JMX monitoring port
9042         client ports      client connect port
9160         client ports      client connect port (Thrift)
9142         client ports      default for native_transport_port_ssl
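
To see quickly whether one of these ports is actually reachable, a small TCP probe helps. The following is only a sketch: port_open is a hypothetical helper, and it relies on bash's /dev/tcp redirection and the coreutils timeout command being available on the host.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if a TCP connection to host:port
# can be opened within two seconds.
# usage: port_open <host> <port>
port_open() {
	local host="$1" port="$2"
	timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}
```

On a Linux host, container IP addresses on a Docker bridge network are normally reachable directly, so port_open 172.18.0.2 9042 && echo "client port is open" should work with the address taken from the nodetool output. The address will differ on your machine.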

I will follow up with a second post showing how to connect and talk to your new cluster using Go.