A robust database engine is the foundation of many applications, no matter where they run. There is a vast variety of options to choose from, but when it comes to horizontal scaling and resilience, things quickly get complicated. Clustered databases with read and write access on every node are typically no low-hanging fruit and are often reserved for the enterprise editions of commercial database products. Well, I have good news for you: a company named Cockroach Labs offers a database that claims to do exactly this (i.e. scale and distribute).

And while there are commercial plans for a managed cluster on AWS and/or Google, the self-hosted variant is free to use, even for commercial purposes. CockroachDB comes with the BSL (Business Source License), a model that is mostly generous unless you plan to offer the product as a database-as-a-service. That use case is excluded because the company wants to protect itself from the big cloud providers (Azure, AWS, Google) taking this type of business away from it. This has happened before with other popular open source database products.

While running all the nodes on a single machine obviously doesn't make sense for a production environment, you can easily spin up a CockroachDB cluster on one machine using Docker. This is what I will show you in this post.

Prerequisites

I won't go over the details of installing Docker and docker-compose; there are many good tutorials out there on how to do this. So, I am assuming that you have a test machine running some flavour of Linux with docker-compose installed.
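If you want to make sure both are in place, a quick version check will do:

docker --version
docker-compose --version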

Starting the Cluster

Here's the docker-compose.yml file I used for my little test:

version: "3"
services:
  # First node: starts the cluster, the other two nodes join it;
  # port 8080 exposes the admin UI on the host
  cockroach1:
    image: cockroachdb/cockroach
    command: start --insecure
    ports:
      - "8080:8080"
    volumes:
      # keep the node's data on the host so it survives container restarts
      - ./data/cockroach1:/cockroach/cockroach-data
    networks:
      cockroachnetwork:
        aliases:
          - cockroach1

  # Second node: joins the cluster via cockroach1
  cockroach2:
    image: cockroachdb/cockroach
    command: start --insecure --join=cockroach1
    volumes:
      - ./data/cockroach2:/cockroach/cockroach-data
    depends_on:
      - cockroach1
    networks:
      cockroachnetwork:
        aliases:
          - cockroach2

  # Third node: joins the cluster via cockroach1
  cockroach3:
    image: cockroachdb/cockroach
    command: start --insecure --join=cockroach1
    volumes:
      - ./data/cockroach3:/cockroach/cockroach-data
    depends_on:
      - cockroach1
    networks:
      cockroachnetwork:
        aliases:
          - cockroach3

# dedicated bridge network so the nodes can reach each other by name
networks:
  cockroachnetwork:
    driver: bridge

Fire it up by running docker-compose up, and there you have your clustered database. You can now attach to any of the three nodes and execute SQL statements: docker-compose exec cockroach1 ./cockroach sql --insecure takes you to a prompt where you can enter and run standard SQL commands. Whichever node you connect to, your modifications will be replicated to all the others.
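As a quick smoke test, here are a few statements you could run at that prompt. The demo database and the greetings table are just names I made up for this example:

CREATE DATABASE IF NOT EXISTS demo;
CREATE TABLE demo.greetings (id SERIAL PRIMARY KEY, message STRING);
INSERT INTO demo.greetings (message) VALUES ('hello from node one');
SELECT * FROM demo.greetings;

If you then open a second prompt against another node (docker-compose exec cockroach2 ./cockroach sql --insecure), the same SELECT should return the row you just inserted.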

Nice Specialties

I am relatively new to CockroachDB myself, so I can't tell you exactly how they are doing it, but there are some specialties in this product that come in very handy.

No Proprietary Drivers

CockroachDB does not come with its own set of drivers. Instead, its makers decided to make it compatible with the PostgreSQL wire protocol. So, in your projects, you can just pretend to be using a Postgres database while in reality talking to CockroachDB.
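You can see this for yourself with the standard psql client. This assumes psql is installed on the host and that you additionally publish the SQL port by adding "26257:26257" to the ports section of cockroach1 in the compose file above:

psql "postgresql://root@localhost:26257/demo?sslmode=disable"

Any PostgreSQL driver should accept a connection string of the same shape, so your application code does not need to know that CockroachDB is on the other end.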

No Master

There is no real master. All nodes are equally important...or replaceable. Try stopping one of the nodes (e.g. via docker-compose stop cockroach1), make some changes through another node, and restart the stopped one. You will find all modifications replicated to the resurrected node, too.
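A rough sketch of such a test, reusing the demo table from above, could look like this:

# take the first node down
docker-compose stop cockroach1

# write through one of the surviving nodes
docker-compose exec cockroach2 ./cockroach sql --insecure \
  -e "INSERT INTO demo.greetings (message) VALUES ('written while cockroach1 was down');"

# bring the first node back and give it a moment to catch up
docker-compose start cockroach1

# read the data back through the restarted node
docker-compose exec cockroach1 ./cockroach sql --insecure \
  -e "SELECT * FROM demo.greetings;"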

I haven't done extensive error scenario testing yet, but I certainly will, and I will keep you posted.