Starting Cassandra - Setting up a Multi-DC environment

This is a quick-and-dirty, opinionated guide to setting up a Cassandra cluster with multiple data centers.

A new cluster

An existing cluster

This is harder and involves more work and more options, but I’m going to discuss the way that gets you into the least amount of trouble operationally.

enjoy new data center

How to get data to the new DC

Repair approach

This works best if your repair jobs can’t be missed or stopped, for example because a process like OpsCenter or Reaper is running repairs. It also has the advantage of being very easy: if you’ve already automated repair, you’re basically done.
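For repair to stream anything to the new data center, each keyspace's replication settings have to include it first. Here is a minimal sketch of building that statement, assuming the keyspace uses `NetworkTopologyStrategy`. The helper name, keyspace name, data center names, and replication factors are hypothetical placeholders, not values from this guide.

```python
def alter_keyspace_cql(keyspace, replication):
    """Build an ALTER KEYSPACE statement with one replication factor per DC.

    `replication` maps data center name -> replication factor. All names
    used with this helper are illustrative assumptions.
    """
    parts = ["'class': 'NetworkTopologyStrategy'"]
    # Sort the DC names so the generated statement is deterministic.
    for dc, rf in sorted(replication.items()):
        parts.append("'%s': %d" % (dc, int(rf)))
    options = ", ".join(parts)
    return "ALTER KEYSPACE %s WITH replication = {%s};" % (keyspace, options)


if __name__ == "__main__":
    # Hypothetical example: keep dc1 at RF 3 and add dc2 at RF 3.
    print(alter_keyspace_cql("my_ks", {"dc1": 3, "dc2": 3}))
```

Once the keyspace lists the new data center, the next pass of your existing repair schedule should start populating it, which is why this approach needs so little extra work.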

Rebuild approach

This is faster and less resource intensive, and it works well if you have enough time to complete it while repair is stopped. Rebuild is easier to ‘resume’ than repair in many ways, so this has a number of advantages.
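As a rough sketch of the rebuild approach: each node in the new data center runs `nodetool rebuild`, naming an existing data center to stream from. The helper below only generates the command strings. The host addresses and the source data center name are hypothetical, and how you actually run the command on each node (SSH, configuration management, by hand) is left out on purpose.

```python
import shlex


def rebuild_commands(new_dc_hosts, source_dc):
    """Pair each new-DC host with the rebuild command to run on that host.

    `source_dc` is the name of the existing data center to stream from.
    Hosts and DC names here are illustrative assumptions.
    """
    command = "nodetool rebuild -- " + shlex.quote(source_dc)
    return [(host, command) for host in new_dc_hosts]


if __name__ == "__main__":
    # Hypothetical hosts in the new DC, streaming from a DC named "dc1".
    for host, command in rebuild_commands(["10.0.1.1", "10.0.1.2"], "dc1"):
        print(host, "->", command)
```

Running the nodes one at a time or a few at a time is a judgment call that depends on how much streaming load the source data center can absorb.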

YOLO rebuild with repair

This will probably overstream its share of data, but honestly a lot of folks do this in practice for some reason:

Cloud strategies

There are a few valid approaches to this and none of them are wrong IMO.

region == DC, rack == AZ

You will need to get into racks, and a lot of people get this wrong and end up with imbalanced racks, but you get the advantage of more intelligent failure modes, with racks mapping to AZs.
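Rack imbalance is easy to check for before it causes trouble. This is a minimal sketch, assuming you can produce a mapping of node to rack (AZ) for one data center. The function names, node addresses, and AZ names are made up for illustration.

```python
from collections import Counter


def rack_counts(node_racks):
    """Count how many nodes sit in each rack (AZ) of one data center."""
    return Counter(node_racks.values())


def is_balanced(node_racks):
    """Return True when every rack holds the same number of nodes."""
    counts = rack_counts(node_racks)
    return len(set(counts.values())) <= 1


if __name__ == "__main__":
    # Hypothetical region-as-DC layout with AZs as racks.
    nodes = {
        "10.0.1.1": "us-east-1a",
        "10.0.1.2": "us-east-1b",
        "10.0.1.3": "us-east-1c",
        "10.0.1.4": "us-east-1a",
    }
    print(dict(rack_counts(nodes)), "balanced:", is_balanced(nodes))
```

In this example one AZ holds two nodes and the others hold one each, which is exactly the kind of layout that tends to skew data ownership.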

AZ == DC, regardless of region

This allows things to be balanced easily, but you have no good option for racks then. However, some people think racks are overrated, and I’d say a majority of clusters run with one rack.
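To make the AZ-as-DC layout concrete, here is a small sketch that renders per-node `cassandra-rackdc.properties` content. It assumes a snitch that reads that file, such as `GossipingPropertyFileSnitch`. The helper name, host addresses, AZ names, and the single rack name `rack1` are all illustrative assumptions.

```python
def rackdc_properties_by_host(host_azs, rack="rack1"):
    """Render cassandra-rackdc.properties content for each host.

    Every AZ becomes its own data center, and all nodes share one rack.
    `host_azs` maps host -> AZ name; all values here are hypothetical.
    """
    return {
        host: "dc=%s\nrack=%s\n" % (az, rack)
        for host, az in host_azs.items()
    }


if __name__ == "__main__":
    # Hypothetical hosts spread across two AZs in different regions.
    files = rackdc_properties_by_host({
        "10.0.1.1": "us-east-1a",
        "10.1.1.1": "us-west-2b",
    })
    for host, content in files.items():
        print("#", host)
        print(content)
```

With one rack per data center there is nothing to balance across racks, which is the trade-off described above.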