Teamwork Cloud 2022x Refresh2 and newer versions support Apache Cassandra multiple data centers (Multi-DC) capability. Multi-DC Cassandra ensures continuous availability of data to Teamwork Cloud users. There are two main scenarios for using multiple data centers: live backup/disaster recovery and geographical location. The deployment configuration of a live backup data center is presented here as an example.

Please note that Multi-DC deployment requires in-depth knowledge of Cassandra architecture and administration. Incorrect deployment or operation can result in total data loss. Technical support is not available, as this is considered a custom deployment strategy.

Scenarios for deploying a Multi-DC Cassandra cluster

Live backup/disaster recovery

Multiple data centers can be deployed at different physical locations as a disaster recovery solution. In this situation, one Cassandra data center would be the primary access point, while an off-site secondary data center would continuously receive the latest database updates. If the primary data center goes down, data can be restored from the secondary data center by bringing new Cassandra nodes online at the primary data center. Teamwork Cloud can also be reconfigured to connect to the secondary data center as a temporary solution.

In this scenario, there is no degradation in performance. Read and write operations are served by the primary data center; writes are then replicated to the secondary data center asynchronously in the background.

Geographical location

For organizations with users spread across multiple geographical locations, it may be desirable to optimize data access by location. Each region can have a dedicated data center. All of the data centers would form one Cassandra cluster, and all of them would have access to the latest data.

In this scenario, performance is mainly affected by database write operations, which must be replicated to every data center. Read operations remain local to each data center and are therefore unaffected.

Deploying Multi-DC Cassandra for Teamwork Cloud

This is the recommended configuration procedure for deploying Multi-DC Cassandra for Teamwork Cloud.

To configure a Multi-DC Cassandra cluster


  1. Configure a Cassandra cluster with more than one data center.
  2. Start the Cassandra cluster and verify that all data centers are online.
  3. Configure Teamwork Cloud for Multi-DC.
  4. Configure authentication in Web Application Platform for Multi-DC.
  5. For an existing database, update the “twc” keyspace for Multi-DC setup.
  6. Start Teamwork Cloud services and verify operation.

Configuring Cassandra cluster with multiple data centers

Our example deployment is a single Cassandra cluster with two data centers in separate geographical locations, each with three nodes. When performing the initial installation and configuration of the Cassandra nodes, make sure the Cassandra version is the same on all nodes and that clocks are synchronized across all nodes.
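As a quick sanity check, you can confirm both points on each node before proceeding. The commands below are one way to do this, assuming Linux hosts with systemd:

# Print the Cassandra version (requires the node to be running);
# alternatively, check the installed package version
nodetool version

# Confirm the system clock is synchronized (e.g., via NTP or chrony)
timedatectl status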

Start by updating the cassandra.yaml configuration file.

Set the cluster name for all nodes. This deployment is a single cluster spanning multiple data centers, so the cluster name must be identical on all nodes.

cluster_name: 'MDC Cluster'

Set the seed provider for all data centers. Assign at least two seed nodes for each data center. Specify the address for all seeds. Alternatively, use a seed provider service to keep track of seed node addresses.

seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.0.0.1, 10.0.0.2, 10.0.1.1, 10.0.1.2"

Then, change the endpoint snitch implementation on each node. The default snitch, SimpleSnitch, does not support Multi-DC. Cassandra uses the snitch to gather network topology information; for a Multi-DC setup, we use GossipingPropertyFileSnitch, which reads data center and rack information from the cassandra-rackdc.properties file.

endpoint_snitch: GossipingPropertyFileSnitch

Refer to the Cassandra documentation if you are updating the snitch for an existing database: https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/operations/opsSwitchSnitch.html

Edit the cassandra-rackdc.properties file for each node. All nodes in a data center should have the same “dc” name.

For the primary data center:

dc=datacenter1
rack=rack1

For the backup data center:

dc=datacenter2
rack=rack1

Start each Cassandra node. Port 7000 is the default inter-node communication (encrypted and unencrypted) port for a Cassandra cluster. Make sure port 7000 is open for communication between data centers. Please note that the initial connection between data centers can take some time. Run nodetool status to confirm that all data centers have connected to the cluster.
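With our example topology, trimmed nodetool status output would look roughly like the following (load, token, and host ID columns are elided, and the third node address in each data center is assumed for illustration). Every node should report UN, i.e. Up and Normal:

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load  Tokens  Owns  Host ID  Rack
UN  10.0.0.1   ...   ...     ...   ...      rack1
UN  10.0.0.2   ...   ...     ...   ...      rack1
UN  10.0.0.3   ...   ...     ...   ...      rack1

Datacenter: datacenter2
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load  Tokens  Owns  Host ID  Rack
UN  10.0.1.1   ...   ...     ...   ...      rack1
UN  10.0.1.2   ...   ...     ...   ...      rack1
UN  10.0.1.3   ...   ...     ...   ...      rack1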

For an existing Cassandra cluster, run nodetool repair and nodetool cleanup after the above configurations have been made.
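For example, run the following on each node in turn (nodetool acts on the local node by default; repairs can be long-running and I/O-intensive, so schedule them outside peak hours):

# Synchronize this node's replicas with the rest of the cluster
nodetool repair

# Remove data this node no longer owns after the topology change
nodetool cleanup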

Configuring Teamwork Cloud for Multi-DC

Once all Cassandra nodes are online and data center connections are confirmed, configure the Teamwork Cloud application.conf file for Multi-DC.

In the esi.persistence.cassandra.keyspace section, we have added the following parameters for our example. With 3 nodes in each data center, the replication factor is set to 3. The consistency level is set to “FAST_RELIABILITY_GLOBAL” for a disaster recovery scenario; refer to the table below for the consistency setting options. If deploying an entirely new database, the “twc” keyspace will be created with the replication strategy and factors specified here. For an existing database, you will have to manually edit the keyspace replication parameters (see Configuring Cassandra keyspace below).

is-multi-dc = true
strategy-class = "NetworkTopologyStrategy"
consistency = "FAST_RELIABILITY_GLOBAL"
dc-replication-factors {
    datacenter1 = 3
    datacenter2 = 3
}
  • When is-multi-dc is set to true, the replication-factor parameter is ignored.
  • The dc-replication-factors configuration will only be applied when creating a new “twc” keyspace. If there is an existing “twc” keyspace in Cassandra, Teamwork Cloud will use the existing replication factor set in the database. Use ALTER KEYSPACE to change the replication factor.
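For reference, a sketch of how these settings might be assembled in their enclosing section of application.conf (the exact surrounding structure of your file may differ):

esi.persistence.cassandra.keyspace {
    is-multi-dc = true
    strategy-class = "NetworkTopologyStrategy"
    consistency = "FAST_RELIABILITY_GLOBAL"
    dc-replication-factors {
        datacenter1 = 3
        datacenter2 = 3
    }
}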

Keyspace Consistency Setting   | Write Consistency Level | Read Consistency Level
BEST_READ_PERFORMANCE_GLOBAL   | ALL                     | ONE
BEST_READ_PERFORMANCE_LOCAL    | ALL                     | LOCAL_ONE
BEST_RELIABILITY_GLOBAL        | EACH_QUORUM             | EACH_QUORUM
FAST_RELIABILITY_GLOBAL        | EACH_QUORUM             | LOCAL_QUORUM
BEST_RELIABILITY_LOCAL         | LOCAL_QUORUM            | LOCAL_QUORUM

In the esi.persistence.datastax-java-driver.basic section, set the data center name and contact points to local values. The local data center will be used as the primary database for client requests.

load-balancing-policy.local-datacenter = datacenter1

Use the seed nodes from the local data center specified in cassandra.yaml:

contact-points = ["10.0.0.1:9042","10.0.0.2:9042"]
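Taken together, a sketch of the relevant fragment of the esi.persistence.datastax-java-driver.basic section (the exact nesting in your file may differ):

esi.persistence.datastax-java-driver {
    basic {
        contact-points = ["10.0.0.1:9042", "10.0.0.2:9042"]
        load-balancing-policy.local-datacenter = datacenter1
    }
}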

Configuring Cassandra keyspace

If you are deploying Multi-DC with an existing Cassandra database, you will need to manually reconfigure the replication parameters for the “twc” keyspace.

Use cqlsh, the Cassandra command-line interface, to modify “twc” keyspace replication parameters.

ALTER KEYSPACE twc WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1' : 3, 'datacenter2' : 3};

Run nodetool repair on each node after modifying the keyspace so that the data is replicated according to the new settings.

To verify replication in cqlsh:

SELECT * FROM system_schema.keyspaces;
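In the results, the row for the twc keyspace should show the new replication settings, along the lines of the following (illustrative; other keyspaces omitted):

 keyspace_name | durable_writes | replication
---------------+----------------+-----------------------------------------------------------------------------------------------------------
 twc           |           True | {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'datacenter1': '3', 'datacenter2': '3'}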

These steps can also be used to create the “twc” keyspace manually. Note that manual keyspace creation is normally not needed, as Teamwork Cloud creates the keyspace during initial startup.

CREATE KEYSPACE twc WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1' : 3, 'datacenter2' : 3};

Multi-DC Cassandra Best Practices

  • The Cassandra version must be the same across all nodes and data centers.
  • At least two seed nodes should be assigned for each data center.
  • Clocks should be synchronized across all nodes.