Magic Collaboration Studio uses Apache Cassandra, an open-source NoSQL distributed database. Before installing Magic Collaboration Studio, follow the steps below to install Apache Cassandra.

Prerequisites

Installing with script

The script downloads and installs the necessary packages, Cassandra, and the Cassandra tools from the Apache Software Foundation repository, and creates the firewall rules required for both single-node and cluster installations. It also installs the required Java version and sets it as the default system Java.

The script can also be used for offline installation. To do so, download the prerequisite RPM packages and place them in the same location as the installation script. Manually install the required Java version and set it as the system default Java.

To install Apache Cassandra


  1. Install Apache Cassandra by executing the install_cassandra5x_ol_rhel.sh installation script.

    Example

    sudo ./install_cassandra5x_ol_rhel.sh
  2. Start Apache Cassandra by executing the following command:

    sudo systemctl start cassandra
  3. Check if Apache Cassandra is running by executing the following command:

    nodetool status

    If Apache Cassandra is running, the output resembles the example below. If the service is fully operational, the first two characters of the last line are "UN", indicating that the node status is Up and its state is Normal.

    Datacenter: datacenter1
    =======================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address    Load       Tokens   Owns (effective)  Host ID                               Rack
    UN  127.0.0.1  128.4 KB   256      100.0%            ea3f99eb-c4ad-4d13-95a1-80aec71b750f  rack1

    When Cassandra starts for the first time, wait a few minutes before checking whether it is running. If Cassandra has not finished starting, the command returns the error "No nodes present in the cluster. Has this node finished starting up?" In that case, give Cassandra more time to start.

  4. If Apache Cassandra is not running, or if you used an installation option other than the one described in this chapter, configure Apache Cassandra as described in the section Configuring Apache Cassandra for Magic Collaboration Studio.
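
The status check in step 3 can be scripted. The following is a minimal sketch and is not part of the installation script: all_nodes_up_normal is a hypothetical helper that reads nodetool status output and succeeds only if every listed node reports "UN". It runs against a copy of the sample output shown above, so it can be tried without a running node.

```shell
#!/usr/bin/env bash
# Sketch: decide whether every node listed by `nodetool status` is "UN".
# all_nodes_up_normal is a hypothetical helper name, not a Cassandra tool.

all_nodes_up_normal() {
    # Reads `nodetool status` output on stdin.
    # Node lines begin with a two-letter code: U/D (status), then N/L/J/M (state).
    awk '
        $1 ~ /^[UD][NLJM]$/ { nodes++; if ($1 != "UN") bad++ }
        END {
            if (nodes > 0 && bad == 0) exit 0
            exit 1
        }
    '
}

# Sample text copied from the example output; replace with the real command.
sample_output='Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens   Owns (effective)  Host ID                               Rack
UN  127.0.0.1  128.4 KB   256      100.0%            ea3f99eb-c4ad-4d13-95a1-80aec71b750f  rack1'

if printf '%s\n' "$sample_output" | all_nodes_up_normal; then
    echo "all nodes are Up/Normal"
else
    echo "at least one node is not Up/Normal, or no nodes were listed"
fi
```

On a live node, pipe the real command into the helper (nodetool status | all_nodes_up_normal) and retry after a pause while Cassandra is still starting.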

Developing a backup strategy

Before deploying Magic Collaboration Studio and Apache Cassandra in a production environment, it is imperative to have a fully implemented backup strategy. The Cassandra database stores all project and user data associated with Magic Collaboration Studio. Review the backup and restore data procedure document. Ensure that you test the entire backup and restore process before your deployment goes live to users.

During the backup process, user access to Cassandra should be suspended. Refer to the Cassandra backup documentation for more information. 

An improper backup procedure can lead to total data loss! For example, taking an image snapshot of the storage system while Cassandra is actively accepting read and write requests will result in unrecoverable data.


Configuring Apache Cassandra for Magic Collaboration Studio

If you installed Apache Cassandra by a method other than the provided script, or if Apache Cassandra does not start, configure it as described below.

Before starting, note that you do not need to configure Apache Cassandra if you installed it using the installation script we provided (install_cassandra<version_number>_<os_version>.sh). It should start without any additional configuration.

To configure Apache Cassandra


  1. Locate the cassandra.yaml file (default: /etc/cassandra/conf/cassandra.yaml) and open it for editing.

  2. Find the following parameters related to the Cassandra node IP address and communication settings, and change them as shown below:

    Example

    seeds: "192.168.130.10"
    listen_address: 192.168.130.10
    broadcast_rpc_address: 192.168.130.10
    rpc_address: 0.0.0.0
    • seeds - a comma-delimited list containing all of the seeds in the Cassandra cluster. Since our cluster consists of a single node, it contains only one entry - our IP address.
    • listen_address - the IP address that Cassandra uses to listen for connections from other Cassandra nodes.
    • broadcast_rpc_address - the RPC address that Cassandra broadcasts to drivers and other Cassandra nodes in the cluster. It must be set when rpc_address is 0.0.0.0. This parameter may be commented out. In such a case, remove the "#" and ensure there are no leading spaces.
    • rpc_address - when set to 0.0.0.0, Cassandra listens for RPC requests on all interfaces.
  3. Find the following parameters that control thresholds to ensure that the data being sent is processed properly, and change them as shown below:

    Example

    commitlog_segment_size: 192MiB
    read_request_timeout: 1800000ms
    range_request_timeout: 1800000ms
    write_request_timeout: 1800000ms
    cas_contention_timeout: 1000ms
    truncate_request_timeout: 1800000ms
    request_timeout: 1800000ms
    batch_size_warn_threshold: 3000KiB
    batch_size_fail_threshold: 5000KiB
  4. To set the total commit log size to the recommended 8 GB, uncomment the commitlog_total_space parameter and set it as shown below.

    Example

    commitlog_total_space: 8192MiB

    Ensure that the partition where the commit log is stored has enough space to accommodate a commit log of 8 GB.

  5. To point the data to the appropriate locations, find the following parameters and change them as shown below:

    Example

    data_file_directories:
    - /data/data
    commitlog_directory: /logs/commitlog
    hints_directory: /data/hints
    saved_caches_directory: /data/saved_caches
  6. Start Apache Cassandra by executing the following command:

    sudo systemctl start cassandra
  7. Check if Apache Cassandra is running by executing the following command:

    nodetool status

    If Apache Cassandra is running, the output resembles the example below. If the service is fully operational, the first two characters of the last line are "UN", indicating that the node status is Up and its state is Normal.

    Example

    Datacenter: datacenter1
    =======================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address    Load       Tokens   Owns (effective)  Host ID                               Rack
    UN  127.0.0.1  128.4 KB   256      100.0%            ea3f99eb-c4ad-4d13-95a1-80aec71b750f  rack1
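
After editing cassandra.yaml, it can help to confirm that the values from steps 3 and 4 are actually in effect and not still commented out. The following is a minimal sketch under stated assumptions: get_yaml_scalar and check_setting are hypothetical helpers, they only handle unindented "key: value" lines (so nested entries such as seeds are out of scope), and the demo runs on a throwaway file instead of the real configuration.

```shell
#!/usr/bin/env bash
# Sketch: compare top-level settings in cassandra.yaml with expected values.
# get_yaml_scalar and check_setting are hypothetical helpers, not Cassandra tools.

get_yaml_scalar() {
    # $1 = key, $2 = file. Prints the value of the first uncommented,
    # unindented "key: value" line, without trailing comment or whitespace.
    sed -n "s/^$1:[[:space:]]*\([^#]*\).*/\1/p" "$2" | sed 's/[[:space:]]*$//' | head -n 1
}

check_setting() {
    # $1 = key, $2 = expected value, $3 = file. Returns 0 only on a match.
    local actual
    actual=$(get_yaml_scalar "$1" "$3")
    if [ "$actual" = "$2" ]; then
        echo "OK        $1 = $actual"
    else
        echo "MISMATCH  $1: expected '$2', found '${actual:-<not set>}'"
        return 1
    fi
}

# Demo on a temporary file so the sketch runs without Cassandra installed.
demo_yaml=$(mktemp)
cat > "$demo_yaml" <<'EOF'
commitlog_segment_size: 192MiB
# commitlog_total_space: 8192MiB
batch_size_warn_threshold: 5KiB   # not yet raised
EOF

check_setting commitlog_segment_size 192MiB "$demo_yaml"
check_setting commitlog_total_space 8192MiB "$demo_yaml" || true
check_setting batch_size_warn_threshold 3000KiB "$demo_yaml" || true
rm -f "$demo_yaml"
```

To check a real node, pass the path to the actual file, for example check_setting request_timeout 1800000ms /etc/cassandra/conf/cassandra.yaml.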

Configuring Cassandra memory usage

If you did not use the installation script, or if you want to increase the amount of RAM available to Cassandra, make the following changes. Otherwise, the Cassandra installation script applies these settings automatically.

Set the maximum direct memory size to ensure your off-heap structures (like memtables and caches) have a dedicated, stable pool of memory and prevent the node from crashing due to memory limits.

To configure the maximum direct memory size, explicitly set the standard JVM argument -XX:MaxDirectMemorySize in the appropriate JVM options file, jvmXX-server.options.

  • Default file location: /etc/cassandra/conf/jvmXX-server.options
  • You may need to add the -XX:MaxDirectMemorySize line if it is missing, or edit the existing one.
  • Replace jvmXX with your installed Java version (read requirements for the deployed version).

Edit jvmXX-server.options and add the line:

-XX:MaxDirectMemorySize=10G

To configure the Java heap space for Cassandra, see Configuring Java heap space.
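
The add-or-edit rule for the -XX:MaxDirectMemorySize line can be scripted so that repeated runs never produce duplicate entries. The following is a minimal sketch, not the installation script's method: set_jvm_option is a hypothetical helper, and the demo works on a temporary file that stands in for jvmXX-server.options.

```shell
#!/usr/bin/env bash
# Sketch: add or update one "name=value" JVM option in an options file.
# set_jvm_option is a hypothetical helper, not part of Cassandra.

set_jvm_option() {
    # $1 = option name (for example -XX:MaxDirectMemorySize)
    # $2 = value, $3 = options file
    local name=$1 value=$2 file=$3 tmp
    tmp=$(mktemp)
    awk -v name="$name" -v value="$value" '
        {
            # Strip an optional leading "#" so commented-out entries match too.
            line = $0
            sub(/^[ \t]*#?[ \t]*/, "", line)
            if (index(line, name "=") == 1) {
                # Replace the first occurrence, drop any later duplicates.
                if (!done) { print name "=" value; done = 1 }
                next
            }
            print
        }
        END { if (!done) print name "=" value }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps owner and mode
    rm -f "$tmp"
}

# Demo on a temporary stand-in for jvmXX-server.options.
demo_opts=$(mktemp)
printf '%s\n' '-Xss256k' '#-XX:MaxDirectMemorySize=4G' > "$demo_opts"
set_jvm_option -XX:MaxDirectMemorySize 10G "$demo_opts"
set_jvm_option -XX:MaxDirectMemorySize 10G "$demo_opts"   # second run changes nothing
cat "$demo_opts"
rm -f "$demo_opts"
```

Against the real file, the call needs write permission on /etc/cassandra/conf, and Cassandra must be restarted for the new value to take effect.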

Configuring Cassandra CPU usage

  • In the jvmXX-server.options file, uncomment the following lines and set both values to the number of physical CPU cores (the two parameters must have the same value):

    -XX:ParallelGCThreads=16
    -XX:ConcGCThreads=16
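
The physical core count can be derived from lscpu as sockets multiplied by cores per socket, which excludes hyper-threads. The following is a minimal sketch: physical_core_count is a hypothetical helper, it assumes the English field labels that lscpu prints under LC_ALL=C, and the demo parses sample text instead of the live machine.

```shell
#!/usr/bin/env bash
# Sketch: compute the physical core count from `lscpu` output.
# physical_core_count is a hypothetical helper, not part of Cassandra.

physical_core_count() {
    # Reads `lscpu` output on stdin; prints sockets * cores per socket.
    # Fails if either field is missing.
    awk -F: '
        /^Socket\(s\)/          { gsub(/[ \t]/, "", $2); sockets = $2 + 0 }
        /^Core\(s\) per socket/ { gsub(/[ \t]/, "", $2); cores = $2 + 0 }
        END {
            if (sockets > 0 && cores > 0) {
                print sockets * cores
            } else {
                exit 1
            }
        }
    '
}

# Sample lscpu excerpt: 2 sockets, 8 cores each, 2 threads per core.
sample_lscpu='CPU(s):                32
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             2'

cores=$(printf '%s\n' "$sample_lscpu" | physical_core_count)
printf '%s\n' "-XX:ParallelGCThreads=${cores}" "-XX:ConcGCThreads=${cores}"
```

On the Cassandra host, replace the sample with LC_ALL=C lscpu | physical_core_count and copy the two printed lines into jvmXX-server.options.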

Additional configuration

  • Synchronize the system clocks on all Cassandra cluster nodes. Otherwise, you may encounter issues when creating an empty Cassandra cluster.
  • cqlsh for Cassandra 5.0 requires Python 3.8 to 3.11.
  • In the logback.xml file, comment out the <appender-ref ref="ASYNCDEBUGLOG" /> line. This disables the debug log, which improves Cassandra's performance.
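
The Python requirement for cqlsh can be checked before the first connection attempt. The following is a minimal sketch: python_ok_for_cqlsh is a hypothetical helper that accepts a version string and succeeds only for 3.8 through 3.11, the range stated above.

```shell
#!/usr/bin/env bash
# Sketch: test a Python version string against the 3.8 to 3.11 range.
# python_ok_for_cqlsh is a hypothetical helper, not part of cqlsh.

python_ok_for_cqlsh() {
    # $1 = version string such as "3.9.18".
    local major minor
    major=${1%%.*}        # text before the first dot
    minor=${1#*.}         # text after the first dot
    minor=${minor%%.*}    # keep only the second component
    [ "$major" -eq 3 ] 2>/dev/null &&
        [ "$minor" -ge 8 ] 2>/dev/null &&
        [ "$minor" -le 11 ] 2>/dev/null
}

# Demo with fixed version strings.
for v in 3.7.17 3.8.10 3.11.4 3.12.0; do
    if python_ok_for_cqlsh "$v"; then
        echo "$v: supported"
    else
        echo "$v: not supported"
    fi
done
```

To test the interpreter on the host, pass it the installed version, for example python_ok_for_cqlsh "$(python3 -c 'import platform; print(platform.python_version())')".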

Configuring Linux environment for Cassandra performance

If you install Magic Collaboration Studio using the install_twc_mcs_centos_rhel.sh script, Cassandra performance is tuned automatically. However, if you plan to use other installation options or if you need to set other parameters after running the script, you can do it manually as described in this section.


To improve Apache Cassandra performance


  1. Locate the sysctl.conf file (default: /etc/sysctl.conf) and open it for editing.

  2. To configure the TCP settings, add the following tuning parameters to the file:

    Example

    net.core.rmem_max=16777216
    net.core.wmem_max=16777216
    net.core.optmem_max=40960
    net.core.default_qdisc=fq
    net.core.somaxconn=4096
    net.ipv4.conf.all.arp_notify=1
    net.ipv4.tcp_keepalive_time=60
    net.ipv4.tcp_keepalive_probes=3
    net.ipv4.tcp_keepalive_intvl=10
    net.ipv4.tcp_mtu_probing=1
    net.ipv4.tcp_rmem=4096 12582912 16777216
    net.ipv4.tcp_wmem=4096 12582912 16777216
    net.ipv4.tcp_max_syn_backlog=8096
    net.ipv4.tcp_slow_start_after_idle=0
    net.ipv4.tcp_tw_reuse=1
    vm.max_map_count=1048575
    vm.swappiness=0
    vm.dirty_background_ratio=5
    vm.dirty_ratio=80
    vm.dirty_expire_centisecs=12000
  3. To apply the settings without rebooting, execute the following command:

    sudo sysctl -p

For more information about tuning Linux, see the DSE 6.8 Administrator Guide.
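
After running sysctl -p, the live kernel values can be compared with the file to confirm that every parameter was applied. The following is a minimal sketch: normalize_sysctl_conf and report_sysctl_drift are hypothetical helpers, and the demo only exercises the normalization step so that it runs on any machine.

```shell
#!/usr/bin/env bash
# Sketch: normalize sysctl.conf entries and compare them with live values.
# normalize_sysctl_conf and report_sysctl_drift are hypothetical helpers.

normalize_sysctl_conf() {
    # stdin: sysctl.conf-style text. stdout: "key=value" lines with comments
    # and blank lines dropped and whitespace runs collapsed to one space.
    sed -e 's/[#;].*//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
        -e '/^$/d' -e 's/[[:space:]]*=[[:space:]]*/=/' \
        -e 's/[[:space:]][[:space:]]*/ /g'
}

report_sysctl_drift() {
    # $1 = conf file. Prints each key whose live value differs from the file.
    normalize_sysctl_conf < "$1" | while IFS='=' read -r key want; do
        have=$(sysctl -n "$key" 2>/dev/null | tr -s '[:space:]' ' ' | sed 's/ $//')
        [ "$have" = "$want" ] || echo "$key: file='$want' live='$have'"
    done
}

# Demo: normalization only, using two entries written in different styles.
printf '%s\n' '# TCP tuning' 'net.ipv4.tcp_tw_reuse = 1 ' \
    'net.ipv4.tcp_rmem=4096 12582912 16777216 ' '' | normalize_sysctl_conf
```

On the Cassandra host, report_sysctl_drift /etc/sysctl.conf prints nothing when all listed parameters match the running kernel.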

Using jemalloc memory allocator

The jemalloc memory allocator package can potentially improve Cassandra performance. Our installation script does not install jemalloc. The easiest way to install this optional package is to first install the epel-release package and then pull the latest jemalloc release from the EPEL repository. An older version of jemalloc is also available for direct download and installation.

  • EPEL package (optional, for pulling jemalloc package)
  • jemalloc package (optional for performance)