Overview

This Quick Start reference deployment guide provides step-by-step instructions for deploying Apache Cassandra™ 4.0-beta1 on the Amazon Web Services (AWS) Cloud.

As Apache Cassandra™ adoption grows within your organization, so can the challenges of using, maintaining, and supporting the technology. These challenges can add considerable cost, complexity, and administrative burden. Apache Cassandra addresses these challenges by streamlining operations and controlling costs for all your Cassandra workloads.

You’ll have access to best-in-class, 100% open source Apache Cassandra software, as well as optional support and services from the experts who authored the majority of the Cassandra code.

Quick Starts are automated reference deployments that use AWS CloudFormation templates to launch, configure, and run the AWS compute, network, storage, and other services required to deploy a specific workload on AWS.

This Quick Start is for users who need an easily deployed Apache Cassandra cluster for development or testing purposes.

Costs and Licenses

You are responsible for the cost of the AWS services used while running this Quick Start reference deployment. There is no additional cost for using the Quick Start.

The AWS CloudFormation template for this Quick Start includes configuration parameters that you can customize. Some of these settings, such as instance type, will affect the cost of deployment. For cost estimates, see the pricing pages for each AWS service you will be using.

This Quick Start includes Apache Cassandra 4.0.

Architecture

Deploying this Quick Start for a new virtual private cloud (VPC) with default parameters builds the following environment in the AWS Cloud.

Figure 1: Quick Start architecture for Apache Cassandra on AWS

The Quick Start sets up the following components. (The template that deploys the Quick Start into an existing VPC skips the components marked by asterisks.)

  • A highly available architecture that spans three Availability Zones. *

  • A VPC configured with public and private subnets according to AWS best practices, to provide you with your own virtual network on AWS. For more information about the VPC infrastructure, see the Amazon VPC Quick Start. *

  • An Internet gateway to allow access to the Internet. *

  • Managed NAT gateways to allow outbound Internet access for resources in the private subnets. *

  • One EC2 instance in the public subnet running DevOps apps, which can also be used as a jumpbox to SSH into Apache Cassandra nodes in the private subnets.

  • Additional EC2 instances for Apache Cassandra nodes, depending on your Quick Start parameter settings. (By default, the Quick Start creates a three-node Apache Cassandra cluster with each node in a private subnet.)

  • One Amazon Elastic Block Store (Amazon EBS) data volume per node instance deployed.

Apache Cassandra Data Centers and Nodes

Apache Cassandra data centers are groups of nodes, related and configured within a cluster for replication purposes. The Quick Start creates one data center with 1 to 32 nodes.

Apache Cassandra stores data replicas on multiple nodes to ensure reliability and fault tolerance. A replication strategy determines the nodes where replicas are placed. The replication strategy is defined per keyspace, and is set during keyspace creation. This Quick Start places nodes in each AWS Availability Zone.
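Because the replication strategy is set per keyspace at creation time, it can help to see what such a statement looks like. The following is a minimal sketch, not part of the Quick Start: it builds a CQL statement that uses NetworkTopologyStrategy for a single data center. The keyspace name demo_ks and the replication factor of 3 are illustrative assumptions, and the data center name must match the name your cluster actually reports (OSS-dc0 is the DatacenterName default in this guide).

```python
def create_keyspace_cql(keyspace: str, datacenter: str, replication_factor: int) -> str:
    """Return a CQL statement that creates a keyspace replicated within one data center."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
        f"WITH replication = {{'class': 'NetworkTopologyStrategy', "
        f"'{datacenter}': {replication_factor}}};"
    )


# demo_ks is a hypothetical keyspace name; OSS-dc0 is the default data center name.
statement = create_keyspace_cql("demo_ks", "OSS-dc0", 3)
print(statement)
```

You could paste the printed statement into cqlsh on one of the nodes. With three replicas and nodes spread over three Availability Zones, each zone can hold a copy of the data, provided the cluster's rack configuration reflects the zones.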

Prerequisites

Specialized Knowledge

Before you deploy this Quick Start, we recommend that you become familiar with the following AWS services. (If you are new to AWS, see Getting Started with AWS.)

We also recommend that you become familiar with the features and configuration of Apache Cassandra.

Technical Requirements

This Quick Start uses a Linux AMI (Ubuntu 18.04 LTS) for EC2 instances, and creates EBS volumes and an S3 bucket. The account you run this Quick Start in must have authorization to create these resources.

Deployment Options

This Quick Start provides two deployment options:

  • Deploy Apache Cassandra into a new VPC (end-to-end deployment). This option builds a new AWS environment consisting of the VPC, subnets, NAT gateways, security groups, and other infrastructure components, and then deploys an Apache Cassandra cluster into this new VPC.

  • Deploy Apache Cassandra into an existing VPC. This option provisions an Apache Cassandra cluster in your existing AWS VPC infrastructure.

The Quick Start provides separate templates for these options. It also lets you configure CIDR blocks, instance types, and Apache Cassandra settings, as discussed later in this guide.

Deployment Steps

Step 1. Prepare An AWS Account

If you don’t already have an AWS account, create one at https://aws.amazon.com by following the on-screen instructions. Part of the sign-up process involves receiving a phone call and entering a PIN using the phone keypad.

Use the region selector in the navigation bar to choose the AWS Region where you want to deploy the Quick Start.

Choosing an AWS Region

Consider choosing the region closest to your data center or corporate network to reduce network latency between systems running on AWS and the systems and users on your corporate network.

Also, note that your choice of region will determine whether the Quick Start deploys NAT gateways or NAT instances for network connections. For a list of regions that support NAT gateways, see Amazon VPC pricing.

Create a key pair in your preferred region. To do this, in the navigation pane of the Amazon EC2 console, choose Key Pairs, Create Key Pair, type a name, and then choose Create.

Creating a key pair

Amazon EC2 uses public-key cryptography to encrypt and decrypt login information. To be able to log in to your instances, you must create a key pair. With Windows instances, we use the key pair to obtain the administrator password via the Amazon EC2 console and then log in using Remote Desktop Protocol (RDP) as explained in the step-by-step instructions in the Amazon Elastic Compute Cloud User Guide. On Linux, we use the key pair to authenticate SSH login.
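One practical detail for Linux logins: OpenSSH refuses to use a private key file that other users can read, so the downloaded key file is usually restricted to its owner first (the equivalent of chmod 400). The sketch below shows that step with Python's standard library. It runs against a throwaway placeholder file, not a real key; in practice you would pass the path of your own key file.

```python
import os
import stat
import tempfile


def restrict_key_file(path: str) -> str:
    """Make a private key readable only by its owner and return the resulting mode."""
    os.chmod(path, stat.S_IRUSR)  # same effect as: chmod 400 <path>
    return oct(stat.S_IMODE(os.stat(path).st_mode))


# A throwaway file stands in for a downloaded .pem key.
with tempfile.NamedTemporaryFile(suffix=".pem", delete=False) as handle:
    handle.write(b"placeholder, not a real key\n")
    key_path = handle.name

mode = restrict_key_file(key_path)
print(mode)
os.remove(key_path)
```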

If necessary, request a service limit increase for the Amazon EC2 m4.large instance type. You might need to do this if you already have an existing deployment that uses this instance type, and you think you might exceed the default limit with this reference deployment.

Step 2. Launch the Stack

You are responsible for the cost of the AWS services used while running this Quick Start reference deployment. There is no additional cost for using this Quick Start. For full details, see the pricing pages for each AWS service you will be using in this Quick Start. Prices are subject to change.

Choose one of the following options to launch the AWS CloudFormation template into your AWS account. For help choosing an option, see deployment options earlier in this guide.

Option 1: Deploy Apache Cassandra into a new VPC on AWS

Option 2: Deploy Apache Cassandra into an existing VPC on AWS

If you're deploying Apache Cassandra into an existing VPC, make sure that your VPC has three private subnets in different Availability Zones for the node instances. These subnets require NAT gateways or NAT instances in their route tables, to allow the instances to download packages and software without exposing them to the Internet. You'll also need the domain name option configured in the DHCP options, as explained in the Amazon VPC documentation. You'll be prompted for your VPC settings when you launch the Quick Start.
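Before launching into an existing VPC, you may want to confirm that the private subnets you plan to use really cover three different Availability Zones. The following sketch assumes you have already looked up the zone of each subnet (for example, in the Amazon VPC console); the subnet IDs and zone names are hypothetical.

```python
def missing_zone_coverage(subnet_zones: dict, required: int = 3) -> int:
    """Return how many more distinct Availability Zones the subnets need to cover."""
    distinct_zones = set(subnet_zones.values())
    return max(0, required - len(distinct_zones))


# Hypothetical subnet IDs mapped to their Availability Zones.
candidate_subnets = {
    "subnet-aaa": "us-east-2a",
    "subnet-bbb": "us-east-2b",
    "subnet-ccc": "us-east-2b",  # same zone as subnet-bbb
}

shortfall = missing_zone_coverage(candidate_subnets)
print(f"Availability Zones still needed: {shortfall}")
```

A result of 0 means the subnets span at least three zones. This check does not cover the NAT or DHCP requirements described above.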

Each deployment takes about 10-15 minutes to complete, depending on the size of the Apache Cassandra cluster to deploy.

  1. Check the region that's displayed in the upper-right corner of the navigation bar, and change it if necessary. This is where the network infrastructure for Apache Cassandra will be built. The template is launched in the US East (Ohio) Region by default.

  2. On the Select Template page, keep the default setting for the template URL, and then choose Next.

  3. On the Specify Details page, change the stack name if needed. Review the parameters for the template. Provide values for the parameters that require input. For all other parameters, review the default settings and customize them as necessary. When you finish reviewing and customizing the parameters, choose Next.

In the following tables, parameters are listed by category and described separately for the two deployment options: deploying Apache Cassandra into a new VPC (Option 1) and deploying Apache Cassandra into an existing VPC (Option 2).

Parameters - Option 1

  • Option 1: Deploying Apache Cassandra into a new VPC

VPC Network Configuration

| Parameter label (name) | Default | Description |
|---|---|---|
| Availability Zones (AvailabilityZones) | Requires input | The list of Availability Zones to use for the subnets in the VPC. The Quick Start uses three Availability Zones from your list and preserves the logical order you specify. |
| CIDR for the new VPC (VPCCIDR) | 10.0.0.0/16 | CIDR block for the VPC. |
| Private Subnet 1 CIDR (PrivateSubnet1CIDR) | 10.0.0.0/19 | CIDR block for private subnet 1 located in Availability Zone 1. |
| Private Subnet 2 CIDR (PrivateSubnet2CIDR) | 10.0.32.0/19 | CIDR block for private subnet 2 located in Availability Zone 2. |
| Private Subnet 3 CIDR (PrivateSubnet3CIDR) | 10.0.64.0/19 | CIDR block for private subnet 3 located in Availability Zone 3. |
| Public Subnet 1 CIDR (PublicSubnet1CIDR) | 10.0.128.0/20 | CIDR block for the public (DMZ) subnet 1 located in Availability Zone 1. |
| Public Subnet 2 CIDR (PublicSubnet2CIDR) | 10.0.144.0/20 | CIDR block for the public (DMZ) subnet 2 located in Availability Zone 2. |
| Public Subnet 3 CIDR (PublicSubnet3CIDR) | 10.0.160.0/20 | CIDR block for the public (DMZ) subnet 3 located in Availability Zone 3. |
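If you customize the CIDR blocks, every subnet must fall inside the VPC block and no two subnets may overlap. The following sketch uses Python's ipaddress module to check a layout before you launch the stack. It is shown with the default values from this guide; the function name is illustrative.

```python
import ipaddress
from itertools import combinations

VPC_CIDR = "10.0.0.0/16"
SUBNET_CIDRS = {
    "PrivateSubnet1CIDR": "10.0.0.0/19",
    "PrivateSubnet2CIDR": "10.0.32.0/19",
    "PrivateSubnet3CIDR": "10.0.64.0/19",
    "PublicSubnet1CIDR": "10.0.128.0/20",
    "PublicSubnet2CIDR": "10.0.144.0/20",
    "PublicSubnet3CIDR": "10.0.160.0/20",
}


def check_layout(vpc_cidr: str, subnet_cidrs: dict) -> list:
    """Return a list of problems: subnets outside the VPC or overlapping each other."""
    vpc = ipaddress.ip_network(vpc_cidr)
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnet_cidrs.items()}
    problems = [
        f"{name} is outside {vpc}" for name, net in nets.items() if not net.subnet_of(vpc)
    ]
    problems += [
        f"{first} overlaps {second}"
        for (first, net_a), (second, net_b) in combinations(nets.items(), 2)
        if net_a.overlaps(net_b)
    ]
    return problems


problems = check_layout(VPC_CIDR, SUBNET_CIDRS)
print(problems or "layout is consistent")
```

An empty list means the layout is consistent. With the defaults, each /19 private subnet spans 8,192 addresses and each /20 public subnet spans 4,096.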

Apache Cassandra Cluster/Nodes Configuration

| Parameter label (name) | Default | Description |
|---|---|---|
| Apache Cassandra Version (OSSVersion) | 4.0 | Apache Cassandra version to install. |
| Number of nodes to install (ClusterSize) | 3 | Number of Apache Cassandra nodes to install. Choose a value from 1 to 32. |
| Cluster Name (ClusterName) | Cassandra-Cluster | The name of the Apache Cassandra cluster. |
| Data Center Name (DatacenterName) | OSS-dc0 | Name of the Apache Cassandra data center. |
| Node Instance Type (NodeInstanceType) | m4.large | EC2 instance type for Apache Cassandra nodes. |
| Node Volume Size (NodeVolumeSize) | 512 | EBS volume size for each Apache Cassandra cluster node, in GiB. |
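Because the Quick Start creates one EBS data volume per node, the total provisioned storage scales with the cluster size, which matters for cost. The following sketch validates a ClusterSize value against the 1 to 32 range and multiplies it by the per-node volume size. The function name is illustrative, and the result is in the same unit as the NodeVolumeSize parameter.

```python
MIN_NODES, MAX_NODES = 1, 32  # ClusterSize range stated in this guide


def total_data_volume_size(cluster_size: int, node_volume_size: int) -> int:
    """Return the combined size of all node data volumes (one EBS volume per node)."""
    if not MIN_NODES <= cluster_size <= MAX_NODES:
        raise ValueError(f"ClusterSize must be between {MIN_NODES} and {MAX_NODES}")
    if node_volume_size <= 0:
        raise ValueError("NodeVolumeSize must be positive")
    return cluster_size * node_volume_size


default_total = total_data_volume_size(3, 512)  # defaults: 3 nodes, 512 per node
print(default_total)
```

With the default parameters, the data volumes add up to 1,536 in total; a 32-node cluster at the same volume size would provision 16,384.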

Cassandra Cluster Access

| Parameter label (name) | Default | Description |
|---|---|---|
| Create Cluster in Public Subnet (CreateClusterWithPublicIP) | false | Whether to create the Apache Cassandra cluster nodes in a public subnet. |
| Permitted IP range (RemoteAccessCIDR) | Requires input | The CIDR IP range that is permitted to connect over SSH to the DevOps EC2 instance. We recommend that you set this value to a trusted IP range. For example, you might want to grant only your corporate network access to the software. |
| Key Name (KeyPairName) | Requires input | Public/private key pair, which allows you to connect securely to your instance after it launches. This is the key pair you created in your preferred region in step 1. |
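To follow the recommendation of a trusted IP range for RemoteAccessCIDR, you can check a candidate value before entering it. The sketch below flags a range that admits every IPv4 address and reports how many addresses a range covers. The example range 203.0.113.0/24 is a reserved documentation block standing in for a corporate network.

```python
import ipaddress


def is_open_to_world(cidr: str) -> bool:
    """Return True when the CIDR range admits every IPv4 address."""
    return ipaddress.ip_network(cidr).prefixlen == 0


def address_count(cidr: str) -> int:
    """Return how many addresses the CIDR range covers."""
    return ipaddress.ip_network(cidr).num_addresses


# 203.0.113.0/24 is a documentation range used here as a placeholder.
for candidate in ("0.0.0.0/0", "203.0.113.0/24"):
    print(candidate, is_open_to_world(candidate), address_count(candidate))
```

Note that ipaddress.ip_network raises ValueError for a value with host bits set (such as 203.0.113.5/24), which also catches a common typing mistake.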

DevOps/Bastion Configuration

| Parameter label (name) | Default | Description |
|---|---|---|
| Create Dev/Bastion Instance (CreateDevInstance) | true | Whether to create the jumpbox. |
| Instance Type (DevInstanceType) | t3.medium | EC2 instance type for the DevOps Host. |
| Volume Sizes (DevVolumeSize) | 16 | The EBS volume size, in GiB, for the DevOps Host. |

AWS Quick Start Configuration

| Parameter label (name) | Default | Description |
|---|---|---|
| Quick Start S3 Bucket Name (S3BucketName) | aws-quickstart | S3 bucket where the Quick Start templates and scripts are installed. Use this parameter to specify the S3 bucket name you've created for your copy of Quick Start assets, if you decide to customize or extend the Quick Start for your own use. The bucket name can include numbers, lowercase letters, uppercase letters, and hyphens, but should not start or end with a hyphen. |
| Quick Start S3 Key Prefix (S3KeyPrefix) | quickstart-datastax-oss | The S3 key name prefix used to simulate a folder for your copy of Quick Start assets, if you decide to customize or extend the Quick Start for your own use. This prefix can include numbers, lowercase letters, uppercase letters, hyphens, and forward slashes. |
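If you launch the template from the AWS CLI instead of the console, parameter overrides are commonly supplied as a JSON list of ParameterKey and ParameterValue pairs. The following sketch converts a plain dictionary into that structure. The parameter names come from the tables above; the values (zones, key name, IP range) are placeholders you would replace with your own.

```python
import json


def to_cfn_parameters(values: dict) -> list:
    """Convert a name-to-value mapping into the CloudFormation parameter list format."""
    return [
        {"ParameterKey": name, "ParameterValue": str(value)}
        for name, value in values.items()
    ]


# Placeholder values; only the parameter names are taken from this guide.
overrides = {
    "AvailabilityZones": "us-east-2a,us-east-2b,us-east-2c",
    "ClusterSize": 6,
    "KeyPairName": "my-key-pair",
    "RemoteAccessCIDR": "203.0.113.0/24",
}

parameters = to_cfn_parameters(overrides)
print(json.dumps(parameters, indent=2))
```

Saving the printed JSON to a file lets you pass it to the CLI with the --parameters option in file:// form. Parameters you omit keep their template defaults.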

Parameters - Option 2

  • Option 2: Deploying Apache Cassandra into an existing VPC

VPC Network Configuration

| Parameter label (name) | Default | Description |
|---|---|---|
| VPC ID (VPCId) | Requires input | Choose an existing VPC ID to deploy the cluster into. |
| CIDR for the VPC (VPCCIDR) | 10.0.0.0/16 | CIDR block for the VPC. |
| Private Subnet 1 ID (PrivateSubnet1ID) | Requires input | Subnet ID for private subnet 1 located in Availability Zone 1. |
| Private Subnet 2 ID (PrivateSubnet2ID) | Requires input | Subnet ID for private subnet 2 located in Availability Zone 2. |
| Private Subnet 3 ID (PrivateSubnet3ID) | Requires input | Subnet ID for private subnet 3 located in Availability Zone 3. |
| Public Subnet 1 ID (PublicSubnet1CIDR) | Requires input | Subnet ID for the public (DMZ) subnet 1 located in Availability Zone 1. |
| Public Subnet 2 ID (PublicSubnet2CIDR) | Requires input | Subnet ID for the public (DMZ) subnet 2 located in Availability Zone 2. |
| Public Subnet 3 ID (PublicSubnet3CIDR) | Requires input | Subnet ID for the public (DMZ) subnet 3 located in Availability Zone 3. |

Apache Cassandra Cluster/Nodes Configuration

| Parameter label (name) | Default | Description |
|---|---|---|
| Apache Cassandra Version (OSSVersion) | 4.0 | Apache Cassandra version to install. |
| Number of nodes to install (ClusterSize) | 3 | Number of Apache Cassandra nodes to install. Choose a value from 1 to 32. |
| Cluster Name (ClusterName) | Cassandra-Cluster | The name of the Apache Cassandra cluster. |
| Data Center Name (DatacenterName) | OSS-dc0 | Name of the Apache Cassandra data center. |
| Node Instance Type (NodeInstanceType) | m4.large | EC2 instance type for Apache Cassandra nodes. |
| Node Volume Size (NodeVolumeSize) | 512 | EBS volume size for each Apache Cassandra cluster node, in GiB. |

Cassandra Cluster Access

| Parameter label (name) | Default | Description |
|---|---|---|
| Create Cluster in Public Subnet (CreateClusterWithPublicIP) | false | Whether to create the Apache Cassandra cluster nodes in a public subnet. |
| Permitted IP range (RemoteAccessCIDR) | Requires input | The CIDR IP range that is permitted to connect over SSH to the DevOps EC2 instance. We recommend that you set this value to a trusted IP range. For example, you might want to grant only your corporate network access to the software. |
| Key Name (KeyPairName) | Requires input | Public/private key pair, which allows you to connect securely to your instance after it launches. This is the key pair you created in your preferred region in step 1. |

DevOps/Bastion Configuration

| Parameter label (name) | Default | Description |
|---|---|---|
| Create Dev/Bastion Instance (CreateDevInstance) | true | Whether to create the jumpbox. |
| Instance Type (DevInstanceType) | t3.medium | EC2 instance type for the DevOps Host. |
| Volume Sizes (DevVolumeSize) | 16 | The EBS volume size, in GiB, for the DevOps Host. |

AWS Quick Start Configuration

| Parameter label (name) | Default | Description |
|---|---|---|
| Quick Start S3 Bucket Name (S3BucketName) | aws-quickstart | S3 bucket where the Quick Start templates and scripts are installed. Use this parameter to specify the S3 bucket name you've created for your copy of Quick Start assets, if you decide to customize or extend the Quick Start for your own use. The bucket name can include numbers, lowercase letters, uppercase letters, and hyphens, but should not start or end with a hyphen. |
| Quick Start S3 Key Prefix (S3KeyPrefix) | quickstart-datastax-oss | The S3 key name prefix used to simulate a folder for your copy of Quick Start assets, if you decide to customize or extend the Quick Start for your own use. This prefix can include numbers, lowercase letters, uppercase letters, hyphens, and forward slashes. |

Step 3. Test the Deployment

After you deploy the Apache Cassandra cluster, the quickest way to begin using it is to connect over SSH to the DevOps instance and then to one of the node instances. You can get the IP addresses of the instances from the Outputs tab of the stack.

Stack Output

Connect to the DevOps instance with your key pair, replacing the KEY_FILE and DevIpAddress values with those of your cluster. (You can also use SSH agent forwarding, so that the key file does not have to be copied to the DevOps instance.)

ssh -i $KEY_FILE ubuntu@$DevIpAddress

You can get the Seed1PrivateIpAddress value from the Outputs tab of the stack.

Seed1 IP

Once you are logged in to the DevOps instance, run:

ssh -i $KEY_FILE ubuntu@$Seed1PrivateIpAddress

If you chose to create the cluster in the public subnet, you can skip the above steps and SSH directly into one of the nodes, using the public IP address shown in the Outputs tab.
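As an alternative to the two-step login above, OpenSSH can reach a node through the DevOps instance in a single command with its -J (ProxyJump) option, with the key loaded into a local ssh-agent so that it never has to be copied to the DevOps instance. The sketch below only assembles the two commands; it assumes a running ssh-agent and an OpenSSH client that supports -J, and the IP addresses and key path are placeholders for the values in your stack outputs.

```python
import shlex


def jump_commands(key_file: str, dev_ip: str, node_ip: str, user: str = "ubuntu") -> list:
    """Return argv lists: load the key into ssh-agent, then reach the node via the DevOps host."""
    return [
        ["ssh-add", key_file],
        ["ssh", "-J", f"{user}@{dev_ip}", f"{user}@{node_ip}"],
    ]


# Placeholder values standing in for KEY_FILE, DevIpAddress, and Seed1PrivateIpAddress.
commands = jump_commands("my-key.pem", "198.51.100.10", "10.0.0.12")
for argv in commands:
    print(shlex.join(argv))
```

The printed lines can be run in a local shell in order. The ubuntu user name matches the SSH commands shown in this step.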

You can then view the status of the Apache Cassandra Cluster:

~$ nodetool status

For a six-node cluster, the nodetool status output should look like the following:

NodeTool Status
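In nodetool status output, each node line begins with a two-letter code: the first letter is the status (U for up, D for down) and the second is the state (N for normal, L for leaving, J for joining, M for moving). A healthy node shows UN. The following sketch picks those codes out of captured output and lists any node that is not UN. The sample text is illustrative only; its addresses, loads, and host IDs are made up and not taken from a real cluster.

```python
import re

# Illustrative sample, not captured from a real cluster.
SAMPLE_STATUS = """\
Datacenter: OSS-dc0
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load        Tokens  Owns (effective)  Host ID                               Rack
UN  10.0.0.12  108.45 KiB  256     65.1%             11111111-1111-1111-1111-111111111111  rack1
UN  10.0.32.9  97.20 KiB   256     68.3%             22222222-2222-2222-2222-222222222222  rack1
DN  10.0.64.7  101.03 KiB  256     66.6%             33333333-3333-3333-3333-333333333333  rack1
"""

NODE_LINE = re.compile(r"^([UD][NLJM])\s+(\S+)")


def node_states(status_output: str) -> dict:
    """Map each node address to its two-letter status/state code."""
    states = {}
    for line in status_output.splitlines():
        match = NODE_LINE.match(line)
        if match:
            states[match.group(2)] = match.group(1)
    return states


def unhealthy_nodes(status_output: str) -> list:
    """Return the addresses of nodes that are not Up/Normal."""
    return [addr for addr, code in node_states(status_output).items() if code != "UN"]


print(unhealthy_nodes(SAMPLE_STATUS))
```

An empty list means every node reported UN. In practice you would feed the function the text printed by nodetool status on one of the nodes.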

The Developer resource website is accessible at the DevUrl value shown in the stack outputs.

Step 4. Back Up Your Data

Apache Cassandra backs up data by taking a snapshot of all on-disk data files (SSTable files) stored in the data directory. You can take a snapshot of all keyspaces, a single keyspace, or a single table while the system is online. See Backing up and Restoring data. For storing the backups in Amazon S3, see Backup to S3.
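Snapshots are taken with the nodetool snapshot command, and the three scopes described above (all keyspaces, one keyspace, one table) differ only in the arguments passed. The sketch below assembles the command line for each scope; the tag, keyspace, and table names are hypothetical. The command must be run on each node whose data you want to capture.

```python
import shlex


def snapshot_command(tag: str, keyspace: str = None, table: str = None) -> list:
    """Return the nodetool argv for a snapshot of all keyspaces, one keyspace, or one table."""
    if table and not keyspace:
        raise ValueError("a table snapshot also needs its keyspace")
    argv = ["nodetool", "snapshot", "-t", tag]
    if table:
        argv += ["--table", table]
    if keyspace:
        argv.append(keyspace)
    return argv


# Hypothetical tag, keyspace, and table names.
print(shlex.join(snapshot_command("nightly")))                        # all keyspaces
print(shlex.join(snapshot_command("nightly", "demo_ks")))             # one keyspace
print(shlex.join(snapshot_command("nightly", "demo_ks", "events")))   # one table
```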

Troubleshooting

Q. I encountered a CREATE_FAILED error when I launched the Quick Start. What should I do?

A. If AWS CloudFormation fails to create the stack, we recommend that you relaunch the template with Rollback on failure set to No. (This setting is under Advanced in the AWS CloudFormation console, Options page.) With this setting, the stack's state will be retained and the instance will be left running, so you can troubleshoot the issue. (Because the instances in this Quick Start run Ubuntu, look at the log files under /var/log, such as /var/log/cloud-init-output.log.)

When you set Rollback on failure to No, you'll continue to incur AWS charges for this stack. Please make sure to delete the stack when you've finished troubleshooting.
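If you launch stacks from the AWS CLI, the counterpart of setting Rollback on failure to No is the --disable-rollback flag of aws cloudformation create-stack. The following sketch assembles such a command. The stack name, the TEMPLATE_URL placeholder, and the params.json file name are assumptions for illustration, and the CAPABILITY_IAM capability is included on the assumption that the template creates IAM resources.

```python
import shlex


def create_stack_command(stack_name: str, template_url: str, parameters_file: str,
                         keep_failed_stack: bool = True) -> list:
    """Return the argv for an AWS CLI create-stack call."""
    argv = [
        "aws", "cloudformation", "create-stack",
        "--stack-name", stack_name,
        "--template-url", template_url,
        "--parameters", f"file://{parameters_file}",
        "--capabilities", "CAPABILITY_IAM",  # assumption: the template creates IAM resources
    ]
    if keep_failed_stack:
        argv.append("--disable-rollback")  # keep resources after a failure for troubleshooting
    return argv


# TEMPLATE_URL is a placeholder for the S3 URL of the template you launch.
command = create_stack_command("cassandra-debug", "TEMPLATE_URL", "params.json")
print(shlex.join(command))
```

When you finish troubleshooting, aws cloudformation delete-stack with the same stack name removes the retained resources.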

For additional information, see Troubleshooting AWS CloudFormation on the AWS website or contact us on the AWS Quick Start Discussion Forum.

Q. I encountered a size limitation error when I deployed the AWS CloudFormation templates.

A. We recommend that you launch the Quick Start templates from the location we've provided or from another S3 bucket. If you deploy the templates from a local copy on your computer, you might encounter template size limitations when you create the stack. For more information about AWS CloudFormation limits, see the AWS documentation.
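The limit most often hit here is the maximum size of a template body passed directly in a request, which is 51,200 bytes; templates referenced by an S3 URL are allowed to be considerably larger. The sketch below checks whether a template's text would fit in a direct request. The function name is illustrative, and you should confirm current limits in the AWS documentation.

```python
MAX_TEMPLATE_BODY_BYTES = 51_200  # limit for a template body passed directly in a request


def needs_s3_upload(template_text: str) -> bool:
    """Return True when the template is too large to pass directly and should go through S3."""
    return len(template_text.encode("utf-8")) > MAX_TEMPLATE_BODY_BYTES


small_template = "Resources: {}\n"
large_template = "#" * (MAX_TEMPLATE_BODY_BYTES + 1)
print(needs_s3_upload(small_template), needs_s3_upload(large_template))
```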

Additional Resources

AWS services

Apache Cassandra

Quick Start reference deployments

GitHub Repository

Send Us Feedback

You can visit our GitHub repository to download the templates and scripts for this Quick Start, to post your comments, and to share your customizations with others.

TIP: See releases in git: https://github.com/aws-quickstart/quickstart-datastax-oss/releases

Document Revisions

| Date | Change | In sections |
|---|---|---|
| Jun 2020 | Initial publication | |





© 2020, Amazon Web Services, Inc. or its affiliates, and DataStax, Inc. All rights reserved.

Notices

This document is provided for informational purposes only. It represents AWS’s current product offerings and practices as of the date of issue of this document, which are subject to change without notice. Customers are responsible for making their own independent assessment of the information in this document and any use of AWS’s products or services, each of which is provided “as is” without warranty of any kind, whether express or implied. This document does not create any warranties, representations, contractual commitments, conditions or assurances from AWS, its affiliates, suppliers or licensors. The responsibilities and liabilities of AWS to its customers are controlled by AWS agreements, and this document is not part of, nor does it modify, any agreement between AWS and its customers.

The software included with this paper is licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the "license" file accompanying this file. This code is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.