04 Cassandra Query Language

To query the data stored within Cassandra, a dedicated query language named Cassandra Query Language (CQL) was developed.

CQL offers a model similar to MySQL in many respects:

It is used to query data stored in tables
Each table is made up of rows and columns
Most of the operators are the same as those used in MySQL

CQL commands and queries can be run either directly in the console or by reading them from a text file.

Keyspace

CREATE KEYSPACE population
WITH replication = {'class': 'SimpleStrategy',
                    'replication_factor': 3};

The DESCRIBE command can be used to check whether a keyspace (or a table) has been created correctly. It can also be applied to other schema elements, such as indexes and user-defined types.

DESCRIBE keyspaces;

Before performing operations on the tables (which we still have to create), we must choose the keyspace in which we want to work. The USE command does this.

USE population;

Keyspaces can also be modified (ALTER) and deleted (DROP) with the corresponding commands.

ALTER KEYSPACE <identifier> WITH <properties>;

DROP KEYSPACE <identifier>;
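
As a concrete illustration of the two templates above, the following sketch changes the replication factor of the population keyspace and removes a second keyspace. The new factor (2) and the keyspace name scratch are assumptions made for the example, not part of the original notes.

```cql
-- Change the replication factor of the keyspace created earlier
-- (2 is an arbitrary value chosen for illustration).
ALTER KEYSPACE population
WITH replication = {'class': 'SimpleStrategy',
                    'replication_factor': 2};

-- Remove a hypothetical keyspace named scratch, together with
-- all the tables it contains. IF EXISTS avoids an error when
-- the keyspace is not there.
DROP KEYSPACE IF EXISTS scratch;
```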

Tables

CREATE TABLE <table_name> (
  <column_name> <column_type>,
  <column_name> <column_type>,
  ...
)

Optionally, some options can be included by using WITH <options>.

CREATE TABLE person (
  personal_id text,
  name text,
  address text,
  age varint,
  birth_date text,
  gender text,
  PRIMARY KEY (personal_id)
);

DESCRIBE tables;
DESCRIBE person;

The PRIMARY KEY is declared as the last definition within the CREATE TABLE statement. The columns listed in it take on different roles depending on their order and on how they are grouped with parentheses.

The first column (or the first parenthesized group of columns) is the Partition Key. It determines how the data is distributed across the Cassandra nodes. The remaining columns are the Clustering Keys. They determine how the rows are sorted within a partition. A table can have several partition key columns and several clustering columns.
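
The roles described above can be sketched with two table definitions. Both tables and their columns are hypothetical, introduced only to show where the parentheses go.

```cql
-- Hypothetical table: city is the partition key, so all the people
-- of one city live in the same partition. name and personal_id are
-- clustering keys, so rows in a partition are sorted by name and
-- then by personal_id.
CREATE TABLE person_by_city (
  city text,
  name text,
  personal_id text,
  age varint,
  PRIMARY KEY ((city), name, personal_id)
);

-- Hypothetical table with a composite partition key: the pair
-- (country, city) is used as a whole to place the partition.
-- personal_id is the only clustering key.
CREATE TABLE person_by_country_city (
  country text,
  city text,
  personal_id text,
  name text,
  PRIMARY KEY ((country, city), personal_id)
);
```

In the first definition the inner parentheses around city are optional, since a single leading column is always taken as the partition key. In the second one they are required to group the two columns.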

When creating a table, clustering keys can be used to define an ordering.

CREATE TABLE <table_name> (...)
WITH CLUSTERING ORDER BY (<clustering_column> ASC, ...);
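
A minimal sketch of the ordering clause, assuming a hypothetical visit_by_person table that is not part of the original schema:

```cql
-- Hypothetical table: one partition per person, with the visits
-- of that person stored from the most recent to the oldest.
CREATE TABLE visit_by_person (
  personal_id text,
  visit_time timestamp,
  reason text,
  PRIMARY KEY (personal_id, visit_time)
) WITH CLUSTERING ORDER BY (visit_time DESC);
```

Only clustering keys can appear in the ordering clause, which is why visit_time is declared after the partition key in the PRIMARY KEY definition.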

Tables can also be modified through the ALTER command:

ALTER TABLE <table_name> <instructions>;
ALTER TABLE <table_name> ADD <column_name> <column_type>;
ALTER TABLE <table_name> DROP <column_name>;
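
For instance, assuming we wanted to store a phone number (a hypothetical column that is not used elsewhere in these notes), the column could be added to the person table and later removed:

```cql
-- Add a new column; existing rows simply have no value for it.
ALTER TABLE person ADD phone text;

-- Remove the column and the data stored in it.
ALTER TABLE person DROP phone;
```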

Tables can also be deleted through the DROP command:

DROP TABLE <table_name>;

Rather than deleting the table, it is possible to empty it through the TRUNCATE command:

TRUNCATE TABLE <table_name>;

Indexes

Indexes are one of the most important elements of a table in Cassandra. A secondary index makes it possible to query efficiently on a column that is not part of the primary key.

Secondary Indexes are created with the following command:

CREATE INDEX <identifier>
ON <table_name> (<column_name>);

CREATE INDEX person_name
ON person (name);

Indexes can be deleted through the DROP command:

DROP INDEX <identifier>;

Data

Insert:

INSERT INTO <table_name> (<column_name1>, <column_name2>, ...)
VALUES (<column_value1>, <column_value2>, ...)
USING <option>;

INSERT INTO person (personal_id, address, age,
                    birth_date, gender, name)
VALUES ('FRNTRZ95E12F675T', 'Via Milano 12',
        26, '12-05-1995', 'Male', 'Francesco Terzani');
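
The USING clause of the template accepts options such as TTL (time to live, in seconds) and TIMESTAMP. As a sketch, the following statement inserts a row that expires on its own. The row values and the TTL are invented for illustration.

```cql
-- Hypothetical row. After 86400 seconds (one day) the inserted
-- values expire and are no longer returned by queries.
INSERT INTO person (personal_id, name, age)
VALUES ('RSSMRA80A01H501U', 'Mario Rossi', 41)
USING TTL 86400;
```

Note that in Cassandra an INSERT on a primary key that already exists does not fail: it overwrites the listed columns of the existing row.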

Select:

SELECT <field_list>
FROM <table_name>
WHERE <conditions>;

SELECT *
FROM person
WHERE personal_id = 'FRNTRZ95E12F675T';

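Because Cassandra is a column-oriented database whose data is distributed across the nodes according to the primary key, operations are optimized to retrieve rows through that key, and by default a query that filters on any other column is rejected. To solve this issue, it is necessary either to filter on the attributes included in the primary key or to create a secondary index on the column being queried.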
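
The difference can be sketched with three queries on the person table. The first two rely on the primary key and on the person_name index created in the Indexes section. The third uses ALLOW FILTERING, a clause not covered in these notes, and is shown only for comparison.

```cql
-- Filter on the primary key: always accepted.
SELECT * FROM person
WHERE personal_id = 'FRNTRZ95E12F675T';

-- Filter on a non-key column: accepted only because the
-- person_name secondary index exists on the name column.
SELECT * FROM person
WHERE name = 'Francesco Terzani';

-- Filter on a non-key column without an index: it must be
-- requested explicitly, and it may scan the whole table.
SELECT * FROM person
WHERE age = 26
ALLOW FILTERING;
```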

Update:

UPDATE <table_name>
SET <column_name> = <new_value>, ...
WHERE <condition>;

UPDATE person
SET address = 'Via Milani 13'
WHERE personal_id = 'FRNTRZ95E12F675T';

Delete (only on primary key):

DELETE
FROM <table_name>
WHERE <condition>;

DELETE
FROM person
WHERE personal_id = 'FRNTRZ95E12F675T';

Batch:

BEGIN BATCH
<insert_statement>;
<update_statement>;
<delete_statement>;
APPLY BATCH;
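
As a sketch of the template above, the following batch groups one statement of each kind on the person table, assuming personal_id is its primary key. The first row is invented for illustration.

```cql
BEGIN BATCH
  -- Insert a hypothetical new person.
  INSERT INTO person (personal_id, name, age)
  VALUES ('RSSMRA80A01H501U', 'Mario Rossi', 41);

  -- Set a column of the same row.
  UPDATE person
  SET gender = 'Male'
  WHERE personal_id = 'RSSMRA80A01H501U';

  -- Remove the row used in the previous examples.
  DELETE FROM person
  WHERE personal_id = 'FRNTRZ95E12F675T';
APPLY BATCH;
```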

Utilities

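The CAPTURE command saves the output of the queries to a file. It is followed by the path of the folder in which to store the results and the name of the file, written as a quoted string. Capturing is stopped with CAPTURE off.

CAPTURE 'D:/Program Files/Cassandra/Outputs/output.txt';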

CAPTURE off;

The EXPAND command switches the console to an expanded output format in which each row is printed vertically, one column per line. It must be turned on before running the query.

EXPAND on;

EXPAND off;

The SOURCE command allows you to run queries from text files. The command accepts the quoted path to the file containing the queries.

SOURCE 'D:/Program Files/Cassandra/Queries/query_1.txt';

Data Types

Cassandra supports many different data types, such as text, varint, float, double and boolean.

In addition, it supports two special kinds of data type:

collections
user-defined data types

Collections are pretty easy to define and update:

CREATE TABLE test(email list<text>, ...);
UPDATE test SET email = email + [...] WHERE ...;
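
A fuller sketch, assuming a hypothetical contact table, shows the three kinds of collection available in CQL (list, set and map) and how each one is extended in place. All the values are invented for illustration.

```cql
-- Hypothetical table with one column per collection kind.
CREATE TABLE contact (
  personal_id text PRIMARY KEY,
  emails list<text>,
  tags set<text>,
  phones map<text, text>
);

-- Lists use [...], sets use {...}, maps use {key: value}.
INSERT INTO contact (personal_id, emails, tags, phones)
VALUES ('FRNTRZ95E12F675T',
        ['f.terzani@example.com'],
        {'student'},
        {'home': '02-0000000'});

-- Append to the list, add to the set, set a single map entry.
UPDATE contact
SET emails = emails + ['francesco@example.com'],
    tags = tags + {'alumnus'},
    phones['mobile'] = '333-0000000'
WHERE personal_id = 'FRNTRZ95E12F675T';
```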

To create a user-defined data type:

CREATE TYPE <type_name> (
  <field_name> <field_type>,
  ...
);
DESCRIBE TYPE <type_name>;
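
As a minimal sketch, a user-defined type can group the parts of an address and then be used as the type of a column. The type, the resident table and the values below are hypothetical.

```cql
-- Hypothetical type with three fields.
CREATE TYPE address (
  street text,
  city text,
  zip_code text
);

-- Hypothetical table using the type. frozen means that the value
-- is always written and read as a whole.
CREATE TABLE resident (
  personal_id text PRIMARY KEY,
  home frozen<address>
);

-- A value of the type is written as {field: value, ...}.
INSERT INTO resident (personal_id, home)
VALUES ('FRNTRZ95E12F675T',
        {street: 'Via Milano 12', city: 'Milano', zip_code: '20100'});

-- A single field is read with the dot notation.
SELECT home.city
FROM resident
WHERE personal_id = 'FRNTRZ95E12F675T';
```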
