04 Cassandra Query Language
To query the data stored within Cassandra, a dedicated query language named Cassandra Query Language (CQL) was developed.
CQL offers a model similar to MySQL under many different aspects
- It is used to query data stored in tables
- Each table is made by rows and columns
- Most of the operators are the ones used in MySQL
CQL commands and queries can either be run in the console or by reading a textual file with the corresponding command.
Keyspace
CREATE KEYSPACE population
WITH replication = {‘class’: ‘SimpleStrategy’,
‘replication_factor’: 3};
The DESCRIBE command can be used to check whether a keyspace (or a table) has been correctly created. It can also be applied to other elements.
DESCRIBE keyspaces;
To be able to perform the operations on the tables (that we still have to create), we must choose in which keyspace we want to work. The command USE covers such need.
USE population;
Keyspaces can be also modified (ALTER) and deleted (DROP) with the corresponding commands.
ALTER KEYSPACE <identifier> WITH <properties>;
DROP KEYSPACE <identifier>;
Tables
CREATE TABLE <table_name> (
<column_name> <column_type>,
<column_name> <column_type>,
...
)
Optionally, some options can be included by using WITH <options>.
CREATE TABLE person (
personal_id text,
name text,
age varint,
birth_date text,
gender text,
PRIMARY KEY (personal_id, text)
);
DESCRIBE tables;
DESCRIBE person;
When creating the PRIMARY KEY of the table as the last definition within the CREATE TABLE operation, the columns that you put within the PRIMARY KEY statement have different meaning depending on the order and the brackets.
The first value (or set of values) is named Partition Key(s). It defines the way in which the data is partitioned within the cassandra nodes. The second value (or sets of values) is named Clustering Key(s). It is used to define the way in which the data is stored within a partition. A table can employ many different Clustering and/or Partition Keys.
When creating a table, clustering keys can be used to define an ordering.
CREATE TABLE person (...)
WITH CLUSTERING ORDER BY (text ASC, ...);
Tables can be also modified through the ALTER command:
ALTER TABLE <table_name> <instructions>;
ALTER TABLE <table_name> ADD <column_name> <column_type>;
ALTER TABLE <table_name> DROP <column_name>;
Tables can be also deleted through the DROP command:
DROP TABLE <table_name>;
Rather than deleting the table, it is possible to empty it through the TRUNCATE command:
TRUNCATE TABLE <table_name>;
Indexes
Indexes are one of the most important elements of a table in Cassandra. They allow to query the column efficiently.
Secondary Indexes are created with the following command:
CREATE INDEX <identifier>
ON <table_name> (<column_name>);
CREATE INDEX person_name
ON person (name);
DROP INDEX index_name
Data
Insert:
INSERT INTO <tablename>(<column_name1>,
<column_name2>, ...)
VALUES (<column_value1>, <column_value2>....)
USING <option>;
INSERT INTO person(personal_id, address, age,
birth_date, gender, name)
VALUES (‘FRNTRZ95E12F675T’, ‘Via Milano 12’,
26, ‘12-05-1995’, ‘Male’, ‘Francesco Terzani’);
Select:
SELECT <field_list>
FROM <table_name>
WHERE <conditions>
SELECT *
FROM person
WHERE personal_id = ‘FRNTRZ95E12F675T’
Being Cassandra a column-oriented database, all the operations are optimized to extract data from columns. To solve this issue, it’s necessary to query with respect to the attributes included in the primary key or to create a secondary index.
Update:
UPDATE <table_name>
SET <column_name> = <new_value>, ...
WHERE <condition>;
UPDATE person
SET address = ‘Via Milani 13’
WHERE personal_id = ‘FRNTRZ95E12F675T’;
Delete (only on primary key):
DELETE
FROM <table_name>
WHERE <condition>;
DELETE
FROM person
WHERE personal_id = ‘FRNTRZ95E12F675T’;
Batch:
BEGIN BATCH
<insert_statement>;
<update_statement>;
<delete_statement>;
APPLY BATCH;
Utilities
The CAPTURE command followed by the path of the folder in which store the results and the name of the file.
CAPTURE D:/Program Files/Cassandra/Outputs/output.txt;
CAPTURE off;
The EXPAND command provides extended outputs within the console when performing queries. It must be executed before the query to enable it.
EXPAND on;
EXPAND off;
The SOURCE command allows you to run queries from textual files. The command accepts the path to the file with the query.
SOURCE D:/Program Files/Cassandra/Queries/query_1.txt;
Data Types
Cassandra supports many different data types, like text, varint, float, double, Boolean, etc.
In particular, it supports two particular data types
- collections
- user-defined data types
Collections are pretty easy to define and update:
CREATE TABLE test(email list<text>, ...);
UPDATE test SET email = email + [...] WHERE ...;
To create a user-defined data type:
CREATE TYPE <type_name> (
<column_definition>
...
);
DESCRIBE TYPE <type_name>;
No Comments