Recently Updated Pages
Pig
Apache Pig is a high-level platform for creating programs that run on Apache Hadoop. The language...
HBase
HBase is a key-valued row/column store modeled on Google’s Bigtable providing Bigtable-like capab...
Scheduling
By default, Hadoop uses FIFO to schedule jobs. Alternate scheduler options: capacity and fair Cap...
Hadoop key components
Input Splitter Is responsible for splitting your input into multiple chunks (default is 64MB). Th...
MapReduce
Introduction MapReduce is a programming model and an associated implementation for processing an...
Hadoop 2.0
YARN Splits up the two major functions of JobTracker Global Resource Manager - Cluster resourc...
HDFS Commands
Shell Commands There are two types of shell commands: User Commands hdfs dfs – runs filesyste...
HDFS Architecture, Security and Configuration
HDFS Architecture NameNode Manages File System Namespace Maps a file name to a set of blocks M...
Introduction
Software platform to easily process vast amounts of data. The main features are: Scalable: It c...
01 The challenge and importance of data wrangling
The step after Data Acquisition and before Analysis in the data management flow is Data Wrangling...
04 Logstash
Working with Beats Beats focus on data collection and shipping while Logstash focuses on processi...
03 Elasticsearch operations
Creating an index: PUT /index_name Define a mapping: PUT /my_index/_mapping { "properties": { ...
01 ELK stack
Kibana: Visualize and Manage Elasticsearch: Store, Search and Analyze Logstash + Beats: Inges...
04 Cassandra Query Language
To query the data stored within Cassandra, a dedicated query language named Cassandra Query Langu...
03 HBase
HBase Table: Split it into multiple regions: replicated across servers. One Store per ColumnFa...
01 Introduction
In recent years there has been an ever-growing need for technologies capable of handling large...
03 Memcache
Memcache is a free & open source, high-performance, distributed memory object caching system that...
02 Redis
Redis is an advanced key-value store, where keys can contain data structures such as strings, has...
01 Introduction
In many applications performance is an essential priority, and often a small delay in response ti...
03 MongoDB Queries
Create Create a database: use database_name Create a collection: db.createCollection(name, option...