Recently Updated Pages

Pig

SMBUD - Systems and Methods for Big and... 12-13 Hadoop Subprojects

Apache Pig is a high-level platform for creating programs that run on Apache Hadoop. The language...

Updated 2 years ago by Paolo Basso

HBase

SMBUD - Systems and Methods for Big and... 12-13 Hadoop Subprojects

HBase is a key-valued row/column store modeled on Google’s Bigtable providing Bigtable-like capab...

Updated 2 years ago by Paolo Basso

Scheduling

SMBUD - Systems and Methods for Big and... 11 Hadoop

By default, Hadoop uses FIFO to schedule jobs. Alternate scheduler options: capacity and fair Cap...

Updated 2 years ago by Paolo Basso

Hadoop key components

SMBUD - Systems and Methods for Big and... 11 Hadoop

Input Splitter: responsible for splitting your input into multiple chunks (default is 64 MB). Th...

Updated 2 years ago by Paolo Basso

MapReduce

SMBUD - Systems and Methods for Big and... 11 Hadoop

Introduction MapReduce is a programming model and an associated implementation for processing an...

Updated 2 years ago by Paolo Basso

Hadoop 2.0

SMBUD - Systems and Methods for Big and... 11 Hadoop

YARN Splits up the two major functions of JobTracker Global Resource Manager - Cluster resourc...

Updated 2 years ago by Paolo Basso

HDFS Commands

SMBUD - Systems and Methods for Big and... 11 Hadoop

Shell Commands There are two types of shell commands: User Commands hdfs dfs – runs filesyste...

Updated 2 years ago by Paolo Basso

HDFS Architecture, Security and Configuration

SMBUD - Systems and Methods for Big and... 11 Hadoop

HDFS Architecture NameNode Manages File System Namespace Maps a file name to a set of blocks M...

Updated 2 years ago by Paolo Basso

Introduction

SMBUD - Systems and Methods for Big and... 11 Hadoop

Software platform to easily process vast amounts of data. The main features are: Scalable: It c...

Updated 2 years ago by Paolo Basso

01 The challenge and importance of data wrangling

SMBUD - Systems and Methods for Big and... 14 Data Wrangling

The step after Data Acquisition and before Analysis in the data management flow is Data Wrangling...

Updated 2 years ago by Paolo Basso

04 Logstash

SMBUD - Systems and Methods for Big and... 10 IR Based Databases - ELK

Working with Beats Beats focus on data collection and shipping while Logstash focuses on processi...

Updated 2 years ago by Paolo Basso

03 Elasticsearch operations

SMBUD - Systems and Methods for Big and... 10 IR Based Databases - ELK

Creating an index: PUT /index_name Define a mapping: PUT /my_index/_mapping { "properties": { ...

Updated 2 years ago by Paolo Basso

01 ELK stack

SMBUD - Systems and Methods for Big and... 10 IR Based Databases - ELK

Kibana: Visualize and Manage Elasticsearch: Store, Search and Analyze Logstash + Beats: Inges...

Updated 2 years ago by Paolo Basso

04 Cassandra Query Language

SMBUD - Systems and Methods for Big and... 09 Columnar Databases

To query the data stored within Cassandra, a dedicated query language named Cassandra Query Langu...

Updated 2 years ago by Paolo Basso

03 HBase

SMBUD - Systems and Methods for Big and... 09 Columnar Databases

HBase Table: Split it into multiple regions: replicated across servers. One Store per ColumnFa...

Updated 2 years ago by Paolo Basso

01 Introduction

SMBUD - Systems and Methods for Big and... 09 Columnar Databases

In recent years there has been an ever-growing need for technologies capable of handling large...

Updated 2 years ago by Paolo Basso

03 Memcache

SMBUD - Systems and Methods for Big and... 08 Key-value Databases

Memcache is a free & open source, high-performance, distributed memory object caching system that...

Updated 2 years ago by Paolo Basso

02 Redis

SMBUD - Systems and Methods for Big and... 08 Key-value Databases

Redis is an advanced key-value store, where keys can contain data structures such as strings, has...

Updated 2 years ago by Paolo Basso

01 Introduction

SMBUD - Systems and Methods for Big and... 08 Key-value Databases

In many applications performance is an essential priority, and often a small delay in response ti...

Updated 2 years ago by Paolo Basso

03 MongoDB Queries

SMBUD - Systems and Methods for Big and... 07 Document Databases

Create Create a database: use database_name Create a collection: db.createCollection(name, option...

Updated 2 years ago by Paolo Basso