11 Hadoop
Slides: https://webeep.polimi.it/mod/resource/view.php?id=52305
Introduction
Software platform to easily process vast amounts of data. The main features are: Scalable: It c...
HDFS Architecture, Security and Configuration
HDFS Architecture. NameNode: manages the file system namespace; maps a file name to a set of blocks; M...
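The file-name-to-blocks mapping kept by the NameNode can be pictured with a small sketch. Everything here is hypothetical and for illustration only: `ToyNameNode`, its methods, the block-id format, and the tiny block size are assumptions, not HDFS APIs or defaults.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyNameNode:
    """Hypothetical in-memory stand-in for the NameNode's namespace table."""

    block_size: int = 128  # illustrative size in bytes, NOT an HDFS default
    namespace: Dict[str, List[str]] = field(default_factory=dict)

    def add_file(self, name: str, size: int) -> List[str]:
        """Register a file and return the ids of the blocks that hold it."""
        n_blocks = -(-size // self.block_size)  # ceiling division
        blocks = [f"{name}#blk_{i}" for i in range(n_blocks)]
        self.namespace[name] = blocks
        return blocks

    def blocks_of(self, name: str) -> List[str]:
        """Look up which blocks make up a file (metadata only, no data)."""
        return self.namespace[name]


if __name__ == "__main__":
    nn = ToyNameNode(block_size=100)
    nn.add_file("/logs/app.log", 250)
    print(nn.blocks_of("/logs/app.log"))
```

The sketch only holds metadata: a lookup returns block ids, never file contents.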
HDFS Commands
Shell Commands. There are two types of shell commands. User Commands: hdfs dfs – runs filesyste...
Hadoop 2.0
YARN splits up the two major functions of JobTracker. Global Resource Manager - Cluster resourc...
MapReduce
Introduction. MapReduce is a programming model and an associated implementation for processing an...
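As a rough illustration of the programming model, the sketch below runs a word count as map, shuffle, and reduce steps in plain Python on a single machine. Word count is an assumed example, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical; none of this is the Hadoop API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit an intermediate (word, 1) pair for every word."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all intermediate values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on one machine."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data node"]))
```

In a real cluster the map and reduce calls would run on different nodes; here they run in one process only to make the data flow visible.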
Hadoop key components
Input Splitter: responsible for splitting your input into multiple chunks (default is 64 MB). Th...
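The chunking arithmetic can be sketched as follows, assuming the 64 MB default mentioned above. `split_ranges` is a hypothetical helper, not a Hadoop function, and it is deliberately simplified: real input formats apply further rules (for example around record boundaries) that this ignores.

```python
from typing import List, Tuple

MB = 1024 * 1024


def split_ranges(file_size: int, chunk_size: int = 64 * MB) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs that cover a file of file_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (offset, min(chunk_size, file_size - offset))
        for offset in range(0, file_size, chunk_size)
    ]


if __name__ == "__main__":
    # A 150 MB input yields two full 64 MB chunks and one 22 MB remainder.
    for offset, length in split_ranges(150 * MB):
        print(offset // MB, length // MB)
```

Each resulting range would typically be handed to one map task, so the number of ranges gives a first estimate of the number of mappers.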
Scheduling
By default, Hadoop uses FIFO to schedule jobs. Alternative scheduler options: capacity and fair. Cap...
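To make the FIFO behaviour concrete, here is a toy simulation, assuming jobs run strictly one at a time in submission order. The `fifo_completion_times` helper, the job names, and the durations are all hypothetical; the real scheduler works on tasks and cluster slots, which this ignores.

```python
from collections import deque
from typing import Dict, Iterable, Tuple


def fifo_completion_times(jobs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Run (name, duration) jobs in submission order and record finish times."""
    queue = deque(jobs)
    clock = 0
    finished: Dict[str, int] = {}
    while queue:
        name, duration = queue.popleft()  # oldest submitted job runs first
        clock += duration
        finished[name] = clock
    return finished


if __name__ == "__main__":
    # The short job must wait for the long job submitted before it.
    print(fifo_completion_times([("long_etl", 10), ("short_query", 1)]))
```

Under these assumptions a short job submitted after a long one finishes only after the long one completes, the kind of waiting that a non-FIFO policy can reduce.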