HDFS Commands

Shell Commands

There are two types of shell commands:

  1. User Commands
    • hdfs dfs – runs filesystem commands on the HDFS
    • hdfs fsck – runs an HDFS filesystem checking command
  2. Administration Commands
    • hdfs dfsadmin – runs HDFS administration commands

The generic command line syntax is:

hdfs command [genericOptions] [commandOptions]
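
To illustrate where a generic option sits relative to a command option, the sketch below wraps an upload that overrides dfs.replication for a single invocation through the generic -D option. The function name put_with_replication is hypothetical (it is not part of HDFS), and the paths in the usage note are placeholders.

```shell
#!/usr/bin/env bash
# put_with_replication is a hypothetical helper name, not an HDFS command.
# Usage: put_with_replication <replication> <local_src> <hdfs_dst>
put_with_replication() {
  local replication=$1 src=$2 dst=$3
  # -D key=value is a generic option, so it comes before the command option -put.
  hdfs dfs -D dfs.replication="$replication" -put "$src" "$dst"
}
```

Calling put_with_replication 2 somefile.csv somedir would request a replication factor of 2 for that one upload instead of the configured default.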

User Commands

List directory contents

hdfs dfs -ls
hdfs dfs -ls /
hdfs dfs -ls -R /var

Display the disk space used by files

hdfs dfs -du -h /
hdfs dfs -du /hbase/data/hbase/namespace/
hdfs dfs -du -h /hbase/data/hbase/namespace/
hdfs dfs -du -s /hbase/data/hbase/namespace/

Copy data to HDFS

hdfs dfs -mkdir tdata
hdfs dfs -ls
hdfs dfs -copyFromLocal tutorials/data/geneva.csv tdata
hdfs dfs -ls -R

Copy the file back to local filesystem

cd tutorials/data/
hdfs dfs -copyToLocal tdata/geneva.csv geneva.csv.hdfs
md5sum geneva.csv geneva.csv.hdfs
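
The md5sum comparison above can be wrapped in a small function so that a script can act on the result instead of a person comparing the two hashes by eye. This is a minimal sketch, assuming md5sum is installed locally; the function name same_content is hypothetical.

```shell
#!/usr/bin/env bash
# same_content is a hypothetical helper name.
# Returns 0 if both local files have the same MD5 digest, 1 if they differ,
# and 2 if either file cannot be read.
same_content() {
  local original=$1 copy=$2
  local sum_original sum_copy
  # Reading from stdin keeps the file name out of md5sum's output,
  # so the two result strings can be compared directly.
  sum_original=$(md5sum < "$original") || return 2
  sum_copy=$(md5sum < "$copy") || return 2
  [ "$sum_original" = "$sum_copy" ]
}
```

After the copy back from HDFS, same_content geneva.csv geneva.csv.hdfs && echo "copy verified" would print the message only when the round trip preserved the file.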

List the ACL for a file

hdfs dfs -getfacl tdata/geneva.csv

Display file statistics (%r prints the replication factor)

hdfs dfs -stat "%r" tdata/geneva.csv
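
The -stat format string accepts other specifiers besides %r. The sketch below, using a hypothetical helper name stat_summary, prints the name (%n), size in bytes (%b), replication factor (%r), block size (%o) and modification time (%y) of a path in one call. The label text in the format string is an illustrative choice.

```shell
#!/usr/bin/env bash
# stat_summary is a hypothetical helper name, not an HDFS command.
# Usage: stat_summary <hdfs_path>
stat_summary() {
  local path=$1
  # %n name, %b size in bytes, %r replication, %o block size, %y modification time
  hdfs dfs -stat "name=%n size=%b replication=%r blocksize=%o modified=%y" "$path"
}
```

For example, stat_summary tdata/geneva.csv would report all five values for the file uploaded earlier.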

Write to HDFS, reading from stdin

echo "blah blah blah" | hdfs dfs -put - tdataset/tfile.txt
hdfs dfs -ls -R
hdfs dfs -cat tdataset/tfile.txt

Remove a file

hdfs dfs -rm tdataset/tfile.txt
hdfs dfs -ls -R

List the blocks of a file and their locations

hdfs fsck /user/cloudera/tdata/geneva.csv -files -blocks -locations

Print missing blocks and the files they belong to

hdfs fsck / -list-corruptfileblocks

Administration Commands

Comprehensive status report of HDFS cluster

hdfs dfsadmin -report

Print a tree of racks and their nodes

hdfs dfsadmin -printTopology

Get the information for a given datanode (like ping)

hdfs dfsadmin -getDatanodeInfo localhost:50020

Get a list of namenodes in the Hadoop cluster

hdfs getconf -namenodes

Dump the NameNode fsimage to an XML file

cd /var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current
hdfs oiv -i fsimage_0000000000000003388 -o /tmp/fsimage.xml -p XML