HDFS Commands
Shell Commands
There are two types of shell commands:
- User Commands
  - hdfs dfs: runs filesystem commands on HDFS
  - hdfs fsck: runs an HDFS filesystem checking command
- Administration Commands
  - hdfs dfsadmin: runs HDFS administration commands
The generic command line syntax is:
hdfs command [genericOptions] [commandOptions]
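As a rough illustration of how the generic syntax is filled in, the sketch below passes a generic option (-D property=value) ahead of the command options of hdfs dfs. The function name put_with_replication and its arguments are hypothetical helpers for this example, not part of HDFS, and it assumes a reachable cluster.

```shell
# Hypothetical helper: upload a file with a per-command replication factor.
# -D dfs.replication=N is a generic option, so it is placed before
# the command option -put and its arguments.
put_with_replication() {
  local replication="$1" src="$2" dst="$3"
  hdfs dfs -D dfs.replication="$replication" -put "$src" "$dst"
}

# Example call (needs a running cluster):
# put_with_replication 1 tutorials/data/geneva.csv tdata
```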
User Commands
List directory contents
hdfs dfs -ls
hdfs dfs -ls /
hdfs dfs -ls -R /var
Display the disk space used by files
hdfs dfs -du -h /
hdfs dfs -du /hbase/data/hbase/namespace/
hdfs dfs -du -h /hbase/data/hbase/namespace/
hdfs dfs -du -s /hbase/data/hbase/namespace/
Copy data to HDFS
hdfs dfs -mkdir tdata
hdfs dfs -ls
hdfs dfs -copyFromLocal tutorials/data/geneva.csv tdata
hdfs dfs -ls -R
Copy the file back to the local filesystem
cd tutorials/data/
hdfs dfs -copyToLocal tdata/geneva.csv geneva.csv.hdfs
md5sum geneva.csv geneva.csv.hdfs
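The round trip above can be wrapped in a small check that compares checksums automatically instead of by eye. This is a minimal sketch: verify_roundtrip is a hypothetical function name, and it assumes md5sum and mktemp are available on the local machine.

```shell
# Hypothetical helper: fetch an HDFS file and compare it with a local original.
# Returns 0 when the MD5 checksums match, non-zero otherwise.
verify_roundtrip() {
  local local_file="$1" hdfs_path="$2"
  local tmp_dir copy sum_local sum_copy
  tmp_dir="$(mktemp -d)" || return 1
  # -copyToLocal refuses to overwrite, so write to a fresh path.
  copy="$tmp_dir/copy"
  if ! hdfs dfs -copyToLocal "$hdfs_path" "$copy"; then
    rm -rf "$tmp_dir"
    return 1
  fi
  sum_local="$(md5sum < "$local_file" | cut -d' ' -f1)"
  sum_copy="$(md5sum < "$copy" | cut -d' ' -f1)"
  rm -rf "$tmp_dir"
  [ "$sum_local" = "$sum_copy" ]
}

# Example call (needs a running cluster):
# verify_roundtrip geneva.csv tdata/geneva.csv && echo "copies match"
```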
List the ACL for a file
hdfs dfs -getfacl tdata/geneva.csv
Print file statistics (%r is the replication factor)
hdfs dfs -stat "%r" tdata/geneva.csv
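Several format specifiers can be combined in a single -stat call. The sketch below uses %n (file name), %b (size in bytes), %r (replication factor) and %o (block size); the function name hdfs_file_summary and the key=value layout are illustrative choices, not an HDFS convention.

```shell
# Hypothetical helper: print name, size, replication and block size on one line.
hdfs_file_summary() {
  local path="$1"
  hdfs dfs -stat "name=%n size=%b replication=%r blocksize=%o" "$path"
}

# Example call (needs a running cluster):
# hdfs_file_summary tdata/geneva.csv
```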
Write to HDFS, reading from stdin
echo "blah blah blah" | hdfs dfs -put - tdataset/tfile.txt
hdfs dfs -ls -R
hdfs dfs -cat tdataset/tfile.txt
Remove a file
hdfs dfs -rm tdataset/tfile.txt
hdfs dfs -ls -R
List the blocks of a file and their locations
hdfs fsck /user/cloudera/tdata/geneva.csv -files -blocks -locations
Print missing blocks and the files they belong to
hdfs fsck / -list-corruptfileblocks
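For scripted monitoring, the fsck summary can be reduced to a yes/no answer. This is a sketch under an assumption: it relies on fsck printing a closing status line that ends in "is HEALTHY" for a healthy path, which may differ between Hadoop versions. The function name hdfs_is_healthy is hypothetical.

```shell
# Hypothetical helper: succeed only if fsck reports the path as healthy.
# Assumes the fsck summary contains a line ending in "is HEALTHY".
hdfs_is_healthy() {
  local path="${1:-/}"
  hdfs fsck "$path" 2>/dev/null | grep -q "is HEALTHY"
}

# Example call (needs a running cluster):
# hdfs_is_healthy / || echo "fsck did not report HEALTHY for /"
```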
Administration Commands
Print a comprehensive status report of the HDFS cluster
hdfs dfsadmin -report
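The report is plain text, so single values can be pulled out with standard tools. The sketch below extracts the number of live datanodes; it assumes the report contains a line of the form "Live datanodes (N):", which holds for the Hadoop releases I am aware of but should be checked against your version. The function name live_datanode_count is hypothetical.

```shell
# Hypothetical helper: print the number of live datanodes from the report.
# Assumes a report line of the form "Live datanodes (N):".
live_datanode_count() {
  hdfs dfsadmin -report 2>/dev/null |
    sed -n 's/^Live datanodes (\([0-9][0-9]*\)):.*/\1/p'
}

# Example call (needs a running cluster):
# echo "live datanodes: $(live_datanode_count)"
```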
Print a tree of racks and their nodes
hdfs dfsadmin -printTopology
Get the information for a given datanode (like ping)
hdfs dfsadmin -getDatanodeInfo localhost:50020
Get a list of namenodes in the Hadoop cluster
hdfs getconf -namenodes
Dump the NameNode fsimage to an XML file
cd /var/lib/hadoop-hdfs/cache/hdfs/dfs/name/current
hdfs oiv -i fsimage_0000000000000003388 -o /tmp/fsimage.xml -p XML