Quick Hadoop commands
Run a file system check on HDFS to find corrupted data
sudo -u hdfs hdfs fsck /
fsck only reports problems. Add -delete to remove corrupted files, or -move to move them to /lost+found.
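If the check runs from a cron job or a script, it helps to turn the report into a yes/no answer. The helper below is a minimal sketch: hdfs_is_healthy is a hypothetical name, and it assumes the fsck report contains a summary line with the words "is HEALTHY".

```shell
#!/bin/bash
# Sketch: succeed (exit 0) if the fsck report read from stdin says the
# file system is healthy. hdfs_is_healthy is a hypothetical helper name.
# Assumption: the report contains a summary line with "is HEALTHY".
hdfs_is_healthy() {
  grep -q 'is HEALTHY'
}

# Usage (needs a running cluster, so it is left as a comment):
#   sudo -u hdfs hdfs fsck / | hdfs_is_healthy && echo "HDFS OK"
```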
List files in HDFS (lists your HDFS home directory when no path is given)
hadoop fs -ls
List HDFS files by modification date or file name
hadoop fs -ls <hdfs_path> | grep 'YYYY-MM-DD'
eg:
hadoop fs -ls /mydata/myfiles | grep '2017-12-01'
hadoop fs -ls /mydata/myfiles | grep '_employee'
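A plain grep matches the date anywhere on the line, so a file that has the date in its name matches too. The sketch below compares only the date column instead. paths_modified_on is a hypothetical helper name, and it assumes the usual 8-column listing format with the date in column 6 and the path in column 8 (paths containing spaces would be cut off).

```shell
#!/bin/bash
# Sketch: read `hadoop fs -ls` output on stdin and print the paths whose
# date column equals $1 (format YYYY-MM-DD).
# paths_modified_on is a hypothetical helper name.
# Assumption: 8 columns per entry, date in column 6, path in column 8.
paths_modified_on() {
  # NF >= 8 skips the "Found N items" header line
  awk -v d="$1" 'NF >= 8 && $6 == d { print $8 }'
}

# Usage (needs a running cluster, so it is left as a comment):
#   hadoop fs -ls /mydata/myfiles | paths_modified_on 2017-12-01
```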
Move/Copy path from one location to another
hadoop fs -mv <hdfs_source_path> <hdfs_destination_path>
hadoop fs -cp <hdfs_source_path> <hdfs_destination_path>
Show the contents of an HDFS file
hadoop fs -cat <hdfs_file>
Find files based on text inside the file
This takes a few lines of shell script on top of the hadoop fs commands. Create a .sh file and make it executable:
touch grepHDFS.sh
chmod 755 grepHDFS.sh
vi grepHDFS.sh
Copy the shell script below into grepHDFS.sh
#!/bin/bash
# List the directory, keep the path column, then search each file for $1
hadoop fs -ls /myhdfs/mypath | awk 'NF >= 8 {print $8}' | \
while read -r f
do
  hadoop fs -cat "$f" | grep -i -- "$1"
done
Usage:
./grepHDFS.sh <text to search inside files>
./grepHDFS.sh emp_id
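The script above prints the matching lines but not which file they came from. Below is a minimal sketch of a variant that prints the paths of the matching files instead. find_hdfs_files_containing is a hypothetical helper name, and the directory is passed as an argument rather than hard-coded.

```shell
#!/bin/bash
# Sketch: print the HDFS paths directly under $1 whose contents match $2.
# find_hdfs_files_containing is a hypothetical helper name, not a Hadoop command.
# Assumption: 8-column `hadoop fs -ls` output with the path in column 8.
find_hdfs_files_containing() {
  local dir="$1" pattern="$2"
  # Skip the "Found N items" header (NF < 8) and directories (mode starts with d)
  hadoop fs -ls "$dir" | awk 'NF >= 8 && $1 !~ /^d/ { print $8 }' |
  while read -r f
  do
    # </dev/null keeps hadoop from consuming the file list on stdin;
    # grep -q stops at the first match, -i ignores case
    if hadoop fs -cat "$f" < /dev/null | grep -qi -- "$pattern"
    then
      printf '%s\n' "$f"
    fi
  done
}

# Usage (needs a running cluster, so it is left as a comment):
#   find_hdfs_files_containing /myhdfs/mypath emp_id
```

Because grep -q stops reading at the first match, hadoop fs -cat may complain on stderr about a closed output stream for large files. The list of paths on stdout is not affected.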
Filter and Find files based on text inside the file
This first narrows the file listing to entries that match a filter text, then searches for text inside only those files. It takes a few lines of shell script on top of the hadoop fs commands. Create a .sh file and make it executable:
touch grepHDFS.sh
chmod 755 grepHDFS.sh
vi grepHDFS.sh
Copy the shell script below into grepHDFS.sh
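#!/bin/bash
# Keep only listing lines that match $1, then search each remaining file for $2
hadoop fs -ls /myhdfs/mypath | grep -- "$1" | awk 'NF >= 8 {print $8}' | \
while read -r f
do
  hadoop fs -cat "$f" | grep -i -- "$2"
done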
Usage:
./grepHDFS.sh <filter text for files list> <text to search inside files>
./grepHDFS.sh _employee emp_id