Hadoop Commands

Quick Hadoop commands

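Run a file system check on HDFS to report corrupt or missing blocks

By default fsck only reports problems. It changes nothing unless an option such as -move or -delete is passed.

sudo -u hdfs hdfs fsck /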
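
To act on the result from a script, one option is to check the summary line that fsck prints. This is a minimal sketch, assuming the summary ends in "is HEALTHY" for a clean tree. hdfs_is_healthy is a hypothetical helper name, and on clusters where only the hdfs superuser may scan /, the call would need the sudo -u hdfs prefix.

```shell
#!/bin/bash
# hdfs_is_healthy is a hypothetical helper, not part of Hadoop.
# Returns 0 when the fsck summary for the given path reports HEALTHY.
hdfs_is_healthy() {
    local path="${1:-/}"
    # Assumption: the fsck summary line ends in "is HEALTHY" or "is CORRUPT".
    hdfs fsck "$path" 2>/dev/null | grep -q 'is HEALTHY'
}

# Example use:
#   if hdfs_is_healthy /mydata; then
#       echo "OK"
#   else
#       hdfs fsck /mydata -list-corruptfileblocks
#   fi
```

The -list-corruptfileblocks option narrows the report to the affected files, which is usually the next thing to look at when the check fails.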

List an HDFS directory (without a path, this lists your HDFS home directory)

hadoop fs -ls <hdfs_path>

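List HDFS files filtered by date or file name

The date shown by -ls is the last modification date, not the creation date.

hadoop fs -ls <hdfs_path> | grep '<YYYY-MM-DD or part of file name>'
For example: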
hadoop fs -ls /mydata/myfiles | grep '2017-12-01'
hadoop fs -ls /mydata/myfiles | grep '_employee'
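
Because grep matches anywhere on the listing line, a date pattern also matches files that merely have that date in their name. A possible refinement is to compare only the date column. The sketch below assumes the usual eight-column layout of hadoop fs -ls output, and hdfs_ls_by_date is a hypothetical helper name.

```shell
#!/bin/bash
# hdfs_ls_by_date is a hypothetical helper, not part of Hadoop.
# Prints the paths under a directory whose modification date equals YYYY-MM-DD.
hdfs_ls_by_date() {
    local dir="$1" day="$2"
    # Assumed column layout of `hadoop fs -ls`:
    # permissions replication owner group size date time path
    # NF >= 8 skips the "Found N items" header line.
    hadoop fs -ls "$dir" | awk -v d="$day" 'NF >= 8 && $6 == d { print $8 }'
}

# Example use:
#   hdfs_ls_by_date /mydata/myfiles 2017-12-01
```

Paths containing spaces would be truncated at the first space, since awk splits the line on whitespace.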

Move or copy a path from one location to another

hadoop fs -mv <hdfs_source_path> <hdfs_destination_path>
hadoop fs -cp <hdfs_source_path> <hdfs_destination_path>
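
When the destination already exists, a move may fail or land somewhere unexpected (for example inside an existing directory). One way to guard against that is to test the destination first. This is only a sketch: hdfs_safe_mv is a hypothetical helper name, and it relies on hadoop fs -test -e returning 0 when the path exists.

```shell
#!/bin/bash
# hdfs_safe_mv is a hypothetical helper, not part of Hadoop.
# Moves a path only when the destination does not already exist.
hdfs_safe_mv() {
    local src="$1" dst="$2"
    # `hadoop fs -test -e` exits 0 when the path exists.
    if hadoop fs -test -e "$dst"; then
        echo "Refusing to move: $dst already exists" >&2
        return 1
    fi
    hadoop fs -mv "$src" "$dst"
}

# Example use:
#   hdfs_safe_mv /mydata/myfiles/a.csv /mydata/archive/a.csv
```

The check and the move are two separate calls, so this does not protect against another process creating the destination in between.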

Show the content of an HDFS file

hadoop fs -cat <hdfs_file>
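
For large files it is often enough to look at the first few lines instead of streaming the whole file to the terminal. A minimal sketch, with hdfs_preview as a hypothetical helper name:

```shell
#!/bin/bash
# hdfs_preview is a hypothetical helper, not part of Hadoop.
# Prints the first N lines (default 10) of an HDFS file.
hdfs_preview() {
    local file="$1" lines="${2:-10}"
    # stderr is discarded because hadoop may complain once head closes the pipe;
    # note that this also hides real errors such as a missing file.
    hadoop fs -cat "$file" 2>/dev/null | head -n "$lines"
}

# Example use:
#   hdfs_preview /mydata/myfiles/a_employee.csv 5
```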

Find files based on text inside the file

HDFS has no built-in grep, so this takes a few lines of shell script. Create the script file and make it executable:

touch grepHDFS.sh
chmod 755 grepHDFS.sh
vi grepHDFS.sh

Copy the shell script below into grepHDFS.sh

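#!/bin/bash
# Usage: ./grepHDFS.sh <text to search inside files>
# Print the path (8th column) of each entry, skipping the "Found N items" header.
hadoop fs -ls /myhdfs/mypath | awk 'NF >= 8 {print $8}' | \
while read -r f
do
   # Stream each file and print the lines that match (case-insensitive).
   hadoop fs -cat "$f" | grep -i -- "$1"
done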

Usage:

./grepHDFS.sh <text to search inside files>
./grepHDFS.sh  emp_id
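
The script prints the matching lines but not which file they came from. If only the file names are needed, a variant along the following lines could be used. It is a sketch under assumptions: hdfs_files_containing is a hypothetical helper name, the directory is passed as an argument instead of being hard-coded, and the listing is assumed to have the usual eight columns.

```shell
#!/bin/bash
# hdfs_files_containing is a hypothetical helper, not part of Hadoop.
# Prints the path of every file directly under a directory whose content
# matches the given text (case-insensitive).
hdfs_files_containing() {
    local dir="$1" text="$2" f
    # Listing lines that start with "-" are regular files; directories start with "d".
    hadoop fs -ls "$dir" | awk '/^-/ && NF >= 8 { print $8 }' |
    while read -r f
    do
        # grep -q stops at the first match, so only the path is printed.
        # </dev/null keeps the command from consuming the loop's input.
        if hadoop fs -cat "$f" </dev/null 2>/dev/null | grep -qi -- "$text"; then
            echo "$f"
        fi
    done
}

# Example use:
#   hdfs_files_containing /myhdfs/mypath emp_id
```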

 

Filter and Find files based on text inside the file

This first narrows the file listing with a filter text, then searches for text inside the remaining files. It again takes a few lines of shell script. The steps below reuse the name grepHDFS.sh, which replaces the previous script; pick a different name to keep both.

touch grepHDFS.sh
chmod 755 grepHDFS.sh
vi grepHDFS.sh

Copy the shell script below into grepHDFS.sh

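#!/bin/bash
# Usage: ./grepHDFS.sh <filter text for files list> <text to search inside files>
# Keep only listing lines that match the filter, then print the path (8th column).
hadoop fs -ls /myhdfs/mypath | grep -- "$1" | awk 'NF >= 8 {print $8}' | \
while read -r f
do
   # Stream each file and print the lines that match (case-insensitive).
   hadoop fs -cat "$f" | grep -i -- "$2"
done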

Usage:

./grepHDFS.sh <filter text for files list> <text to search inside files>
./grepHDFS.sh _employee emp_id
