I need to grep for a term across thousands of files in S3 and list the matching file names in an output file. I'm quite new to the CLI, so I've been testing both on my local machine and on a small subset in S3.
So far I've got this:
aws s3 cp s3://mybucket/path/to/file.csv - | grep -iln searchterm > output.txt
The problem is the hyphen: since I'm copying to standard output, grep's -l switch returns (standard input) instead of file.csv.
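The same behavior can be reproduced without S3. This is only a minimal sketch: the printf data is made-up sample content, and piped_name is a variable name chosen for illustration.

```shell
# grep -l prints the name of its input, and data arriving on a pipe has no
# file name, so grep falls back to a placeholder.
# LC_ALL=C keeps that placeholder text untranslated.
piped_name=$(printf 'id,name\n1,SearchTerm\n' | LC_ALL=C grep -il searchterm)
echo "$piped_name"
```

With GNU grep this prints the (standard input) placeholder rather than any file name.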
My desired output is
file.csv
Eventually, I'll need to iterate this over the whole bucket, and then all buckets, to get
file1.csv
file2.csv
file3.csv
But I need to get over this hurdle first. Thanks!
Because you print the file to STDOUT and pipe that to grep's STDIN, grep has no idea that the original file was file.csv. If you have a long list of files, I would do:
while read -r file; do aws s3 cp "s3://mybucket/path/to/${file}" - | grep -q searchterm && echo "${file}" >> output.txt; done < files_list.txt
I cannot try it, because I do not have access to an AWS S3 instance, but the trick is to run grep quietly (-q): it returns true if it finds at least one match and false otherwise. Then you can print the name of the file.
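Since the loop could not be tried against S3, a local dry run can stand in for it. The sketch below rests on assumptions: fetch_object and list_matching are hypothetical helper names, the sample files are made up, and fetch_object uses cat on a temporary directory where the real loop would call aws s3 cp.

```shell
# Hypothetical wrapper: swap the body for
#   aws s3 cp "s3://mybucket/path/to/$1" -
# to stream from S3 instead of a local directory.
fetch_object() {
    cat "${DATA_DIR}/$1"
}

# Print the name of every listed file whose content matches the term.
# $1 = search term, $2 = file containing one file name per line.
list_matching() {
    while IFS= read -r file; do
        # </dev/null stops the fetch command from consuming the list on stdin
        if fetch_object "$file" </dev/null | grep -qi -- "$1"; then
            printf '%s\n' "$file"
        fi
    done < "$2"
}

# Dry run on throwaway sample files
DATA_DIR=$(mktemp -d)
printf 'id,name\n1,SearchTerm\n' > "$DATA_DIR/file1.csv"
printf 'id,name\n2,other\n'      > "$DATA_DIR/file2.csv"
printf 'file1.csv\nfile2.csv\n'  > "$DATA_DIR/files_list.txt"
list_matching searchterm "$DATA_DIR/files_list.txt" > "$DATA_DIR/output.txt"
cat "$DATA_DIR/output.txt"
```

Only file1.csv contains the term, so it is the only name written to output.txt. Quoting "$file" keeps names with spaces intact.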
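The while loop with grep -q works as follows:
1. The loop reads files_list.txt line by line, one file name per iteration.
2. The aws command prints this file to stdout.
3. stdout is piped to grep in quiet mode (-q), which acts as a pattern matcher, returning true if a match was found and false otherwise.
4. If grep returns true, we append the file name (${file}) to our output file.

A variant with sed:
while read -r file; do aws s3 cp "s3://mybucket/path/to/${file}" - | sed -n '/searchpattern/q1' || echo "${file}" >> output.txt; done < files_list.txt
Steps 1 and 2 are the same, then:
3. stdout is piped to sed, which looks through the file line by line until it finds the first line matching the search pattern and then quits (q) with exit status 1 (the exit code argument is a GNU sed extension). If nothing matches, sed exits with status 0.
4. Because of ||, the file name (${file}) is appended to the output file only when sed exited with a nonzero status. sed's F command is not suitable for printing the name here: when sed reads from a pipe, GNU sed's F prints - rather than the original file name.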
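A further option, not part of the answer above: GNU grep has a --label option that names standard input in its output, which addresses the (standard input) problem from the question directly. The sketch below assumes GNU grep (--label is not a POSIX option); label_matching is a hypothetical helper name, and printf stands in for the aws s3 cp stream.

```shell
# Hypothetical helper: reads data on stdin and prints the given name ($1)
# if the data contains the term ($2), ignoring case.
label_matching() {
    grep -il --label="$1" -- "$2"
}

# Local stand-in for:
#   aws s3 cp s3://mybucket/path/to/file.csv - | label_matching file.csv searchterm
printf 'id,name\n1,SearchTerm\n' | label_matching file.csv searchterm
```

Inside the while loop, passing "${file}" as the label would make grep print the file name itself, so the separate echo would no longer be needed.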