How to grep a term from S3 and output object name

I need to grep for a term across thousands of files in S3 and list the matching file names in an output file. I'm quite new to the CLI, so I've been testing both locally and on a small subset in S3.

So far I've got this:

aws s3 cp s3://mybucket/path/to/file.csv - | grep -iln searchterm > output.txt

The problem is the hyphen. Since I'm copying to standard output, grep's -l switch returns (standard input) instead of file.csv.

My desired output is:

file.csv

Eventually, I'll need to iterate this over the whole bucket, and then over all buckets, to get:

file1.csv
file2.csv
file3.csv

But I need to get over this hurdle first. Thanks!

Because you print the file to stdout and pipe it into grep's stdin, grep has no way of knowing that the original file was file.csv. If you have a long list of files, I would do:

while IFS= read -r file; do aws s3 cp "s3://mybucket/path/to/${file}" - | grep -q searchterm && echo "${file}" >> output.txt; done < files_list.txt

I cannot try it, because I do not have access to an AWS S3 instance, but the trick is to run grep quietly (-q): it returns true if it finds at least one match and false otherwise. Then you can print the name of the file.
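The exit-status behaviour of grep -q can be checked locally without any S3 access. The sketch below is only an illustration: has_term is a hypothetical helper name, not something provided by grep or the AWS CLI.

```shell
#!/usr/bin/env bash
# has_term is a hypothetical helper name (an assumption for illustration).
# It reads standard input and succeeds (exit status 0) if any line contains
# the term passed as the first argument. grep -q prints nothing either way.
has_term() {
  grep -q -- "$1"
}

# Local demonstration: the piped text stands in for the S3 object body.
if printf 'alpha\nsearchterm here\n' | has_term searchterm; then
  echo "match"
else
  echo "no match"
fi
```

Because only the exit status is used, the name to print can come from the shell variable that holds the object key rather than from grep itself.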

EDIT: Explanation

  1. The while loop iterates over each line of files_list.txt.
  2. The aws command prints the file to stdout.
  3. stdout is piped to grep in quiet mode (-q), which acts as a pattern matcher: it returns true if a match was found and false otherwise.
  4. If grep returns true, the name of the file (${file}) is appended to the output file.
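The four steps above can be sketched as a reusable function. This is a minimal sketch under stated assumptions: list_matching_objects is a hypothetical name, the prefix s3://mybucket/path/to/ is the placeholder from the question, and the key list is expected on standard input, one key per line.

```shell
#!/usr/bin/env bash
# list_matching_objects PREFIX TERM   (hypothetical helper, not an AWS command)
# Reads object keys from standard input, one per line, and prints each key
# whose content contains TERM.
list_matching_objects() {
  local prefix=$1 term=$2 key
  while IFS= read -r key; do
    # Stream the object to stdout; grep -q exits 0 on the first match.
    # </dev/null keeps aws from consuming the key list on stdin.
    # 2>/dev/null also hides genuine download errors; drop it when debugging.
    if aws s3 cp "${prefix}${key}" - </dev/null 2>/dev/null | grep -q -- "$term"; then
      printf '%s\n' "$key"
    fi
  done
}
```

A call might look like `list_matching_objects "s3://mybucket/path/to/" searchterm < files_list.txt > output.txt`. Quoting the key and using `IFS= read -r` keeps names with spaces or backslashes intact.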

EDIT2: Other solution

while IFS= read -r file; do aws s3 cp "s3://mybucket/path/to/${file}" - | sed -n '/searchpattern/q1' || echo "${file}" >> output.txt; done < files_list.txt

Explanation

Steps 1 and 2 are the same, then:

  3. stdout is piped to sed, which reads the stream line by line until it finds the first line matching searchpattern and then quits with exit status 1 (q1, a GNU sed extension). The non-zero status triggers the || branch, which appends ${file} to the output file. The sed script must be quoted so the shell passes it through unchanged. sed's F command is not used to print the name, because on standard input GNU sed prints - rather than a file name.
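A further variant keeps the -l switch from the question and extends the loop to a whole prefix. It is a sketch, not a tested recipe: grep_bucket is a hypothetical name, --label is assumed to be available (it is in GNU grep), and the column layout of `aws s3 ls --recursive` (date, time, size, key) is assumed when stripping the leading columns.

```shell
#!/usr/bin/env bash
# grep_bucket BUCKET PREFIX TERM   (hypothetical helper, not an AWS command)
# Prints the key of every object under PREFIX whose content contains TERM,
# matching case-insensitively as in the question's grep -i.
grep_bucket() {
  local bucket=$1 prefix=$2 term=$3 key
  # Assumed listing format: "YYYY-MM-DD HH:MM:SS <size> <key>".
  # sed strips the first three columns so keys containing spaces survive.
  aws s3 ls "s3://${bucket}/${prefix}" --recursive |
    sed -E 's/^[^ ]+ +[^ ]+ +[0-9]+ //' |
    while IFS= read -r key; do
      # --label makes grep -l report the key instead of "(standard input)".
      # "|| true": an object without a match is not an error.
      aws s3 cp "s3://${bucket}/${key}" - </dev/null 2>/dev/null |
        grep -il --label="$key" -- "$term" || true
    done
}
```

A call such as `grep_bucket mybucket path/to/ searchterm > output.txt` would then write one matching key per line. Note that the printed keys are full keys from the bucket root, not names relative to the prefix.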
