
grep regex: extract pattern from all files in a directory

Let's say a directory has two files. Here are the contents:

File1.txt

tagstart random string tagend

tagstart random string tagend

File2.txt

tagstart random string tagend

tagstart random string tagend

I want to grep the directory and extract the lines that match the following pattern:

tagstart <any string> tagend

I also want to redirect the output to another file. Basically, the grep command should produce an output file like this:

out.txt

tagstart random string tagend

tagstart random string tagend

tagstart random string tagend

tagstart random string tagend

file1.txt:

# This is the file nr.1
tagstart 123 tagend
tagstart abc tagend
kill tagstart def tagend kenny

file2.txt:

# This is the file nr.2
tagstart 123 tagend
tagstart abc tagend
kill tagstart xxx tagend kenny

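This command will extract the tags and their enclosed strings. POSIX extended regular expressions (-E) do not define a lazy quantifier, so the non-greedy .*? needs -P, which assumes GNU grep built with PCRE support. The -h option drops the file name prefix that grep adds when it reads more than one file:

 grep -h -o -P "tagstart.*?tagend" file1.txt file2.txt > output.txt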

output.txt:

tagstart 123 tagend
tagstart abc tagend
tagstart def tagend
tagstart 123 tagend
tagstart abc tagend
tagstart xxx tagend
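To cover every file in a directory rather than two named files, a shell glob can supply the file list. The following is only a sketch: the function name extract_tags and the scratch directory are made up for illustration, and the greedy pattern assumes at most one tag pair per line, as in the sample files.

```shell
#!/bin/sh
# extract_tags (hypothetical helper): print every "tagstart ... tagend"
# match found in the files given as arguments.
#   -h  drop the "file:" prefix grep adds when it reads several files
#   -o  print only the matched text, not the whole line
# The greedy .* assumes at most one tag pair per line.
extract_tags() {
    grep -h -o -E 'tagstart.*tagend' "$@"
}

# Scratch directory (illustration only) holding the two sample files.
dir=$(mktemp -d)
printf '%s\n' '# This is the file nr.1' 'tagstart 123 tagend' \
    'tagstart abc tagend' 'kill tagstart def tagend kenny' > "$dir/file1.txt"
printf '%s\n' '# This is the file nr.2' 'tagstart 123 tagend' \
    'tagstart abc tagend' 'kill tagstart xxx tagend kenny' > "$dir/file2.txt"

# Write the result outside the searched directory so the glob never
# picks up the output file itself.
out=$(mktemp)
extract_tags "$dir"/*.txt > "$out"
cat "$out"
```

For nested directories, grep -r -h -o with the same pattern would be the usual variant; -r is supported by GNU and BSD grep.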

Extra cookie for your pleasure:

This command does something similar, but displays only the sorted unique records together with their occurrence counts (for statistics purposes):

 grep -h -o -P "tagstart.*?tagend" file1.txt file2.txt | sort | uniq -c | \
 awk '{print $2" "$3" "$4" : "$1}' > output.txt

output.txt:

tagstart 123 tagend : 2
tagstart abc tagend : 2
tagstart def tagend : 1
tagstart xxx tagend : 1
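The awk step above prints exactly three fields, so it only works when the enclosed string is a single word. If the enclosed string can contain spaces, one possible variant strips the count that uniq -c puts in front and prints the rest of the line unchanged. This is a sketch; count_tags is a hypothetical name, and the greedy pattern again assumes one tag pair per line.

```shell
#!/bin/sh
# count_tags (hypothetical helper): count identical tag matches across files.
# uniq -c only merges adjacent lines, so the matches are sorted first.
count_tags() {
    grep -h -o -E 'tagstart.*tagend' "$@" | LC_ALL=C sort | uniq -c |
        # Save the count, remove it from the line, then print "match : count".
        awk '{ count = $1; sub(/^ *[0-9]+ /, ""); print $0 " : " count }'
}

# Illustration with multi-word enclosed strings.
sample=$(mktemp)
printf '%s\n' 'tagstart a b c tagend' 'x tagstart 1 tagend y' \
    'tagstart a b c tagend' > "$sample"
count_tags "$sample"
# tagstart 1 tagend : 1
# tagstart a b c tagend : 2
```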

 grep -h 'tagstart random string tagend' file1.txt file2.txt > out.txt

Regexes are rarely a good way to parse XML. Have you thought about situations like tagstart one tagstart two tagend one tagend? For that input,

tagstart one tagstart two tagend one tagend
or
tagstart one tagstart two tagend
or
tagstart two tagend
or
tagstart two tagend one tagend
all satisfy your criteria. Which of these do you want?
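To make the ambiguity concrete, here is a sketch of three patterns that each pick a different candidate from that line. The function names are hypothetical, and the two -P variants assume GNU grep built with PCRE support. The fourth candidate (tagstart two tagend one tagend) is not covered here.

```shell
#!/bin/sh
# Three readings of "tagstart <any string> tagend" on nested input.
# Each function filters stdin and prints only the matched text.

# Greedy: from the first tagstart to the last tagend on the line.
match_greedy() { grep -o -E 'tagstart.*tagend'; }

# Lazy: from the first tagstart to the nearest tagend after it.
match_lazy() { grep -o -P 'tagstart.*?tagend'; }

# Innermost: a tagstart/tagend pair with no other tagstart in between.
match_innermost() { grep -o -P 'tagstart(?:(?!tagstart).)*?tagend'; }

line='tagstart one tagstart two tagend one tagend'
printf '%s\n' "$line" | match_greedy     # tagstart one tagstart two tagend one tagend
printf '%s\n' "$line" | match_lazy       # tagstart one tagstart two tagend
printf '%s\n' "$line" | match_innermost  # tagstart two tagend
```

None of these tracks nesting depth, which is why a real parser is the safer choice once tags can nest.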
