grep regex: extract a pattern from all files in a directory

Let's say a directory has two files. Here are their contents:

File1.txt

tagstart random string tagend

tagstart random string tagend

File2.txt

tagstart random string tagend

tagstart random string tagend

I want to grep the directory and extract the lines that match the following pattern:

tagstart <any string> tagend

I also want to redirect the output to another file, so that the grep command produces an output file like this:

out.txt

tagstart random string tagend

tagstart random string tagend

tagstart random string tagend

tagstart random string tagend

file1.txt:

# This is the file nr.1
tagstart 123 tagend
tagstart abc tagend
kill tagstart def tagend kenny

file2.txt:

# This is the file nr.2
tagstart 123 tagend
tagstart abc tagend
kill tagstart xxx tagend kenny

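This command extracts each tag pair and the string it encloses. The match is greedy: grep -E has no lazy quantifier, so a ? after .* would not make it stop at the first tagend. The -h option suppresses the file name prefix that grep prints when given several files:

 grep -ohE 'tagstart.*tagend' file1.txt file2.txt > output.txt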

output.txt:

tagstart 123 tagend
tagstart abc tagend
tagstart def tagend
tagstart 123 tagend
tagstart abc tagend
tagstart xxx tagend
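
The command above names both files explicitly, whereas the question asks about every file in a directory. A minimal sketch of that variant, assuming a grep that supports -r and -o (GNU and BSD grep do; POSIX requires neither). The scratch directory, file contents and variable names here are made up for illustration:

```shell
# Hypothetical scratch directory standing in for "the directory".
dir=$(mktemp -d)
printf '%s\n' '# This is the file nr.1' 'tagstart 123 tagend' \
  'kill tagstart def tagend kenny' > "$dir/file1.txt"
printf '%s\n' '# This is the file nr.2' 'tagstart abc tagend' > "$dir/file2.txt"

# -r: search every file under $dir
# -h: do not prefix matches with the file name
# -o: print only the matched text, not the whole line
# -E: extended regular expressions
# The output file lives outside $dir so grep never reads its own output.
out=$(mktemp)
grep -rhoE 'tagstart.*tagend' "$dir" > "$out"
cat "$out"
```

Files are visited in whatever order the directory yields them, so the order of lines in the output file is not guaranteed; pipe through sort if that matters.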

Extra cookie for your pleasure:

This command does something similar, but prints only the sorted unique matches together with their occurrence counts (for statistics). The matches are extracted first and sorted afterwards, because uniq -c only merges adjacent identical lines:

 grep -ohE 'tagstart.*tagend' file1.txt file2.txt | sort | uniq -c | \
 awk '{print $2" "$3" "$4" : "$1}' > output.txt

output.txt:

tagstart 123 tagend : 2
tagstart abc tagend : 2
tagstart def tagend : 1
tagstart xxx tagend : 1
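
The awk step above prints fields 2 to 4, which fits the sample data because each enclosed string is a single word. The question's own payload ("random string") has two words, so here is a hedged sketch that moves the count to the end without assuming a field count. The sample input and variable names are invented for illustration:

```shell
# Sample extracted matches, one per line, unsorted.
matches=$(printf '%s\n' \
  'tagstart random string tagend' \
  'tagstart 123 tagend' \
  'tagstart random string tagend')

# sort makes identical lines adjacent so uniq -c can count them.
# awk remembers the count, strips the leading "<spaces><count> " prefix
# that uniq -c adds, then appends the count after the untouched line.
counts=$(printf '%s\n' "$matches" | LC_ALL=C sort | uniq -c |
  awk '{ n = $1; sub(/^ *[0-9]+ /, ""); print $0 " : " n }')
printf '%s\n' "$counts"
```

With this input the result is "tagstart 123 tagend : 1" followed by "tagstart random string tagend : 2".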

 grep -h 'tagstart random string tagend' file1.txt file2.txt > out.txt

Regexes are rarely a good way to parse XML. Have you thought about situations like tagstart one tagstart two tagend one tagend?

tagstart one tagstart two tagend one tagend
or
tagstart one tagstart two tagend
or
tagstart two tagend
or
tagstart two tagend one tagend
all satisfy your criteria. Which of these do you want?
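
To make the ambiguity concrete, here is a small sketch of which candidate different patterns pick on that nested line. It assumes GNU grep built with PCRE support for the -P option; the variable names are only for illustration:

```shell
line='tagstart one tagstart two tagend one tagend'

# Greedy ERE: leftmost-longest, from the first tagstart to the last tagend.
greedy=$(printf '%s\n' "$line" | grep -oE 'tagstart.*tagend')

# Lazy PCRE: from the first tagstart to the nearest following tagend.
lazy=$(printf '%s\n' "$line" | grep -oP 'tagstart.*?tagend')

# Innermost pair: the lookahead forbids another tagstart inside the match.
inner=$(printf '%s\n' "$line" | grep -oP 'tagstart(?:(?!tagstart).)*?tagend')

printf 'greedy: %s\nlazy:   %s\ninner:  %s\n' "$greedy" "$lazy" "$inner"
```

The greedy pattern returns the whole line, the lazy one returns "tagstart one tagstart two tagend", and the lookahead version returns "tagstart two tagend". None of them understands nesting; each simply commits to one of the readings listed above.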

