Let's say a directory has two files. Here are their contents:
File1.txt
tagstart random string tagend
tagstart random string tagend
File2.txt
tagstart random string tagend
tagstart random string tagend
I want to grep the directory and extract the lines that match the following pattern:
tagstart <any string> tagend
I also want to redirect the output to another file. Basically, the grep command should produce an output file like this:
out.txt
tagstart random string tagend
tagstart random string tagend
tagstart random string tagend
tagstart random string tagend
file1.txt:
# This is the file nr.1
tagstart 123 tagend
tagstart abc tagend
kill tagstart def tagend kenny
file2.txt:
# This is the file nr.2
tagstart 123 tagend
tagstart abc tagend
kill tagstart xxx tagend kenny
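This command extracts the tags and the strings they enclose. The lazy quantifier .*? is Perl syntax, so it needs -P (GNU grep built with PCRE support); -E does not support lazy matching. -h suppresses the file name prefixes:
grep -h -o -P "tagstart.*?tagend" file1.txt file2.txt > output.txt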
output.txt:
tagstart 123 tagend
tagstart abc tagend
tagstart def tagend
tagstart 123 tagend
tagstart abc tagend
tagstart xxx tagend
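To scan every file in a directory instead of naming each file, something like the following sketch may work. The scratch directory, the file names and the sample contents are made up for illustration, and the pattern is greedy, so it assumes at most one tag pair per line. The output file is deliberately kept outside the scanned directory so grep does not read its own output.

```shell
#!/bin/sh
# Hypothetical scratch directory and sample files (names are illustrative).
workdir=$(mktemp -d)
outfile=$(mktemp)
printf '%s\n' '# This is the file nr.1' 'tagstart 123 tagend' \
  'kill tagstart def tagend kenny' > "$workdir/file1.txt"
printf '%s\n' '# This is the file nr.2' 'tagstart abc tagend' > "$workdir/file2.txt"

# -h: no file name prefixes; -o: print only the matched part of each line.
grep -h -o 'tagstart.*tagend' "$workdir"/*.txt > "$outfile"

result=$(cat "$outfile")
printf '%s\n' "$result"

# Clean up the scratch files.
rm -rf "$workdir" "$outfile"
```

Dropping -o would keep the whole matching line (including "kill" and "kenny") rather than only the tagged part.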
Extra cookie for your pleasure:
This command does something similar, but prints only the sorted unique records together with their occurrence counts (for statistics purposes):
grep -h -o -P "tagstart.*?tagend" file1.txt file2.txt | sort | uniq -c | \
awk '{print $2" "$3" "$4" : "$1}' > output.txt
output.txt:
tagstart 123 tagend : 2
tagstart abc tagend : 2
tagstart def tagend : 1
tagstart xxx tagend : 1
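The awk step above prints fields 2 to 4, which assumes every match consists of exactly three words. A payload such as "random string" from the question would be cut short. A possible variant, sketched here with made-up sample data, saves the count and strips it from the front of the line instead of picking fixed fields:

```shell
#!/bin/sh
# Hypothetical sample data; the multi-word payload comes from the question.
sample=$(mktemp)
printf '%s\n' \
  'tagstart random string tagend' \
  'kill tagstart 123 tagend kenny' \
  'tagstart random string tagend' > "$sample"

# LC_ALL=C keeps the sort order predictable across locales.
# awk saves the count, removes it from the start of the line, then appends it.
counts=$(grep -o 'tagstart.*tagend' "$sample" | LC_ALL=C sort | uniq -c |
  awk '{ count = $1; sub(/^[ \t]*[0-9]+[ \t]+/, ""); print $0 " : " count }')

printf '%s\n' "$counts"
rm -f "$sample"
```

With this sample the result should be "tagstart 123 tagend : 1" followed by "tagstart random string tagend : 2".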
grep -h 'tagstart .* tagend' file1.txt file2.txt > out.txt
Regexes are rarely a good way to parse XML. Have you thought about situations like tagstart one tagstart two tagend one tagend?
tagstart one tagstart two tagend one tagend
or
tagstart one tagstart two tagend
or
tagstart two tagend
or
tagstart two tagend one tagend
all satisfy your criteria. Which of these do you want?
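As a rough illustration of how the choice of regex decides between some of these candidates, the sketch below runs three patterns over the nested line. It assumes GNU grep built with PCRE support for the -P option; the variable names are only for illustration.

```shell
#!/bin/sh
# Nested input taken from the case discussed above.
input='tagstart one tagstart two tagend one tagend'

# Greedy: runs from the first tagstart to the last tagend on the line.
greedy=$(printf '%s\n' "$input" | grep -o 'tagstart.*tagend')

# Lazy (-P, Perl syntax): stops at the first tagend after the first tagstart.
lazy=$(printf '%s\n' "$input" | grep -o -P 'tagstart.*?tagend')

# Innermost: the negative lookahead forbids another tagstart inside the match.
inner=$(printf '%s\n' "$input" | grep -o -P 'tagstart(?:(?!tagstart).)*?tagend')

printf 'greedy: %s\nlazy:   %s\ninner:  %s\n' "$greedy" "$lazy" "$inner"
```

The greedy pattern returns the whole line, the lazy one returns "tagstart one tagstart two tagend", and the lookahead version returns "tagstart two tagend". None of them handles arbitrary nesting depth, which is why a real parser is usually the safer choice for XML.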