I am using Linux. I have a directory of many files, and I want to use grep, tail, and wildcard expansion (*) in tandem to print the last occurrence of <pattern> in each file:
Input: <some command>
Expected Output:
<last occurrence of pattern in file 1>
<last occurrence of pattern in file 2>
...
<last occurrence of pattern in file N>
What I am trying now is grep "pattern" * | tail -n 1, but the output contains only one line: the last occurrence of the pattern in the last file. I assume this is because the * wildcard expansion happens before the commands are pipelined, so tail runs only once.
Does there exist some Bash syntax so that I can achieve the expected outcome, i.e., let tail run once per file?
I've also tried grep -m1 "pattern" <(tac *), and it seems the aforementioned reasoning still applies: wildcard expansion applies only to the immediate command it is associated with, and the "outer" command runs only once.
Wildcards are expanded on the command line before any command runs. For example, if you have files foo and bar in your directory and run

grep pattern * | tail -n1

then bash transforms this into

grep pattern foo bar | tail -n1

and runs that. Since there's only one stream of output from grep, there's only one stream of input to tail, and it prints the last line of that stream.
If you want to search each file and print the last line of grep's output separately you can use a loop:
for file in * ; do
grep pattern "${file}" | tail -n1
done
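A variation on the same loop, closer to the tac idea from the question, reverses each file and takes the first match instead; grep -m 1 stops reading as soon as it finds one, which can be faster on large files (a sketch, assuming GNU tac is available):

```shell
# For each file: reverse it with tac, then grep -m 1 takes the first
# match, which is the last occurrence in the original file order.
for file in * ; do
    tac "${file}" | grep -m 1 "pattern"
done
```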
The problem with non-loop solutions is that tail doesn't inherently know where the output of one file ends and the output of another begins, or indeed that there are files involved at all on the other end of the pipe. It just knows input is coming in from somewhere and that it has to print the last line of that input. If you didn't want a loop, you'd have to use a more powerful tool like awk, and perhaps use the fact that grep prepends the names of matched files (when multiple files are searched, or with -H) to delimit the start and end of each file's output. But the work of writing an awk program that tracks the current file, notices when its output ends, and prints its last line is probably more effort than it's worth when the loop solution is so simple.
You can achieve what you want using xargs. For your example it would be:

ls * | xargs -n 1 sh -c 'grep "pattern" "$0" | tail -n 1'

(Note the quotes around "$0", which keep filenames with spaces from being split.) This can save you from having to write a loop.
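If the filenames may contain spaces, newlines, or other special characters, a null-delimited variant is safer than parsing ls output (a sketch; it assumes printf '%s\0' and xargs -0, which are available on GNU systems):

```shell
# Emit each glob-expanded filename NUL-terminated so xargs splits
# correctly even for odd names; each name becomes $1 of its own sh run.
printf '%s\0' * | xargs -0 -n 1 sh -c 'grep "pattern" "$1" | tail -n 1' sh
```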
You can do this with awk, although (as tjm3772 pointed out in their answer) it's actually more complicated than the shell for loop. For the record, here's what I came up with:

awk -v pattern="YourPatternHere" '(FNR==1 && line!="") {print line; line=""}; $0~pattern {line=$0}; END {if (line!="") print line}' *
Explanation: when it finds a matching line ($0~pattern), it stores that line in the line variable ({line=$0}). This means that at the end of each file, line will hold the last matching line.
(Note: if you want to include a literal pattern in the program, remove the -v pattern="YourPatternHere" part and replace $0~pattern with just /YourPatternHere/.)
There's no simple trigger to print a match at the end of each file, so that part's split into two pieces: if it's the first line of a file AND line is set because of a match in the previous file ((FNR==1 && line!="")), print line and then clear it so it's not mistaken for a match in the current file ({print line; line=""}). Finally, at the end of the final file (END), print a match found in that last file if there was one ({if (line!="") print line}).
Also, note that the print-at-beginning-of-new-file test must be before the check for a matching line, or else it'll get very confused if the first line of the new file matches.
So... yeah, a shell for loop is simpler (and much easier to get right).
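For what it's worth, GNU awk (gawk) has an ENDFILE rule that removes most of that bookkeeping, since it fires once after each input file; this is a gawk extension, not POSIX awk, so treat it as a sketch:

```shell
# gawk only: ENDFILE runs after each file, so the first-line-of-next-file
# trick and the END block collapse into a single rule.
gawk -v pattern="YourPatternHere" '
    $0 ~ pattern { line = $0 }
    ENDFILE      { if (line != "") print line; line = "" }
' *
```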