
Can I do a Bash wildcard expansion (*) on an entire pipeline of commands?

I am using Linux. I have a directory containing many files, and I want to use grep, tail, and wildcard expansion (*) in tandem to print the last occurrence of <pattern> in each file:

Input: <some command>
Expected Output: 
<last occurrence of pattern in file 1>
<last occurrence of pattern in file 2>
...
<last occurrence of pattern in file N>

What I am trying now is grep "pattern" * | tail -n 1, but the output contains only one line: the last occurrence of the pattern in the last file. I assume this is because the * wildcard is expanded before the pipeline is set up, so tail runs only once.

Is there some Bash syntax that achieves the expected output, i.e., that lets tail run once for each file?

  • I know I can always use a for-loop to solve the problem. I'm just curious if the problem can be solved with a more condensed command.

I've also tried grep -m1 "pattern" <(tac *), and the same reasoning seems to apply: wildcard expansion applies only to the immediate command it is attached to, and the "outer" command runs only once.

Wildcards are expanded on the command line before any command runs. For example, if you have files foo and bar in your directory and run grep pattern * | tail -n1, then bash transforms this into grep pattern bar foo | tail -n1 (glob results are sorted) and runs that. Since there's only one stream of output from grep, there's only one stream of input to tail, and it prints the last line of that stream.
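To see the expansion for yourself, here is a small sketch that can be pasted into a shell. The scratch directory and the sample contents of foo and bar are made up for illustration; only the grep and tail commands come from the question.

```shell
# Scratch directory with two hypothetical sample files, foo and bar.
demo_dir=$(mktemp -d)
printf 'pattern a\nother\npattern b\n' > "$demo_dir/foo"
printf 'pattern c\npattern d\n' > "$demo_dir/bar"
cd "$demo_dir" || exit 1

# The shell replaces * with the sorted list of matching names
# before grep is started; echo makes the resulting command line visible.
expanded=$(echo grep pattern *)
echo "$expanded"

# grep writes a single combined stream for both files,
# so tail only ever sees one "last line".
last=$(grep pattern * | tail -n 1)
echo "$last"
```

The first echo shows the command line that actually runs, and the second shows that only the final match from the final file survives the pipe.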

If you want to search each file and print the last line of grep's output separately you can use a loop:

for file in * ; do
  grep pattern "${file}" | tail -n1
done
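If you expect to reuse the loop, one option is to wrap it in a function. This is only a sketch: last_match is a hypothetical name, and the extra checks (skipping non-regular files, guarding against names that start with a dash) are my additions rather than part of the loop above.

```shell
# last_match PATTERN FILE...
# Hypothetical helper: prints the last line matching PATTERN in each FILE.
last_match() {
  pattern=$1
  shift
  for file in "$@"; do
    # Skip directories and other non-regular files that * may also match.
    [ -f "$file" ] || continue
    # -e and -- keep a pattern or file name that starts with "-"
    # from being read as an option.
    grep -e "$pattern" -- "$file" | tail -n 1
  done
}
```

It would be called as last_match pattern * from the directory in question. Files without a match simply contribute no output line.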

The problem with non-loop solutions is that tail doesn't inherently know where the output of one file ends and the output of another file begins, or indeed that there are even files involved on the other end of the pipe. It just knows input is coming in from somewhere and it has to print the last line of that input. If you didn't want a loop, you'd have to use a more powerful tool like awk and perhaps use the fact that grep prepends the names of matched files (if multiple files are searched, or with -H) to delimit the start and end of the output from each file. But writing an awk program that keeps track of the current file, so that it knows when that file's output ends and can print its last line, is probably more effort than it is worth when the loop solution is so simple.
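For the curious, a rough sketch of that grep -H plus awk idea could look like the following. The function name last_per_file is hypothetical, and the sketch assumes that no file name contains a colon or a newline, since the colon is what separates the name from the matched line in grep's output.

```shell
# last_per_file PATTERN FILE...
# Hypothetical helper; assumes file names contain no ":" and no newline.
last_per_file() {
  pattern=$1
  shift
  grep -H -e "$pattern" -- "$@" |
    awk -F: '
      # File name changed: print what was saved for the previous file.
      NR > 1 && $1 != prev { print last }
      # Remember the file name and the matched line without the "name:" prefix.
      { prev = $1; last = substr($0, length($1) + 2) }
      # Print what was saved for the final file, if grep produced anything.
      END { if (NR > 0) print last }
    '
}
```

Called as last_per_file pattern *, this runs grep only once, at the cost of the file name assumption above.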

You can achieve what you want using xargs. For your example it would be:

printf '%s\0' * | xargs -0 -n 1 sh -c 'grep "pattern" "$0" | tail -n 1'

This can save you from having to write a loop.

You can do this with awk, although (as tjm3772 pointed out in their answer) it's actually more complicated than the shell for loop. For the record, here's what I came up with:

awk -v pattern="YourPatternHere" '(FNR==1 && line!="") {print line; line=""}; $0~pattern {line=$0}; END {if (line!="") print line}' *

Explanation: when it finds a matching line ($0~pattern), it stores that line in the line variable ({line=$0}). This means that at the end of the file, line will hold the last matching line.

(Note: if you just want to include a literal pattern in the program, remove the -v pattern="YourPatternHere" part and replace $0~pattern with just /YourPatternHere/.)

There's no simple trigger to print a match at the end of each file, so that part's split into two pieces: if it's the first line of a file AND line is set because of a match in the previous file ( (FNR==1 && line!="") ), print line and then clear it so it's not mistaken for a match in the current file ( {print line; line=""} ). Finally, at the end of the final file ( END ), print a match found in that last file if there was one ( {if (line!="") print line} ).

Also, note that the print-at-beginning-of-new-file test must be before the check for a matching line, or else it'll get very confused if the first line of the new file matches.
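Here is a small worked run of that awk program, as a sketch. The three sample files and their contents are invented for illustration: 2.txt has no match at all, and the first line of 3.txt matches, which exercises the ordering point above.

```shell
# Hypothetical sample files in a scratch directory.
work=$(mktemp -d)
printf 'pattern a\nother\npattern b\n' > "$work/1.txt"
printf 'no match here\n'               > "$work/2.txt"
printf 'pattern c\nother\n'            > "$work/3.txt"

# Same program as above, split over several lines for readability.
result=$(cd "$work" && awk -v pattern="pattern" '
  (FNR==1 && line!="") {print line; line=""}
  $0~pattern {line=$0}
  END {if (line!="") print line}
' *)
echo "$result"
```

The saved match from 1.txt is printed when the first line of 2.txt arrives, nothing is printed for 2.txt, and the match from 3.txt is printed by the END block.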

So... yeah, a shell for loop is simpler (and much easier to get right).

