
How to use STDIN twice from pipe

I have an awk script, something like:

awk 'FNR==NR {col1[$1]++; col2[$2]++; next} {print $0, col2[$2] "/" length(col1)}' input input

But I have a lot of files and need to run this script on all of them concatenated together, like:

cat *all_input | awk 'FNR==NR {col1[$1]++; col2[$2]++; next} {print $0, col2[$2] "/" length(col1)}' STDIN STDIN

This does not work. How can I use STDIN twice from a pipe?

You don't need to use a pipe. If you are using bash, use process substitution, written <(cmd), which makes the output of a process (some sequence of commands) appear as a temporary file that can be passed as an argument:

awk 'FNR==NR {col1[$1]++; col2[$2]++; next} {print $0, col2[$2] "/" length(col1)}' <(cut -f3,5- input) <(cut -f3,5- input)

The answer to "How to use STDIN twice from pipe" is "you can't": a pipe is a one-shot stream that cannot be rewound. If you want to use the data from stdin twice, you need to save it somewhere when you read it the first time so you have it next time. For example:

$ seq 3 |
awk '
    BEGIN {
        if ( ("mktemp"|getline line) > 0) tmp=line; else exit
        ARGV[ARGC]=tmp; ARGC++
    }
    NR==FNR { print > tmp }
    { print FILENAME, NR, FNR, $0 }
' -
- 1 1 1
- 2 2 2
- 3 3 3
/var/folders/11/vlqr7jmn6jj3fglyl12lj0l00000gn/T/tmp.Y03l9pS7 4 1 1
/var/folders/11/vlqr7jmn6jj3fglyl12lj0l00000gn/T/tmp.Y03l9pS7 5 2 2
/var/folders/11/vlqr7jmn6jj3fglyl12lj0l00000gn/T/tmp.Y03l9pS7 6 3 3

Or you can store it in an internal array or string and read it back from there later.
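The same save-it-first idea can also be handled by the shell instead of inside awk. A minimal sketch, assuming a POSIX shell with mktemp available; the printf sample rows and the variable names tmp and result are made up for illustration:

```shell
# Hypothetical rows standing in for `cat *all_input`.
tmp=$(mktemp) || exit 1

result=$(
    printf 'a x\nb x\na y\n' |
    {
        cat > "$tmp"    # the one and only read of the pipe: save it to disk
        # awk then reads the saved copy twice, like the original "input input"
        awk 'FNR==NR {col1[$1]; col2[$2]++; next}
             {print $0, col2[$2] "/" length(col1)}' "$tmp" "$tmp"
    }
)
rm -f "$tmp"            # clean up the temporary copy
printf '%s\n' "$result"
```

The cost is one extra copy of the data on disk, which is usually acceptable when the input is too large to hold in memory.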

Having said that, your specific problem doesn't need anything that fancy, just a simple:

cat *all_input | awk 'FNR==NR {col1[$1]; col2[$2]++; next} {print $0, col2[$2] "/" length(col1)}' - *all_input

would do it, but unless your files are huge, all you really need is the store-it-in-array approach:

awk '{ col1[$1]; col2[$2]++; f0[NR]=$0; f2[NR]=$2 }
END {
    for (nr=1; nr<=NR; nr++) {
        print f0[nr], col2[f2[nr]] "/" length(col1)
    }
}' *all_input
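Because this version reads its input only once, it should work just as well when the data arrives on a pipe, which is what the question was after. A small sketch with made-up sample rows (the printf data and the result variable are illustrative, not from the question):

```shell
# Hypothetical rows standing in for `cat *all_input`.
result=$(
    printf 'a x\nb x\na y\n' |
    awk '{ col1[$1]; col2[$2]++; f0[NR]=$0; f2[NR]=$2 }   # single pass: count and remember each row
    END {
        # replay the remembered rows now that the totals are known
        for (nr=1; nr<=NR; nr++) {
            print f0[nr], col2[f2[nr]] "/" length(col1)
        }
    }'
)
printf '%s\n' "$result"
```

The trade-off is memory: every input line is held in the f0 and f2 arrays until END.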

I don't know if this helps, because I am not an awk expert, but any Linux application (including awk) can read stdin straight from /proc/self/fd/0.

Note that this is way less portable than open(0) and will only work on Linux with a readable procfs (nearly all Linux distributions today).

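If the application can consume several file descriptors in parallel, you can open that path twice and read from it twice. This only yields the data twice when stdin is a regular file (for example, redirected with < file); when stdin is a pipe, both opens refer to the same pipe and each byte can be read only once.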

The self component of the path resolves to the PID of the accessing process.
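As a rough illustration of the idea, assuming Linux with procfs mounted and stdin redirected from a regular file rather than a pipe; the temporary file in $data and its contents are invented for the example:

```shell
# Hypothetical sample file; in practice this would be real input data.
data=$(mktemp) || exit 1
printf 'a x\nb x\na y\n' > "$data"

# stdin is a regular file here, so each open of /proc/self/fd/0
# gets its own read position starting at the beginning of the file.
result=$(awk 'FNR==NR {col1[$1]; col2[$2]++; next}
              {print $0, col2[$2] "/" length(col1)}' \
         /proc/self/fd/0 /proc/self/fd/0 < "$data")
rm -f "$data"
printf '%s\n' "$result"
```

This does not rescue the piped case from the question, since a pipe has no beginning to return to.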
