Using AWK to sum column from different files

I have a set of data files named, say, a0001.xyz through a0254.xyz. I want to sum the 5th column of each file and write the results to a file called output.txt. So I am looking for a single-column file containing the sum for each .xyz file.

I've tried something like this:

awk -f sum.awk a0004.xyz > output.txt

where sum.awk is

#sum.awk
{ sum+=$5}
END { print sum }

This gives me the sum of the 5th column of a0004.xyz and writes it to output.txt. The problem is that when I change the command to:

awk -f sum.awk *.xyz > output.txt

it again gives me only one of the sums among all the .xyz files. How can I fix this?

I hope I've managed to ask this clearly.

Something like this?

$ tail a*.xyz
==> a0001.xyz <==
1 2 3 4 5 6 7
2 3 4 5 6 7 8

==> a0254.xyz <==
3 4 5 6 7 8 9
4 5 6 7 8 9 10
$ awk '{a[FILENAME]+=$5} END {for (i in a) printf "%4d %s\n", a[i], i}' a*.xyz
  11 a0001.xyz
  15 a0254.xyz

The awk script here adds the value of $5 to an array element keyed by the current filename. After processing all input, it steps through the array and prints each sum alongside the filename that contributed to it. Awk processes the list of filenames cleanly and portably, without the need for pipes.
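One caveat: awk does not specify the order in which `for (i in a)` visits the keys, so the lines may not come out in filename order. The following is a minimal sketch, assuming two small sample files created in a scratch directory (the `workdir` and `result` names and the file contents are made up for illustration). It pipes the output through `sort` to get a stable order, at the cost of one pipe:

```shell
# Create two small sample files in a scratch directory (illustrative data only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '1 2 3 4 5 6 7\n2 3 4 5 6 7 8\n' > a0001.xyz
printf '3 4 5 6 7 8 9\n4 5 6 7 8 9 10\n' > a0254.xyz

# Sum column 5 per file, then sort by filename so the order is deterministic.
result=$(awk '{a[FILENAME]+=$5} END {for (i in a) print i, a[i]}' a*.xyz | sort)
printf '%s\n' "$result"
```

Printing the filename first makes a plain `sort` order the lines by file. If only the sums are wanted, as in the question, something like `cut -d' ' -f2` could be appended after the sort.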

Do them all in parallel with GNU Parallel:

parallel -k -q awk '{s+=$5} END{print FILENAME,s+0}' ::: a*xyz

Sample Output

a0001.xyz 20
a0002.xyz 40
a0254.xyz 55

Notes:

  • -k means "keep the output in order"
  • -q means "quote my awk stuff please, because I am lazy"
  • s+0 means to treat s as a number so if it is not set, it prints 0
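The effect of `s+0` can be checked in isolation. This is a small sketch using `/dev/null` as an empty input, so the main rule never runs and `s` stays unset (the variable names `without_zero` and `with_zero` are only for illustration):

```shell
# An empty input leaves s unset, so the two END blocks print differently.
without_zero=$(awk '{s+=$5} END{print s}' /dev/null)
with_zero=$(awk '{s+=$5} END{print s+0}' /dev/null)

# Brackets make the empty string visible.
printf 'without +0: [%s]\nwith +0: [%s]\n' "$without_zero" "$with_zero"
```

The first command prints an empty line because an unset awk variable is an empty string in string context. Adding 0 forces numeric context, so the second prints 0.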

Or with gawk:

gawk '{s+=$5} ENDFILE{print FILENAME,s+0; s=0}' a*xyz

Sample Output

a0001.xyz 20
a0002.xyz 40
a0254.xyz 55

You can use a bash for loop:

for file in *.xyz; do
      awk -f sum.awk "$file"
done > output.txt

One option would be to toss the result for each file into an array (indexed by the filename) and print at the end:

awk '{a[FILENAME]+=$5} END{for(f in a) print f, a[f]}' *.xyz

Optionally, you could capture the filename in a variable and print whenever FNR==1, as well as in END:

awk 'FNR==1 && filename{print filename, sum; sum=0} {sum+=$5;filename=FILENAME} END{print filename, sum}' *.xyz

Another option, using xargs to run awk once per file:

echo *.xyz | xargs -n 1 awk '{sum+=$5} END{print FILENAME,sum }' > output.txt

Example contents of output.txt:

a0001.xyz 7
a0254.xyz 12
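Note that `echo *.xyz | xargs` splits on whitespace, so it would break on filenames containing spaces. The following is a hedged sketch of a NUL-delimited variant. It assumes an xargs that supports `-0` (GNU and BSD xargs do), and the sample files, `workdir` and `result` are made up for illustration:

```shell
# Scratch directory with one filename that contains a space (illustrative data only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '1 2 3 4 5 6 7\n2 3 4 5 6 7 8\n' > 'a 0001.xyz'
printf '3 4 5 6 7 8 9\n' > a0254.xyz

# Emit NUL-terminated names so xargs -0 passes each filename intact to awk.
printf '%s\0' *.xyz | xargs -0 -n 1 awk '{sum+=$5} END{print FILENAME, sum+0}' > output.txt
result=$(cat output.txt)
printf '%s\n' "$result"
```

For the a0001.xyz to a0254.xyz names in the question this makes no difference; it only matters if the filenames can contain whitespace.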
