Using AWK to sum a column from different files

I have a bunch of data files named, say, a0001.xyz to a0254.xyz. I want to sum the 5th column of each file and write the results to a file called output.txt. So I am looking for a single-column file containing the sum for each .xyz file.

I've tried something like this:

awk -f sum.awk a0004.xyz > output.txt

where sum.awk is

#sum.awk
{ sum+=$5}
END { print sum }

It gives me the sum of the 5th column of a0004.xyz and writes it to output.txt. The problem is that when I change the command to:

awk -f sum.awk *.xyz > output.txt

it again gives me only a single number instead of one sum per .xyz file. How can I fix this?

I hope I've managed to ask this clearly.

Something like this?

$ tail a*.xyz
==> a0001.xyz <==
1 2 3 4 5 6 7
2 3 4 5 6 7 8

==> a0254.xyz <==
3 4 5 6 7 8 9
4 5 6 7 8 9 10
$ awk '{a[FILENAME]+=$5} END {for (i in a) printf "%4d %s\n", a[i], i}' a*.xyz
  11 a0001.xyz
  15 a0254.xyz

The awk script here adds the value of $5 to an array element keyed by the current filename. After processing all input, it steps through the array and prints the results, with each key being the filename that contributed to that value. Awk processes the list of filenames cleanly and portably, without the need for pipes.
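One caveat worth noting: the order in which `for (i in a)` visits array elements is unspecified in awk, so the lines may not come out in filename order. If you want exactly the single-column output.txt the question asks for, in command-line order, one possible sketch is to walk ARGV in the END block instead. The sample files below are made up for illustration and reuse the data shown above; this also assumes every argument is a plain filename (no `var=value` arguments).

```shell
# Hypothetical sample data in a scratch directory (names follow the question).
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '1 2 3 4 5 6 7\n2 3 4 5 6 7 8\n' > a0001.xyz
printf '3 4 5 6 7 8 9\n4 5 6 7 8 9 10\n' > a0254.xyz

# Accumulate column 5 per file, then print one sum per line in the order
# the files were given on the command line. "+ 0" forces a numeric 0 for
# files that contributed no records.
awk '{a[FILENAME] += $5}
     END {for (i = 1; i < ARGC; i++) print a[ARGV[i]] + 0}' a*.xyz > output.txt

cat output.txt
```

With the sample data this should leave 11 and 15 in output.txt, one per line.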

Do them all in parallel with GNU Parallel:

parallel -k -q awk '{s+=$5} END{print FILENAME,s+0}' ::: a*xyz

Sample output

a0001.xyz 20
a0002.xyz 40
a0254.xyz 55

Notes:

  • -k means "keep the output in order"
  • -q means "quote my awk stuff please, because I am lazy"
  • s+0 means treat s as a number, so if it was never set, it prints 0
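To see what the s+0 note is about, here is a tiny sketch that runs the same kind of script against empty input (/dev/null stands in for an empty .xyz file). The brackets are only there to make the empty string visible.

```shell
# With no input records the main rule never runs, so s is still unset in END.
# An unset variable prints as an empty string; s+0 coerces it to the number 0.
out=$(awk '{s += $5} END {print "[" s "]", "[" (s + 0) "]"}' /dev/null)
echo "$out"
```

This should print `[] [0]`: the bare s comes out empty, while s+0 gives 0.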

Or with gawk:

gawk '{s+=$5} ENDFILE{print FILENAME,s+0; s=0}' a*xyz

Sample output

a0001.xyz 20
a0002.xyz 40
a0254.xyz 55

You can use a bash loop for this:

for file in *.xyz; do
      awk -f sum.awk "$file"
done > output.txt

One option would be to collect the results for each file in an array (indexed by the filename) and print them at the end:

awk '{a[FILENAME]+=$5} END{for(f in a) print f, a[f]}' *.xyz

Alternatively, you could capture the filename in a variable and print whenever FNR==1, as well as in END:

awk 'FNR==1 && filename{print filename, sum; sum=0} {sum+=$5;filename=FILENAME} END{print filename, sum}' *.xyz

echo *.xyz | xargs -n 1 awk '{sum+=$5} END{print FILENAME,sum }' > output.txt

Output to output.txt (e.g.):

a0001.xyz 7
a0254.xyz 12
