How to aggregate counts in a bash one-liner

Question

I often use sort | uniq -c to compute count statistics. Now, if I have two files with such count statistics, I would like to combine them and add up the counts. (I know I could concatenate the original files and count there, but let's assume only the count files are accessible.)

For example, given:

a.cnt:

   1 a
   2 c

b.cnt:

   2 b
   1 c

I would like to combine them and get the following output:

   1 a
   2 b
   3 c

What's the shortest way to do this in the shell?

Edit:

Thanks for the answers so far!

Some possible side aspects one might also want to consider:

What if a, b, c are arbitrary strings containing arbitrary whitespace?
What if the files are too big to fit in memory? Is there some sort | uniq -c-style command-line option for this case that only looks at two lines at a time?

Answer 1

This works for any number of files:

$ cat a.cnt b.cnt | awk '{a[$2]+=$1} END{for (i in a) print a[i],i}'
1 a
2 b
3 c

So if you have, let's say, 10 files, you just have to run cat f1 f2 ... and pipe the result into this awk.

If the file names happen to share a pattern, you can also pass them to awk directly (thanks Adrian Frühwirth!):

awk '{a[$2]+=$1} END{for (i in a) print a[i],i}' *cnt

This, for example, takes into account all files whose names end in cnt, such as a.cnt and b.cnt.
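A caveat that applies to every awk answer here: the order in which a for (i in a) loop visits array elements is unspecified, so the sorted output shown above is not guaranteed. The following minimal sketch pipes the sums through sort -k2 to get a stable listing ordered by key. It recreates the question's a.cnt and b.cnt in a temporary directory, and that directory handling is only scaffolding for the example:

```shell
#!/bin/sh
# Scaffolding: recreate the question's example count files in a scratch directory.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '   1 a\n   2 c\n' > "$dir/a.cnt"
printf '   2 b\n   1 c\n' > "$dir/b.cnt"

# Sum the counts per key, then sort on the key column, because the
# "for (i in a)" loop visits keys in an unspecified order.
result=$(cat "$dir/a.cnt" "$dir/b.cnt" |
  awk '{a[$2] += $1} END {for (i in a) print a[i], i}' |
  LC_ALL=C sort -k2)

printf '%s\n' "$result"
# Expected output:
# 1 a
# 2 b
# 3 c
```

Replacing sort -k2 with sort -rn lists the most frequent keys first instead.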

Update

Some possible side aspects one might also want to consider:

What if a, b, c are arbitrary strings containing arbitrary whitespace?

What if the files are too big to fit in memory? Is there some sort | uniq -c-style command-line option for this case that only looks at two lines at a time?

For strings that contain whitespace, you can use all of the columns after the count as the key for the counter:

cat *cnt | awk '{count=$1; $1=""; a[$0]+=count} END{for (i in a) print a[i],i}'

Note that you don't actually need to run sort | uniq -c, redirect the result to a cnt file, and then perform this re-counting. You can do it all in one step with something like this:

awk '{a[$0]++} END{for (i in a) print a[i], i}' file

Example

$ cat a.cnt
   1 and some
   2 text here

$ cat b.cnt
   4 and some
   4 and other things
   2 text here
   9 blabla

$ cat *cnt | awk '{count=$1; $1=""; a[$0]+=count} END{for (i in a) print a[i],i}'
4  text here
9  blabla
5  and some
4  and other things
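One limitation of the $1="" trick: assigning to a field makes awk rebuild the whole record from its fields joined by single spaces. As a result, runs of spaces or tabs inside the counted strings are collapsed, and each key starts with a space, which is where the double space in the output comes from. If the strings can contain arbitrary whitespace, a variant that strips only the leading count with sub() keeps the rest of the line unchanged. This is a sketch that assumes each line has the sort | uniq -c layout: optional padding spaces, the count, one separating space, then the string. The printf "%7d %s" output format is an assumption chosen to imitate GNU uniq -c, and the sample keys are hypothetical test data.

```shell
#!/bin/sh
# Scaffolding: count files whose keys contain double spaces and a tab.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '   1 a  b\n   2 c\td\n' > "$dir/a.cnt"
printf '   4 a  b\n   1 c\td\n' > "$dir/b.cnt"

result=$(cat "$dir"/*.cnt | awk '
  {
    count = $1                  # the count (awk skips the leading padding)
    sub(/^ *[0-9]+ /, "")       # drop padding, count and one separator space
    a[$0] += count              # $0 is now the original string, unmodified
  }
  END { for (k in a) printf "%7d %s\n", a[k], k }' | LC_ALL=C sort -k2)

printf '%s\n' "$result"
```

The keys "a  b" and "c<TAB>d" keep their internal whitespace, and their totals come out as 5 and 3.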

Regarding the second comment:

$ cat b
and some
text here
and some
and other things
text here
blabla

$ awk '{a[$0]++} END{for (i in a) print a[i], i}' b
2 and some
2 text here
1 and other things
1 blabla
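The answer above does not address the memory question. The following is a sketch of one approach that avoids holding every key in an awk array; it does not come from the answers. First sort the combined count lines on the string, so that equal strings end up next to each other. Then let awk add up each run of adjacent lines, keeping only the current string and its running total in memory. GNU sort copes with inputs larger than memory by writing sorted runs to temporary files and merging them. If every count file was itself produced with LC_ALL=C sort | uniq -c, the files should already be in this order, and replacing sort -k2 with sort -m -k2 should merge them without re-sorting. That comes close to the "two lines at a time" behavior asked about. The file names and the %7d output format are assumptions for the example.

```shell
#!/bin/sh
# Scaffolding: the question's example count files.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '   1 a\n   2 c\n' > "$dir/a.cnt"
printf '   2 b\n   1 c\n' > "$dir/b.cnt"

# Sort on everything after the count (field 2 to end of line) so identical
# strings become adjacent; LC_ALL=C makes the comparison byte-wise.
result=$(LC_ALL=C sort -k2 "$dir/a.cnt" "$dir/b.cnt" | awk '
  {
    count = $1
    key = $0
    sub(/^ *[0-9]+ /, "", key)      # the string without its count
    if (NR > 1 && key != prev) {    # a new string starts: flush the previous one
      printf "%7d %s\n", total, prev
      total = 0
    }
    prev = key
    total += count
  }
  END { if (NR > 0) printf "%7d %s\n", total, prev }')

printf '%s\n' "$result"
```

With the question's files this prints the totals 1 a, 2 b and 3 c, each count right-aligned in a seven-character column like uniq -c output.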

Answer 2

Using awk:

awk 'FNR==NR{a[$2]=$1; next} {a[$2]+=$1} END{for (k in a) print a[k], k}' a.cnt b.cnt | sort -k2
1 a
2 b
3 c

Answer 3

$ awk '{a[$2]+=$1}END{for(i in a){print a[i], i}}' a.cnt b.cnt
1 a
2 b
3 c

Answer 1: score 8, accepted, 2014-03-13 15:57:10
Answer 2: score 5, 2014-03-13 15:56:23
Answer 3: score 5, 2014-03-13 15:57:01