
Calculate proportion of specific letters in a file using AWK

First of all, thank you for helping with my problem. I have a csv file with a column that contains different letter codes, as you can see here.

ABC
CD
EF
F
D
F
AS

I know how to calculate the proportion of F in the column and display the output as follows:

awk 'BEGIN{sum=0;count=0}{count++}{if($1=="F")sum++}END{printf "%.3f\n", sum/count}'
0.286

But my problem is that I want to do this in general, for every line of the column that contains a single character, so the output will be:

F 0.286
D 0.143

Thank you everyone for your help.

Instead of using if ($1=="F") to pick out one specific character, count every single-letter line in an array and iterate over the array at the end.

awk 'length($0)==1{a[$0]++} END{for(c in a) print c, a[c]/NR}'

Arrays in awk have no fixed iteration order, so the output might be either

F 0.286
D 0.143

or

D 0.143
F 0.286

If you want a fixed order, pipe the result through sort.
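As a minimal sketch of that pipeline, the snippet below feeds the sample column from the question to awk on standard input and sorts the result by letter. The printf with "%.3f" is an addition here, assumed only so the numbers are rounded to three decimals as in the output shown in the question; a plain print shows more digits (for example 0.285714). The variable name result is purely illustrative.

```shell
# Sample column from the question, piped in instead of read from a file.
# Count one-character lines, print each proportion to three decimals,
# then sort by letter so the order no longer depends on the awk implementation.
result=$(printf '%s\n' ABC CD EF F D F AS |
  awk 'length($0)==1{a[$0]++} END{for(c in a) printf "%s %.3f\n", c, a[c]/NR}' |
  sort)
echo "$result"
```

To order by proportion instead, something like sort -k2,2nr on the second field should work.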

With your shown samples, please try the following.

awk '
/^.$/{
  count[$0]++
}
END{
  for(key in count){
    print key,count[key]/FNR
  }
}
' Input_file

Explanation: a commented version of the above.

awk '                    ##Start the awk program.
/^.$/{                   ##Match lines that are exactly 1 character long.
  count[$0]++            ##Use the line as an index into count and increment its value by 1.
}
END{                     ##Start the END block, run after all lines are read.
  for(key in count){     ##Traverse the count array.
    print key,count[key]/FNR  ##Print the key and its count divided by FNR (the total number of lines).
  }
}
' Input_file             ##Name of the input file.
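The question mentions a csv file, while the samples show only the letter column. If the letters sit in one column of a multi-column file, the same idea can be applied to a single field. The sketch below is an assumption about the layout: a hypothetical two-column CSV with no header row, the letters in column 2, and a col variable (an illustrative name) selecting the field.

```shell
# Hypothetical CSV: an id in column 1, the letter code in column 2, no header row.
result=$(printf '%s\n' 1,ABC 2,CD 3,EF 4,F 5,D 6,F 7,AS |
  awk -F, -v col=2 '
    length($col)==1 { count[$col]++ }   # count fields that are exactly one character long
    END {
      for (key in count)                # iteration order is unspecified, hence the sort below
        printf "%s %.3f\n", key, count[key]/NR
    }' |
  sort)
echo "$result"
```

If the file does have a header row, it would be included in NR and so in the denominator; skipping it and dividing by NR-1 would be one way to adjust.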


 