
How to count the number of lines for a specific entry under a specific pattern using awk?

I have a text file with a pattern that looks like the following:

Sample1
Feature 1
A
B
C
Feature 2
A
G
H
L
Sample2
Feature 1
A
M
W
Feature 2
P
L

I'm trying to count how many entries there are for each feature in each sample. So my desired output should look something like this:

Sample1
Feature 1: 3
Feature 2: 4

Sample2
Feature 1: 3
Feature 2: 2

I tried using the following awk command:

$ awk '{if(/^\Feature/){n=$0;}else{l[n]++}}
       END{for(n in l){print n" : "l[n]}}' inputfile.txt > result.txt

But it gave me the following output:

Feature 1: 6
Feature 2: 6

So I was wondering whether someone could help me modify this command to get the desired output, or suggest another command. (P.S. The original file contains hundreds of samples and around 94 features.)

You could use this awk:

awk '/^Sample/{printf "%s%s",(c?c"\n":""),$0;c=0;next}
     /^Feature/{printf "%s\n%s: ",(c?c:""),$0;c=0;next}
     {c++}
     END{print c}' file

The script increments the counter c only for lines that don't start with Sample or Feature.

When a line starts with one of the two keywords, the pending count is printed (if it is nonzero) and the counter is reset to 0.
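The (c?c:"") test in the script above prints an empty string when a feature has no entries, so such a feature would show up with a blank count. As a minimal sketch, assuming an explicit 0 is preferred in that case, the pending feature can be flushed through a small helper function. The names emit and count_with_zeros and the sample data below are illustrative only and are not part of the original answer.

```shell
# Hypothetical wrapper: count entries per feature, printing 0 for empty features.
count_with_zeros() {
    awk '
    # Print the pending feature line, if any, with its count (0 when empty).
    function emit() {
        if (name != "") print name ": " (c + 0)
        name = ""
        c = 0
    }
    /^Sample/  { emit(); print; next }      # new sample: flush, then print its name
    /^Feature/ { emit(); name = $0; next }  # new feature: flush the previous one
    { c++ }                                 # any other line is an entry
    END { emit() }                          # flush the last feature
    ' "$1"
}

# Illustrative input in which "Feature 2" of Sample1 has no entries.
input=$(mktemp)
cat > "$input" <<'EOF'
Sample1
Feature 1
A
B
Feature 2
Sample2
Feature 1
M
EOF

count_with_zeros "$input"
```

Because the count is printed together with the feature name, no partial line is left on the screen if the input ends unexpectedly.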

The following awk may help you here.

awk '
/^Sample/ && count1 && count2{
   print "Feature 1:",count1 ORS "Feature 2:",count2;
   count1=count2=flag1=flag2=""}
/^Sample/{
   print;
   flag=1;
   next}
flag && /^Feature/{
   if($NF==1){ flag1=1 };
   if($NF==2){ flag2=1;
               flag1=""};
   next}
flag && flag1{ count1++ }
flag && flag2{ count2++ }
END{
   if(count1 && count2){
      print "Feature 1:",count1 ORS "Feature 2:",count2}
}'  Input_file

The output will be as follows.

Sample1
Feature 1: 3
Feature 2: 4
Sample2
Feature 1: 3
Feature 2: 2

This awk may also work:

awk '/^Sample/ {
   for (i in a)
      print i ": " a[i]
   print
   delete a
   next
}
/^Feature/ {
   f = $0
   next
}
{
   ++a[f]
}
END {
   for (i in a) 
      print i ": " a[i]
}' file

Sample1
Feature 1: 3
Feature 2: 4
Sample2
Feature 1: 3
Feature 2: 2
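
One caveat with the for (i in a) loop: awk does not specify the order in which it visits array elements, so with around 94 features the lines may not come out in file order. Deleting a whole array with delete a is also not available in every older awk. The following is a sketch of the same approach that remembers the order in which features first appear; the names count_in_order, report, order and cnt are hypothetical, and it assumes every Feature line belongs to the most recent Sample line.

```shell
# Hypothetical wrapper: per-sample feature counts, printed in first-seen order.
count_in_order() {
    awk '
    # Print the features of the current sample in first-seen order, then reset.
    function report(   i) {
        for (i = 1; i <= n; i++)
            print order[i] ": " cnt[order[i]]
        split("", cnt)      # portable way to empty an array
        split("", order)
        n = 0
        f = ""
    }
    /^Sample/  { report(); print; next }
    /^Feature/ {
        f = $0
        if (!(f in cnt)) {  # first time this feature is seen in this sample
            order[++n] = f
            cnt[f] = 0
        }
        next
    }
    f != ""    { cnt[f]++ } # count entries only once a feature has been seen
    END { report() }
    ' "$1"
}

# The sample data from the question.
input=$(mktemp)
cat > "$input" <<'EOF'
Sample1
Feature 1
A
B
C
Feature 2
A
G
H
L
Sample2
Feature 1
A
M
W
Feature 2
P
L
EOF

count_in_order "$input"
```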

$ cat tst.awk
BEGIN { OFS = ": " }
/Sample/  { prtFeat(); print (NR>1 ? ORS : "") $0; next }
/Feature/ { prtFeat(); name=$0; next }
{ ++cnt }
END { prtFeat() }
function prtFeat() {
    if (cnt) {
        print name, cnt
        cnt = 0
    }
}

$ awk -f tst.awk file
Sample1
Feature 1: 3
Feature 2: 4

Sample2
Feature 1: 3
Feature 2: 2
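
Returning to the command in the question: it indexes the array l by the feature line alone, so entries for "Feature 1" in every sample are added to the same counter, which is why the per-sample counts get merged. A minimal sketch of a fix that keeps the question's count-everything-then-print-in-END structure is to index by sample and feature together. The helper arrays samples, feats and nf and the wrapper name count_by_sample are assumptions added for illustration, and the sketch assumes sample names are unique.

```shell
# Hypothetical wrapper: count with a composite (sample, feature) key, print in END.
count_by_sample() {
    awk '
    /^Sample/  { s = $0; samples[++ns] = s; n = ""; next }
    /^Feature/ {
        n = $0
        if (!((s, n) in l)) {        # first time this pair is seen
            feats[s, ++nf[s]] = n    # remember feature order within the sample
            l[s, n] = 0
        }
        next
    }
    n != ""    { l[s, n]++ }         # entry line: count it under (sample, feature)
    END {
        for (i = 1; i <= ns; i++) {
            s = samples[i]
            if (i > 1) print ""      # blank line between samples
            print s
            for (j = 1; j <= nf[s]; j++) {
                n = feats[s, j]
                print n ": " l[s, n]
            }
        }
    }
    ' "$1"
}

# The sample data from the question.
input=$(mktemp)
cat > "$input" <<'EOF'
Sample1
Feature 1
A
B
C
Feature 2
A
G
H
L
Sample2
Feature 1
A
M
W
Feature 2
P
L
EOF

count_by_sample "$input" > result.txt
cat result.txt
```

Unlike the streaming answers above, this version holds all counts in memory until the end, which should still be small for hundreds of samples with about 94 features each.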
