How to count the number of lines under each entry of a specific pattern using awk?

I have a text file with a pattern that looks like the following:

Sample1
Feature 1
A
B
C
Feature 2
A
G
H
L
Sample2
Feature 1
A
M
W
Feature 2
P
L

I'm trying to count how many entries there are for each feature in each sample. My desired output should look something like this:

Sample1
Feature 1: 3
Feature 2: 4

Sample2
Feature 1: 3
Feature 2: 2

I tried using the following awk command:

$ awk '{if(/^\Feature/){n=$0;}else{l[n]++}}
       END{for(n in l){print n" : "l[n]}}' inputfile.txt > result.txt

But it gave me the following output:

Feature 1: 6
Feature 2: 6

Can someone help me modify this command to get the desired output, or suggest another command? (PS: the original file contains hundreds of samples and around 94 features.)

You could use this awk script:

awk '/^Sample/{printf "%s%s",(c?c"\n":""),$0;c=0;next}
     /^Feature/{printf "%s\n%s: ",(c?c:""),$0;c=0;next}
     {c++}
     END{print c}' file

The script increments the counter c only for lines that don't start with Sample or Feature.

When a line starts with one of those two keywords, the counter accumulated so far is printed and then reset to zero.
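
The same counting logic can be spelled out more verbosely. The following is only a sketch, not the answerer's script: the file name sample_input.txt, the helper function emit(), and the variables feature and count are names made up for illustration, and unlike the one-liner above it prints a blank line between samples and would report a feature that has no entries as 0.

```shell
# Recreate the sample input from the question (the file name is a placeholder).
cat > sample_input.txt <<'EOF'
Sample1
Feature 1
A
B
C
Feature 2
A
G
H
L
Sample2
Feature 1
A
M
W
Feature 2
P
L
EOF

result=$(awk '
# emit() prints the pending "feature: count" line, if any, then resets the state.
function emit() {
    if (feature != "") print feature ": " count
    feature = ""
    count = 0
}
/^Sample/  { emit(); if (NR > 1) print ""; print; next }  # new sample: close the previous feature
/^Feature/ { emit(); feature = $0; next }                 # new feature: start a fresh counter
{ count++ }                                               # every other line counts as an entry
END { emit() }                                            # close the last feature of the file
' sample_input.txt)

printf '%s\n' "$result"
```

Because the counter is reset at every Sample and Feature line, counts never accumulate across samples, which is what caused the totals of 6 in the original attempt.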

The following awk may help you here.

awk '
/^Sample/ && count1 && count2{
   print "Feature 1:",count1 ORS "Feature 2:",count2;
   count1=count2=flag1=flag2=""}
/^Sample/{
   print;
   flag=1;
   next}
flag && /^Feature/{
   if($NF==1){ flag1=1 };
   if($NF==2){ flag2=1;
               flag1=""};
   next}
flag && flag1{ count1++ }
flag && flag2{ count2++ }
END{
   if(count1 && count2){
      print "Feature 1:",count1 ORS "Feature 2:",count2}
}'  Input_file

Output will be as follows.

Sample1
Feature 1: 3
Feature 2: 4
Sample2
Feature 1: 3
Feature 2: 2

This awk may also work:

awk '/^Sample/ {
   for (i in a)
      print i ": " a[i]
   print
   delete a
   next
}
/^Feature/ {
   f = $0
   next
}
{
   ++a[f]
}
END {
   for (i in a) 
      print i ": " a[i]
}' file

Sample1
Feature 1: 3
Feature 2: 4
Sample2
Feature 1: 3
Feature 2: 2

$ cat tst.awk
BEGIN { OFS = ": " }
/Sample/  { prtFeat(); print (NR>1 ? ORS : "") $0; next }
/Feature/ { prtFeat(); name=$0; next }
{ ++cnt }
END { prtFeat() }
function prtFeat() {
    if (cnt) {
        print name, cnt
        cnt = 0
    }
}

$ awk -f tst.awk file
Sample1
Feature 1: 3
Feature 2: 4

Sample2
Feature 1: 3
Feature 2: 2
