How to count the number of lines of a specific entry under a specific pattern using awk?
I have a text file with a pattern that looks like the following:
Sample1
Feature 1
A
B
C
Feature 2
A
G
H
L
Sample2
Feature 1
A
M
W
Feature 2
P
L
I'm trying to count how many entries there are for each feature in each sample. So my desired output should look something like this:
Sample1
Feature 1: 3
Feature 2: 4
Sample2
Feature 1: 3
Feature 2: 2
I tried using the following awk command:
$ awk '{if(/^\Feature/){n=$0;}else{l[n]++}}
END{for(n in l){print n" : "l[n]}}' inputfile.txt > result.txt
But it gave me the following output:
Feature 1: 6
Feature 2: 6
So I was wondering if someone could help me modify this command to get the desired output, or suggest another command. (PS: the original file contains hundreds of samples and around 94 features.)
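Why the command above gives the wrong totals: the array l is keyed only by the feature line, so "Feature 1" in Sample1 and "Feature 1" in Sample2 share one counter, and each Sample line also falls into the else branch and is counted as an entry of whatever feature came before it. (Also, \F in /^\Feature/ is not a defined regex escape; plain /^Feature/ is safer.) A minimal fix in the same spirit, sketched under the assumption that sample headers start with "Sample" and feature headers with "Feature", is to key on sample and feature together and remember the order in which keys first appear. The temporary demo file and the variable names are illustrative only:

```shell
#!/bin/sh
# Demo input: the sample data from the question, one entry per line.
input=$(mktemp)
printf '%s\n' Sample1 'Feature 1' A B C 'Feature 2' A G H L \
              Sample2 'Feature 1' A M W 'Feature 2' P L > "$input"

# Count entries per (sample, feature) pair. Keys are recorded in the
# order they first appear, so the report follows the input order.
result=$(awk '
/^Sample/  { sample = $0; key = ""; next }   # new sample header
/^Feature/ { key = sample SUBSEP $0          # composite key: sample + feature
             if (!(key in cnt)) { order[++n] = key; cnt[key] = 0 }
             next }
           { cnt[key]++ }                    # any other line is one entry
END {
    for (i = 1; i <= n; i++) {
        split(order[i], part, SUBSEP)        # part[1] = sample, part[2] = feature
        if (part[1] != prev) { print part[1]; prev = part[1] }
        print part[2] ": " cnt[order[i]]
    }
}' "$input")

printf '%s\n' "$result"   # on real data: awk '...' inputfile.txt > result.txt
rm -f "$input"
```

Iterating over the numbered order array instead of using for (k in cnt) matters here because awk does not guarantee any particular order for for-in loops.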
You could use this awk:
awk '/^Sample/{printf "%s%s",(c?c"\n":""),$0;c=0;next}
/^Feature/{printf "%s\n%s: ",(c?c:""),$0;c=0;next}
{c++}
END{print c}' file
The script increments the counter c only for lines that don't start with Sample or Feature. When either keyword is found, the count accumulated so far is printed (if it is nonzero) before the new header, and the END block prints the count for the last feature.
The following awk may help you here. Note that it handles exactly two features per sample (Feature 1 and Feature 2):
awk '
/^Sample/ && count1 && count2{
  print "Feature 1:",count1 ORS "Feature 2:",count2;
  count1=count2=flag1=flag2=""}
/^Sample/{
  print;
  flag=1;
  next}
flag && /^Feature/{
  if($NF==1){ flag1=1 };
  if($NF==2){ flag2=1;
              flag1=""};
  next}
flag && flag1{ count1++ }
flag && flag2{ count2++ }
END{
  if(count1 && count2){
    print "Feature 1:",count1 ORS "Feature 2:",count2}
}' Input_file
The output will be as follows:
Sample1
Feature 1: 3
Feature 2: 4
Sample2
Feature 1: 3
Feature 2: 2
This awk may also work:
awk '/^Sample/ {
   for (i in a)
      print i ": " a[i]
   print
   delete a
   next
}
/^Feature/ {
   f = $0
   next
}
{
   ++a[f]
}
END {
   for (i in a)
      print i ": " a[i]
}' file
Sample1
Feature 1: 3
Feature 2: 4
Sample2
Feature 1: 3
Feature 2: 2
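One caveat with the script above: for (i in a) visits array elements in an unspecified order, so with around 94 features per sample the feature lines may come out shuffled (the two-feature example just happens to look right). A sketch of one way to keep input order, which is a variant of that script and not the answerer's own code, is to record each feature name in a numbered list when it is first seen. The function name print_counts and the temporary demo file are hypothetical:

```shell
#!/bin/sh
# Demo input: the sample data from the question, one entry per line.
input=$(mktemp)
printf '%s\n' Sample1 'Feature 1' A B C 'Feature 2' A G H L \
              Sample2 'Feature 1' A M W 'Feature 2' P L > "$input"

result=$(awk '
# print the counts collected for the current sample, in input order
function print_counts(   i) {
    for (i = 1; i <= n; i++)
        print name[i] ": " a[name[i]]
    n = 0
    split("", a)                     # portable way to empty an array
}
/^Sample/  { print_counts(); f = ""; print; next }
/^Feature/ {
    f = $0
    if (!(f in a)) { name[++n] = f; a[f] = 0 }   # remember first-seen order
    next
}
           { ++a[f] }                # any other line is one entry
END        { print_counts() }
' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

Resetting f on each Sample line keeps stray lines that appear before the first Feature header from being credited to the previous sample's last feature.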
$ cat tst.awk
BEGIN { OFS = ": " }
/Sample/ { prtFeat(); print (NR>1 ? ORS : "") $0; next }
/Feature/ { prtFeat(); name=$0; next }
{ ++cnt }
END { prtFeat() }
function prtFeat() {
    if (cnt) {
        print name, cnt
        cnt = 0
    }
}
$ awk -f tst.awk file
Sample1
Feature 1: 3
Feature 2: 4

Sample2
Feature 1: 3
Feature 2: 2