How to count the number of lines of a specific entry under a specific pattern using awk?
I have a text file laid out like this:
Sample1
Feature 1
A
B
C
Feature 2
A
G
H
L
Sample2
Feature 1
A
M
W
Feature 2
P
L
I am trying to count the number of entries under each feature in each sample, so my desired output looks like this:
Sample1
Feature 1: 3
Feature 2: 4
Sample2
Feature 1: 3
Feature 2: 2
I tried the following awk command:
$ awk '{if(/^\Feature/){n=$0;}else{l[n]++}}
END{for(n in l){print n" : "l[n]}}' inputfile.txt > result.txt
But it gave me the following output:
Feature 1: 6
Feature 2: 6
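So I would like to know whether anyone can help me modify this command to get the desired output, or suggest another command. (P.S. The original file contains hundreds of samples and about 94 features.)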
You can use the following awk:
awk '/^Sample/{printf "%s%s",(c?c"\n":""),$0;c=0;next}
/^Feature/{printf "%s\n%s: ",(c?c:""),$0;c=0;next}
{c++}
END{print c}' file
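The script increments the counter c only for lines that do not start with Sample or Feature. When either of the two keywords is found, the counter is printed and reset.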
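For reference, the same counter idea can be sketched as a small self-contained script. This is only an illustration, not the answerer's code: the names count_entries and emit are made up here, blank lines are assumed not to count as entries, and a feature with no entries is assumed to be reported as 0.

```shell
# count_entries is a hypothetical helper name; it reads the files given as
# arguments, or standard input when no file is given.
count_entries() {
  awk '
    # Print the pending "Feature: count" line, if a feature is open, then reset.
    function emit() {
      if (name != "") print name ": " c
      name = ""
      c = 0
    }
    /^Sample/  { emit(); print; next }       # new sample closes the open feature
    /^Feature/ { emit(); name = $0; next }   # new feature closes the previous one
    NF         { c++ }                       # any other non-blank line is an entry
    END        { emit() }                    # close the last feature
  ' "$@"
}

# Usage with the sample data from the question:
count_entries <<'EOF'
Sample1
Feature 1
A
B
C
Feature 2
A
G
H
L
Sample2
Feature 1
A
M
W
Feature 2
P
L
EOF
```

Because the count is printed only when the next Sample or Feature line (or the end of input) is reached, nothing has to be kept in memory beyond the current feature name and counter, which should suit a file with hundreds of samples.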
The following awk may help you.
awk '
/^Sample/ && count1 && count2{
print "Feature 1:",count1 ORS "Feature 2:",count2;
count1=count2=flag1=flag2=""}
/^Sample/{
print;
flag=1;
next}
flag && /^Feature/{
if($NF==1){ flag1=1 };
if($NF==2){ flag2=1;
flag1=""};
next}
flag && flag1{ count1++ }
flag && flag2{ count2++ }
END{
if(count1 && count2){
print "Feature 1:",count1 ORS "Feature 2:",count2}
}' Input_file
The output is as follows.
Sample1
Feature 1: 3
Feature 2: 4
Sample2
Feature 1: 3
Feature 2: 2
This awk also works:
awk '/^Sample/ {
for (i in a)
print i ": " a[i]
print
delete a
next
}
/^Feature/ {
f = $0
next
}
{
++a[f]
}
END {
for (i in a)
print i ": " a[i]
}' file
Sample1
Feature 1: 3
Feature 2: 4
Sample2
Feature 1: 3
Feature 2: 2
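The array approach works because the counts are dumped and cleared at every Sample line. The command in the question never clears its array, so the counts accumulate across samples (3 + 3 and 4 + 2, hence 6 and 6). One caveat: the order in which for (i in a) visits the keys is unspecified in awk, so with about 94 features the lines may not come out in input order. A possible sketch that remembers the first-seen order is shown below; the names count_in_order, dump, order and cnt are made up for this illustration.

```shell
# count_in_order is a hypothetical helper name; it reads the files given as
# arguments, or standard input when no file is given.
count_in_order() {
  awk '
    # Print the features of the current sample in first-seen order, then clear.
    function dump(   i) {
      for (i = 1; i <= n; i++) {
        print order[i] ": " cnt[order[i]]
        delete cnt[order[i]]
      }
      n = 0
    }
    /^Sample/  { dump(); f = ""; print; next }
    /^Feature/ {
      f = $0
      if (!(f in cnt)) { order[++n] = f; cnt[f] = 0 }   # remember arrival order
      next
    }
    f != ""    { cnt[f]++ }                             # entry under feature f
    END        { dump() }
  ' "$@"
}

# Usage: count_in_order file
```

Clearing the counts element by element, rather than deleting the whole array at once, is a conservative choice for older awk implementations.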
$ cat tst.awk
BEGIN { OFS = ": " }
/Sample/ { prtFeat(); print (NR>1 ? ORS : "") $0; next }
/Feature/ { prtFeat(); name=$0; next }
{ ++cnt }
END { prtFeat() }
function prtFeat() {
if (cnt) {
print name, cnt
cnt = 0
}
}
$ awk -f tst.awk file
Sample1
Feature 1: 3
Feature 2: 4
Sample2
Feature 1: 3
Feature 2: 2