Count pattern occurrences in each line of a file?
My file looks like this:
id12 ack dko hhhh chfl dkl dll chfl
id14 slo ksol chfl dloo
id13 mse
id23 clos chfl dll alo
grep -c 'chfl' filename
gives me the number of lines that contain chfl (3 here, although chfl occurs 4 times), but I want to count the occurrences of chfl per line, like this:
id12 2
id14 1
id13 0
id23 1
Also, how do I do the same with two patterns to match, like chfl and dll?
perl -lane 'undef $c;
for(@F){$c++ if(/^chfl$/)};
print "$F[0] ",$c?$c:"0"' your_file
Or simply:
perl -lane '$c=0;
for(@F){$c++ if(/^chfl$/)};
print "$F[0] $c"' your_file
Tested below:
> cat temp
id12 ack dko hhhh chfl dkl dll chfl
id14 slo ksol chfl dloo
id13 mse
id23 clos chfl dll alo
> perl -lane '$c=0;for(@F){$c++ if(/^chfl$/)};print "$F[0] $c"' temp
id12 2
id14 1
id13 0
id23 1
>
Also in awk (the logic is the same as in the Perl version above: count fields that are exactly chfl):
awk '{a=0;
for(i=1;i<=NF;i++)if($i=="chfl")a++;
print $1,a}' your_file
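The word can also be passed in with awk's -v option instead of being hard-coded. A minimal sketch, assuming exact whole-field matches are wanted and that the first field is always the id. count_field is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# count_field WORD FILE: print each line's id (first field) followed by
# how many of the remaining fields are exactly equal to WORD.
# Usage (hypothetical helper): count_field chfl your_file
count_field() {
    awk -v w="$1" '{
        n = 0
        for (i = 2; i <= NF; i++)   # start at 2 so the id itself is never counted
            if ($i == w) n++
        print $1, n
    }' "$2"
}
```

Because the comparison uses ==, a field such as chflx is not counted as chfl.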
A Perl version that copes with multiple strings (note that it reports totals for the whole file, not per line):
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
die "Usage: $0 pattern [pattern ...] file\n" unless @ARGV > 1;
my @patterns;
until (@ARGV == 1) {
    push @patterns, shift;
}
my $re = '(' . join('|', map { "\Q$_\E" } @patterns) . ')';
my %match;
while (<>) {
    if (my @matches = /$re/g) {
        $match{$_}++ for @matches;
    }
}
say "$_: $match{$_}" for sort keys %match;
A couple of test runs:
$ ./cgrep chfl cgrep.txt
chfl: 4
$ ./cgrep chfl dll cgrep.txt
chfl: 4
dll: 2
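Since that script sums matches over the whole file, similar totals can also be had from grep alone. A minimal bash sketch, assuming a grep that supports -o, -w and -F (GNU and BSD grep both do). Unlike the Perl script, -w counts whole words only. As with the Perl script, a pattern that never matches is simply absent from the output. total_counts is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# total_counts FILE PATTERN...: print "pattern: total" for each pattern,
# summed over the whole file, in the same format as the Perl script.
# Usage (hypothetical helper): total_counts your_file chfl dll
total_counts() {
    local file=$1; shift
    local args=() p
    for p in "$@"; do args+=(-e "$p"); done
    # -o: one match per output line, -w: whole words, -F: fixed strings
    grep -o -w -F "${args[@]}" -- "$file" | sort | uniq -c |
        awk '{ print $2 ": " $1 }'
}
```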
How about:
use strict;
use warnings;
use 5.010;
use Data::Dumper;
my %res;
while (<DATA>) {
    chomp;
    my ($id, $rest) = $_ =~ /^(\S+)(.*)$/;
    $res{chfl}{$id} =()= $rest =~ /(chfl)/g;
    $res{dll}{$id} =()= $rest =~ /(dll)/g;
}
say Dumper \%res;
__DATA__
id12 ack dko hhhh chfl dkl dll chfl
id14 slo ksol chfl dloo
id13 mse
id23 clos chfl dll alo
Output:
$VAR1 = {
'dll' => {
'id13' => 0,
'id12' => 1,
'id23' => 1,
'id14' => 0
},
'chfl' => {
'id13' => 0,
'id12' => 2,
'id23' => 1,
'id14' => 1
}
};
Use this:
awk 'BEGIN {print "id\tchfl\tdll\n--------------------"}{c=d=i=0;while(i++<NF){if($i=="chfl")c++; if($i=="dll")d++}; print $1,c,d}' OFS="\t" file
id chfl dll
--------------------
id12 2 1
id14 1 0
id13 0 0
id23 1 1
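The header and the two counters above are hard-coded for chfl and dll. A minimal sketch that builds one tab-separated column per word given on the command line, assuming the words contain no spaces and exact field matches are wanted. count_table is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# count_table FILE WORD...: tab-separated table with a header row and one
# count column per WORD (fields compared with ==, id field skipped).
# Usage (hypothetical helper): count_table your_file chfl dll
count_table() {
    local file=$1; shift
    awk -v words="$*" '
        BEGIN {
            OFS = "\t"
            nw = split(words, w, " ")      # w[1..nw] holds the words
            hdr = "id"
            for (j = 1; j <= nw; j++) hdr = hdr OFS w[j]
            print hdr
        }
        {
            line = $1
            for (j = 1; j <= nw; j++) {
                n = 0
                for (i = 2; i <= NF; i++) if ($i == w[j]) n++
                line = line OFS n
            }
            print line
        }' "$file"
}
```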
A bash one-liner with grep:
while IFS= read -r line; do printf '%s\n' "$line" | grep -o 'chfl' | wc -l; done < your_file
-o prints every occurrence on its own line, and wc -l counts those lines.
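This is also why grep -c from the question falls short: it counts matching lines, not matches. A small contrast on the first line of the sample file:

```shell
#!/usr/bin/env bash
line='id12 ack dko hhhh chfl dkl dll chfl'
# grep -c counts lines that match: the one line matches, so it reports 1
printf '%s\n' "$line" | grep -c 'chfl'
# grep -o emits each match on its own line, so wc -l reports 2
printf '%s\n' "$line" | grep -o 'chfl' | wc -l
```

Some wc implementations pad the number with leading spaces.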
Edit for multiple patterns:
patterns=(chfl dll)
while IFS= read -r line; do
    for pattern in "${patterns[@]}"; do
        printf '%s\t' "$pattern"; printf '%s\n' "$line" | grep -o -- "$pattern" | wc -l
    done
done < your_file
Another awk version:
$ awk '{c1=gsub(var1,x);c2=gsub(var2,x);print $1,var1"="c1,var2"="c2}' var1="chfl" var2="dll" file
id12 chfl=2 dll=1
id14 chfl=1 dll=0
id13 chfl=0 dll=0
id23 chfl=1 dll=1
Just pass the patterns you want to count as the var1 and var2 assignments after the awk program, before the file name.
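One thing to keep in mind: gsub() returns the number of substring matches and treats its pattern as a regular expression, so it also counts chfl inside a longer field such as chflx, whereas comparing each field with == counts whole words only. A small comparison sketch; compare_counts is a hypothetical helper name and the sample line id99 is made up:

```shell
#!/usr/bin/env bash
# compare_counts WORD: for each input line print the id, the gsub()
# (substring/regex) count and the whole-field count of WORD.
# Usage (hypothetical helper): printf 'id99 chfl chflx xdll dll\n' | compare_counts chfl
compare_counts() {
    awk -v w="$1" '{
        s = $0                     # work on a copy so $0 and the fields stay intact
        re_hits = gsub(w, "", s)   # w is used as a dynamic regular expression
        field_hits = 0
        for (i = 2; i <= NF; i++) if ($i == w) field_hits++
        print $1, re_hits, field_hits
    }'
}
```

For the made-up line above this prints id99 2 1: gsub() finds chfl twice (once inside chflx), while only one field is exactly chfl.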
You can use this awk:
awk '{d=c=0;for(i=1;i<=NF;i++){ if($i ~ /chfl/)c++; if($i ~ /dll/)d++;} print $1,c,d}' yourfile
perl -ne 'my $c=s/chfl//g||0;my $d=s/dll//g||0;s/ .*//s;print "$_ chfl $c dll $d\n"' file
Explanation:
s///g in scalar context returns the number of substitutions made.
||0 makes sure the variable is set to zero if there are no matches.
s/ .*//s throws away everything in $_ from the first space onward (including the trailing newline), leaving only the id.
It will produce the following output:
id12 chfl 2 dll 1
id14 chfl 1 dll 0
id13 chfl 0 dll 0
id23 chfl 1 dll 1
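The Perl one-liner counts matches by deleting them. The same delete-and-measure idea can be sketched in plain bash without external tools: remove every literal occurrence of the word from the rest of the line and divide the drop in length by the word's length. Like the Perl version, this counts substring (not whole-word) occurrences. The word must be non-empty, and count_by_deletion is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# count_by_deletion WORD FILE: print "id WORD count" for every line, where
# count is how many literal WORDs were removed from the text after the id.
# Usage (hypothetical helper): count_by_deletion chfl your_file
count_by_deletion() {
    local word=$1 line id rest stripped
    while IFS= read -r line; do
        id=${line%% *}                 # first field (text before the first space)
        rest=${line#"$id"}             # everything after the id
        stripped=${rest//"$word"/}     # quoted pattern: WORD is matched literally
        printf '%s %s %d\n' "$id" "$word" $(( (${#rest} - ${#stripped}) / ${#word} ))
    done < "$2"
}
```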