How do I use grep to count the number of occurrences of a string?
input:
.
├── a.txt
├── b.txt
// a.txt
aaa
// b.txt
aaa
bbb
ccc
Now I want to know how many times aaa and bbb appear.
output:
aaa: 2
bbb: 1
You can try awk. This uses split to count the occurrences of each search pattern on every line (the number of pieces minus one) and accumulates the totals in the associative array n.
$ awk 'BEGIN{ pat1="aaa"; pat2="bbb" }
{ n[pat1]+=(split($0,arr,pat1)-1) }
{ n[pat2]+=(split($0,arr,pat2)-1) }
END{ for(i in n){ print i":",n[i] } }' a.txt b.txt
aaa: 10
bbb: 14
$ cat a.txt
aaa
aaa efwepom dq
bbb qwpdo bbb
qwdo qwdpomaaa
qwo bbb
pefaaaomaaaewe bb aa
aaa bbb
$ cat b.txt
aaa
aaa efwepom dq
bbb qwpdo bbb
qwdo qwdpomaaa
qwo bbb
pebbb bbb fobbbmebbbwe bb aa
aaa bbb
bbbbbbsad
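To see why split(...)-1 gives the number of hits: split cuts the line at every match of the pattern, so k matches leave k+1 pieces. A minimal sketch of that on single strings; count_matches is a hypothetical helper name, not part of the answer above, and the guard for empty input is my addition.

```shell
# count_matches is a hypothetical helper, used only for illustration.
# Prints how many non-overlapping matches of regex $1 occur in string $2.
count_matches() {
    printf '%s\n' "$2" | awk -v pat="$1" '{
        pieces = split($0, parts, pat)        # k matches leave k+1 pieces
        print (pieces > 0 ? pieces - 1 : 0)   # split returns 0 for an empty line
    }'
}

count_matches aaa 'pefaaaomaaaewe bb aa'   # prints 2
count_matches bbb 'bbbbbbsad'              # prints 2 (matches do not overlap)
count_matches aaa 'qwo bbb'                # prints 0
```

Note that on an empty line split returns 0, so a bare split(...)-1 would contribute -1 to the total; the guard in the sketch avoids that.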
Just an idea:
grep -E "aaa|bbb|ccc" *.txt | awk -F: '{print $2}' | sort | uniq -c
This means:
grep -E "...|..." : extended regular expressions; print every line that matches any of the alternatives
The result is given as:
a.txt:aaa
b.txt:aaa
b.txt:bbb
b.txt:ccc
awk -F: '{print $2}' : split each result into columns at the colon and show only the second column (the matching line, without the file name)
sort | uniq -c : sort and count unique entries
The problem with grep is when you have more than one hit on a single line. grep reports matching lines, not individual matches, so you need -o (print each match on its own line) piped into another instance of grep, or a wc, or some such.
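To make the lines-versus-matches distinction concrete, here is a small sketch. The temporary file and its contents are made up for the demonstration, and it assumes a grep that supports -o, as GNU and BSD grep do.

```shell
# Hypothetical demo file; the contents mirror the b.txt used in this answer.
demo=$(mktemp)
printf 'aaa\nbbb bbb\nccc\n' > "$demo"

lines=$(grep -c 'bbb' "$demo")          # -c counts matching lines
hits=$(grep -o 'bbb' "$demo" | wc -l)   # -o prints one match per line; wc -l counts them
hits=$((hits))                          # strip any padding some wc versions add

echo "lines: $lines"   # prints "lines: 1"
echo "hits: $hits"     # prints "hits: 2"
rm -f "$demo"
```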
$: cat lst
aaa
bbb
$: cat a.txt
aaa
$: cat b.txt # I added a second hit on the bbb line
aaa
bbb bbb
ccc
$: files=( [ab].txt )
$: time while IFS= read -r pattern; do
     printf "%s: " "$pattern"
     grep -o "$pattern" "${files[@]}" | wc -l
   done < lst
aaa: 2
bbb: 2
Note that this is slow, even with such a small dataset.
real 0m1.119s
user 0m0.060s
sys 0m0.308s
This lets you make a list file, but it reads every file in your target set once per pattern and runs both grep and wc each time. Andre's awk solution would be cleaner, faster, and generally better all around, especially if you put the patterns in a file and read them from there rather than hard-coding them as literals.
$: time awk 'NR==FNR{ pats[$0]; next }                 # first file (lst): load the patterns
  { for (p in pats) n[p] += split($0, arr, p) - 1 }    # remaining files: add the hits on each line
  END{ for (p in n) print p":", n[p] }' lst "${files[@]}"
aaa: 2
bbb: 2
Considerably faster - likely MUCH more so on more data and files.
real 0m0.344s
user 0m0.015s
sys 0m0.092s
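For comparison, grep can also do this in a single pass over the files if all the patterns are handed to it at once. A rough sketch, not taken from the answers above: count_with_grep is a hypothetical helper, it assumes a grep that supports -o and -h (GNU and BSD grep do), it treats the patterns as fixed strings (-F), and it assumes the patterns contain no whitespace, since the final awk step splits on it.

```shell
# count_with_grep is a hypothetical helper, used only for illustration.
# Usage: count_with_grep PATTERN_FILE FILE...
count_with_grep() {
    list=$1
    shift
    # -o: one match per output line, -h: no file-name prefix,
    # -F: fixed strings, -f: read the patterns from a file
    grep -o -h -F -f "$list" -- "$@" |
        sort | uniq -c |
        awk '{ print $2 ": " $1 }'   # turn "  2 aaa" into "aaa: 2"
}

# Example call: count_with_grep lst a.txt b.txt
```

Unlike the awk version, a pattern that never occurs is simply missing from the output rather than listed with a count of 0.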