
Count an exact word using grep or awk

I want to count how many times a word that starts with a special character occurs in an entire file.

My word is: $name

I need a count of how many times this exact string, $name, appears in the file.

The commands below do not give the correct count:

grep "$name" /path/demo.txt | wc -w

grep "$name" /path/demo.txt | wc -l

The contents of demo.txt (the "-> 1" marks show which lines should count; they are not part of the file):

Abc $name  -> 1
name city  
villagename
abczyz$name  -> 1
raj 
nameee
Rahul$nameeee
123name1
$namename

The count I expect is 2 (exact matches only).

Double quotes don't protect the string from variable expansion by the shell. If name is not a defined variable, the shell replaces $name with its (empty) value, so you are actually running grep "" demo.txt, and an empty pattern matches every line.
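A quick way to see what grep actually receives is to capture the quoted forms in variables. This is a minimal sketch for a POSIX shell; it assumes no variable called name is set (it unsets one to be sure):

```shell
#!/bin/sh
# Assumption: no shell variable called "name" should be set for this demo.
unset name

dq="[$name]"   # double quotes: the shell expands $name (empty) first
sq='[$name]'   # single quotes: the text is kept verbatim
bs="[\$name]"  # a backslash also blocks the expansion inside double quotes

printf '%s\n' "$dq" "$sq" "$bs"   # prints [], [$name], [$name] on separate lines

# With the expansion gone, grep gets an empty pattern, which matches every line.
all=$(printf 'one\ntwo\nthree\n' | grep -c "$name")
echo "$all"                       # prints 3
```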

The $ character is also a regex metacharacter, so it must be escaped for the regex engine too; alternatively, use the -F flag to turn off regex matching and select only literal matches.
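Applied to the sample data from the question (written to demo.txt in the current directory, an assumption of this sketch), both fixes make grep search for the literal text, but on their own they still count every line that merely contains $name:

```shell
#!/bin/sh
# Recreate the sample file from the question, without the "-> 1" annotations.
# The quoted 'EOF' delimiter stops the shell from expanding $name here.
cat > demo.txt <<'EOF'
Abc $name
name city
villagename
abczyz$name
raj
nameee
Rahul$nameeee
123name1
$namename
EOF

# Escape $ so the regex engine treats it as a literal dollar sign ...
escaped=$(grep -c '\$name' demo.txt)
# ... or use -F so the pattern is a fixed string rather than a regex.
fixed=$(grep -cF '$name' demo.txt)

# Both count lines that merely contain $name: 4, not the expected 2.
echo "$escaped $fixed"   # prints: 4 4
```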

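It's not clear what you mean by "word"; the requirement that $nameeee should not count as a match suggests the -w option, but its exact definition of a "word" may differ from yours. For example, GNU grep -w also requires the character before the match to be a non-word character (or the start of the line), so abczyz$name, where $ follows the letter z, is not counted.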
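This is easy to check with GNU grep on the sample data (again written to demo.txt in the current directory, an assumption of this sketch): -w accepts only the line where $name stands on its own, so it reports 1 rather than 2.

```shell
#!/bin/sh
cat > demo.txt <<'EOF'
Abc $name
name city
villagename
abczyz$name
raj
nameee
Rahul$nameeee
123name1
$namename
EOF

# -w: the match must be preceded and followed by a non-word character
# or a line boundary; letters, digits and _ are word characters.
grep -nwF '$name' demo.txt   # prints: 1:Abc $name

# abczyz$name is rejected because the z before $ is a word character.
words=$(grep -cwF '$name' demo.txt)
echo "$words"                # prints: 1
```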

grep -c reports the number of matching lines, not the number of matches; if a line that contains the pattern two or more times should count as multiple matches, you need a different approach.
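The difference shows up as soon as one line holds the string twice. A minimal sketch with a made-up input line:

```shell
#!/bin/sh
# Hypothetical input: one line containing the literal string $name twice.
line='first $name, then $name again'

# -c counts matching lines: this line counts once.
lines=$(printf '%s\n' "$line" | grep -cF '$name')
# -o prints each match on its own line, so wc -l counts matches.
matches=$(printf '%s\n' "$line" | grep -oF '$name' | wc -l)

# $((...)) strips the leading padding some wc implementations print.
echo "lines=$lines matches=$((matches))"   # prints: lines=1 matches=2
```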

grep -woF '$name' demo.txt | wc -l

This prints every match on a separate line (-o) and searches only for literal matches (-F) in isolated words (-w); the pattern is in single quotes, so it is passed to grep verbatim; and the pipe to wc -l counts the generated output lines.

Alternatively, you could specify a regex with an exact boundary condition. The following assumes counting the number of matching lines is sufficient, and focuses on demonstrating how to write a regex which matches $name only if it is not immediately followed by an alphabetic character or a dollar sign.

grep -E '\$name($|[^a-zA-Z$])' demo.txt

The -E option selects extended regular expression syntax, which enables some features that the original grep did not support. (GNU grep also accepts \| and \( \) for alternation and grouping in a plain basic regular expression, although POSIX basic regular expressions do not define alternation at all; I find this convention weird, and the resulting regex is harder to read.) The first backslash turns $ from a metacharacter that matches end of line into an expression that simply matches a literal dollar sign. The parentheses then require that name be followed either by end of line ($, here with its metacharacter meaning) or by a character that is not in the class [a-zA-Z$], that is, neither a letter nor a dollar sign.
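Running the regex on the sample data (written to demo.txt in the current directory, an assumption of this sketch) with -n shows exactly which lines it accepts:

```shell
#!/bin/sh
cat > demo.txt <<'EOF'
Abc $name
name city
villagename
abczyz$name
raj
nameee
Rahul$nameeee
123name1
$namename
EOF

# Single quotes keep the regex verbatim; "$re" later expands only the
# variable itself, so the $name inside its value is not expanded again.
re='\$name($|[^a-zA-Z$])'

grep -nE "$re" demo.txt
# prints:
# 1:Abc $name
# 4:abczyz$name

count=$(grep -cE "$re" demo.txt)
echo "$count"   # prints: 2
```

Note that the regex only constrains what follows $name, which is why abczyz$name is accepted here while -w rejects it.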

The same is moderately easy in Awk, too. Split each line on the search regex and add the number of resulting fields minus one: with no match there is a single field, with one match the line splits into two fields, and so on. An empty line splits into zero fields, so the count is only updated when there are at least two fields.

awk '{ n = split($0, a, /\$name($|[^a-zA-Z$])/); if (n > 1) total += n-1 }
    END { print 0+total }' demo.txt

(With GNU Awk, you could set the built-in field separator to the regex. I went for split() because it should be portable to traditional and POSIX Awk.)
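One caveat with the split() approach: split() returns 0, not 1, for an empty string (just as NF is 0 for an empty record), so adding n-1 unconditionally subtracts one for every blank line. This sketch uses a hypothetical blank.txt to show the difference a guard makes:

```shell
#!/bin/sh
# split() on an empty string yields zero fields.
empty=$(awk 'BEGIN { print split("", a, /x/) }')

# Hypothetical test file: one match, an empty line, then two matches.
printf '%s\n' '$name' '' '$name $name' > blank.txt

# Unguarded: the empty line contributes 0 - 1 = -1.
naive=$(awk '{ n = split($0, a, /\$name($|[^a-zA-Z$])/); total += n-1 }
    END { print 0+total }' blank.txt)

# Guarded: only lines with at least one match change the total.
guarded=$(awk '{ n = split($0, a, /\$name($|[^a-zA-Z$])/); if (n > 1) total += n-1 }
    END { print 0+total }' blank.txt)

echo "empty=$empty naive=$naive guarded=$guarded"   # prints: empty=0 naive=2 guarded=3
```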

This is mildly more complex, but saves one external process compared to the first attempt above. That will only matter if you are running this in a really tight loop, but then you should probably optimize further to pass in a list of search strings, and search for them all in a single pass, anyway.

Find the occurrences of $name followed by a word boundary and count them (\b is a GNU grep extension):

$ grep -oE '\$name\b' file | wc -l
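On the question's sample data this \b version and the [^a-zA-Z$] regex above both report 2, but they treat digits, underscores and dollar signs differently. This sketch uses a hypothetical edge.txt to show where they diverge, and assumes GNU grep for \b:

```shell
#!/bin/sh
# Hypothetical lines on which the two regexes disagree.
printf '%s\n' '$name1' '$name$x' '$name_x' > edge.txt

# \b treats $ as a boundary, but digits and _ count as word characters.
grep -nE '\$name\b' edge.txt               # prints: 2:$name$x
# The bracket expression rejects letters and $, but accepts digits and _.
grep -nE '\$name($|[^a-zA-Z$])' edge.txt   # prints lines 1:$name1 and 3:$name_x

b=$(grep -cE '\$name\b' edge.txt)
class=$(grep -cE '\$name($|[^a-zA-Z$])' edge.txt)
echo "b=$b class=$class"                   # prints: b=1 class=2
```

Which behaviour is right depends on what should count as the end of the "word" in your data.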
