I'm experiencing some issues with an awk command. The original script was developed using awk on macOS and then ported to Linux, where awk behaves differently.
What I want to do is count the occurrences of the individual strings listed in /tmp/test.uniq.txt in the file /tmp/test.txt.
awk '{print $1, system("cat /tmp/test.txt | grep -o -c " $1)}' /tmp/test.uniq.txt
On the Mac, I get the expected output:
test1 2
test2 1
Each result is on one line: the string and its number of occurrences, separated by a space.
On Linux, I get this instead:
2
test1 1
test2
Here the string and its count are not on the same line, and the output of the system command is printed first.
Sample input: test.txt looks like:
test1 test test
test1 test test
test2 test test
test.uniq.txt looks like:
test1
test2
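A likely explanation for the ordering, not stated in the thread: awk's system() returns the command's exit status rather than its output, and the command writes its output straight to standard output while print is still evaluating its arguments, so that output can land before the line awk prints. If you want to keep the external grep anyway, the "cmd | getline var" form reads the command's output into a variable instead. This is a minimal sketch, assuming a POSIX shell; it creates the sample files in a temporary directory rather than /tmp, and uses grep -c alone (which counts matching lines):

```shell
#!/bin/sh
# Stand-ins for /tmp/test.txt and /tmp/test.uniq.txt from the question.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'test1 test test\ntest1 test test\ntest2 test test\n' > "$tmp/test.txt"
printf 'test1\ntest2\n' > "$tmp/test.uniq.txt"

result=$(awk -v data="$tmp/test.txt" '{
    # Build: grep -c -- "PATTERN" "FILE" (assumes no double quotes in the pattern)
    cmd = "grep -c -- \"" $1 "\" \"" data "\""
    count = 0
    # Read the line grep prints (the count) into a variable
    # instead of letting it go straight to standard output.
    cmd | getline count
    close(cmd)    # lets the same command run again if a pattern repeats
    print $1, count
}' "$tmp/test.uniq.txt")
printf '%s\n' "$result"
```

With the sample data this should print "test1 2" and "test2 1", each pair on one line.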
As the comments suggest, calling grep, cat, etc. through the system function is not recommended, since awk is a complete language that can perform most of these tasks itself. You can use the following awk command to replace your cat | grep functionality:
awk 'FNR == NR {a[$1]=0; next} {for (i=1; i<=NF; i++) if ($i in a) a[$i]++}
END { for (i in a) print i, a[i] }' uniq.txt test.txt
test1 2
test2 1
Note that this output doesn't match the count of 5 stated in your question, probably because your actual data differs from the sample.
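To try the command above end to end, a self-contained run on the question's sample data might look like this (the temporary directory is only for illustration). Since awk's for (i in a) loop visits keys in no defined order, the output is piped through sort to make it stable:

```shell
#!/bin/sh
# Sample data from the question, using the file names from this answer.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'test1 test test\ntest1 test test\ntest2 test test\n' > "$tmp/test.txt"
printf 'test1\ntest2\n' > "$tmp/uniq.txt"

# First file (FNR == NR): register each string with a zero count.
# Second file: check every field and count exact string matches.
result=$(awk 'FNR == NR {a[$1]=0; next} {for (i=1; i<=NF; i++) if ($i in a) a[$i]++}
END { for (i in a) print i, a[i] }' "$tmp/uniq.txt" "$tmp/test.txt" | sort)
printf '%s\n' "$result"
```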
It looks to me as if you're trying to count the number of lines containing each unique string in the uniq file. But the way you're doing it is... awkward, and, as you've demonstrated, inconsistent between versions of awk.
The following might work a little better:
$ awk '
  NR==FNR {
    a[$1]
    next
  }
  {
    for (i in a) {
      if ($1~i) {
        a[i]++
      }
    }
  }
  END {
    for (i in a)
      printf "%6d\t%s\n",a[i],i
  }
' test.uniq.txt test.txt
2 test1
1 test2
This loads your uniq file into an array, then, for every line in your text file, steps through the array and counts each entry that matches the line's first field. Note that these are compared as regular expressions, without word boundaries, so test1 will also be counted as part of test12.
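The pitfall can be seen by adding a hypothetical test12 line to test.txt. This sketch runs the regex comparison from the answer and, as one possible fix swapped in for illustration, an exact lookup with $1 in a, which only counts whole-field matches:

```shell
#!/bin/sh
# Hypothetical data: the question's sample plus an extra "test12" line.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'test1 test test\ntest1 test test\ntest2 test test\ntest12 test test\n' > "$tmp/test.txt"
printf 'test1\ntest2\n' > "$tmp/test.uniq.txt"

# Regex match ($1 ~ i): "test12" also matches the pattern "test1".
regex=$(awk 'NR==FNR {a[$1]; next}
  {for (i in a) if ($1 ~ i) a[i]++}
  END {for (i in a) printf "%s %d\n", i, a[i]}' "$tmp/test.uniq.txt" "$tmp/test.txt" | sort)

# Exact lookup ($1 in a): only a first field equal to a listed string counts.
exact=$(awk 'NR==FNR {a[$1]; next}
  ($1 in a) {a[$1]++}
  END {for (i in a) printf "%s %d\n", i, a[i]}' "$tmp/test.uniq.txt" "$tmp/test.txt" | sort)

printf 'regex:\n%s\nexact:\n%s\n' "$regex" "$exact"
```

With this data the regex version reports test1 3, while the exact lookup reports test1 2; test2 is 1 in both.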
Another way might be to use grep + sort + uniq:
grep -o -w -F -f uniq.txt test.txt | sort | uniq -c
It's a pipeline, but a short one.
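Run against the question's sample files, the pipeline might look like this. The trailing awk '{print $2, $1}' is an addition, not part of the answer: uniq -c puts the count first, with padding that varies between implementations, and swapping the columns gives the "string count" format the question asks for.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'test1 test test\ntest1 test test\ntest2 test test\n' > "$tmp/test.txt"
printf 'test1\ntest2\n' > "$tmp/uniq.txt"

# -F: patterns are fixed strings; -f: read them from uniq.txt;
# -w: whole words only; -o: print each match on its own line.
# sort groups identical matches so that uniq -c can count them;
# the final awk swaps "count string" into "string count".
result=$(grep -o -w -F -f "$tmp/uniq.txt" "$tmp/test.txt" | sort | uniq -c | awk '{print $2, $1}')
printf '%s\n' "$result"
```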
From man grep:

-F, --fixed-strings, --fixed-regexp
    Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched. (-F is specified by POSIX, --fixed-regexp is an obsoleted alias, please do not use it in new scripts.)
-f FILE, --file=FILE
    Obtain patterns from FILE, one per line. The empty file contains zero patterns and therefore matches nothing. (-f is specified by POSIX.)
-o, --only-matching
    Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
-w, --word-regexp
    Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and the underscore.
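A small check of what -w changes, assuming a hypothetical test12 line in the data: without -w, the fixed string test1 also matches inside test12; with it, only the standalone test1 is printed.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'test1 test test\ntest12 test test\n' > "$tmp/test.txt"
printf 'test1\n' > "$tmp/uniq.txt"

# Without -w, "test1" is also found as a prefix of "test12".
loose=$(grep -o -F -f "$tmp/uniq.txt" "$tmp/test.txt")
# With -w, the match must be followed by a non-word character or end of line.
strict=$(grep -o -w -F -f "$tmp/uniq.txt" "$tmp/test.txt")

printf 'without -w:\n%s\nwith -w:\n%s\n' "$loose" "$strict"
```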