awk Print Line Issue
I'm having some issues with an awk command right now. The original script was developed using awk on macOS and was then ported to Linux, where awk behaves differently.
What I want to do is count the occurrences of the individual strings listed in /tmp/test.uniq.txt within the file /tmp/test.txt.
awk '{print $1, system("cat /tmp/test.txt | grep -o -c " $1)}' /tmp/test.uniq.txt
On macOS I get the expected output:
test1 2
test2 1
Each line contains the string and its number of occurrences, separated by a space.
On Linux I get this instead:
2
test1 1
test2
The string and its count are no longer on the same line, and the output of the system command is printed first.
Sample input: test.txt looks like:
test1 test test
test1 test test
test2 test test
test.uniq.txt looks like:
test1
test2
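Regarding the differing output: system() returns the command's exit status, not its output. grep writes its count straight to standard output on its own, so the count never becomes part of what print outputs. A minimal sketch (not from the original post) that keeps the grep call but reads the count back into awk with cmd | getline, so awk itself prints each string and its count on one line. It writes the sample data to a temporary directory instead of the question's /tmp files, uses grep -c (which counts matching lines), assumes the strings contain no double quotes, and count_lines is a hypothetical name:

```shell
#!/bin/sh
# Sample data from the question, written to a temporary directory.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'test1 test test\ntest1 test test\ntest2 test test\n' > "$dir/test.txt"
printf 'test1\ntest2\n' > "$dir/test.uniq.txt"

# count_lines UNIQ TEXT (hypothetical helper): for each string in UNIQ,
# print "string count", where count is what grep -c reports for TEXT.
count_lines() {
    awk -v text="$2" '{
        # Build the command; assumes the strings contain no double quotes.
        cmd = "grep -c -- \"" $1 "\" \"" text "\""
        n = 0
        # Read the single line printed by grep -c into n.
        cmd | getline n
        # Close the pipe so the next record runs a fresh grep.
        close(cmd)
        print $1, n
    }' "$1"
}

count_lines "$dir/test.uniq.txt" "$dir/test.txt"
```

This still starts one grep process per string; the awk-only approaches avoid that entirely.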
As the comments suggested, calling grep, cat, etc. through the system function is not recommended, since awk is a complete language that can perform most of these tasks itself.
You can use the following awk command to replace your cat | grep functionality:
awk 'FNR == NR {a[$1]=0; next} {for (i=1; i<=NF; i++) if ($i in a) a[$i]++}
END { for (i in a) print i, a[i] }' uniq.txt test.txt
test1 2
test2 1
Note that this output doesn't match the count of 5 stated in your question, probably because your sample data differs from your real data.
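Because for (i in a) visits array elements in an unspecified order, the two output lines above may come out in either order. A minimal sketch of a variant (not part of the original answer) that also remembers each string's position in the uniq file and prints the counts in that order; count_words is a hypothetical name and the sample data is written to a temporary directory:

```shell
#!/bin/sh
# Sample data from the question, written to a temporary directory.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'test1 test test\ntest1 test test\ntest2 test test\n' > "$dir/test.txt"
printf 'test1\ntest2\n' > "$dir/uniq.txt"

# count_words UNIQ TEXT (hypothetical helper): count the fields in TEXT
# exactly equal to each string in UNIQ, printing results in UNIQ order.
count_words() {
    awk '
        FNR == NR {                      # first file: the list of strings
            if (!($1 in a)) order[++n] = $1
            a[$1] = 0
            next
        }
        {                                # second file: check every field
            for (i = 1; i <= NF; i++)
                if ($i in a) a[$i]++
        }
        END {                            # print in the original list order
            for (k = 1; k <= n; k++) print order[k], a[order[k]]
        }
    ' "$1" "$2"
}

count_words "$dir/uniq.txt" "$dir/test.txt"
```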
It looks to me as if you're trying to count, for each unique string in the uniq file, the number of lines that contain it. But the way you're doing it is... awkward, and, as you've demonstrated, inconsistent between versions of awk.
The following might work a little better:
$ awk '
NR==FNR {
    a[$1]
    next
}
{
    for (i in a) {
        if ($1~i) {
            a[i]++
        }
    }
}
END {
    for (i in a)
        printf "%6d\t%s\n",a[i],i
}
' test.uniq.txt test.txt
2 test1
1 test2
This loads your uniq file into an array, then, for every line in your text file, steps through the array and matches the line's first field against each entry to count the matches.
Note that these are compared as regular expressions, without word boundaries, so a line containing test12 will also be counted for test1.
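If only whole-string matches should count, one option (a sketch, not the answer's own method) is to compare each field for equality instead of matching a regular expression, counting each line at most once per string so that, like the answer above, it counts lines. The sample data adds a hypothetical test12 line to show that it is no longer counted toward test1; count_lines_exact is a hypothetical name:

```shell
#!/bin/sh
# Sample data plus a hypothetical test12 line to exercise the pitfall.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'test1 test test\ntest1 test test\ntest2 test test\ntest12 test test\n' > "$dir/test.txt"
printf 'test1\ntest2\n' > "$dir/test.uniq.txt"

# count_lines_exact UNIQ TEXT (hypothetical helper): for each string in UNIQ,
# count the lines of TEXT having at least one field exactly equal to it.
count_lines_exact() {
    awk '
        NR == FNR { a[$1] = 0; next }
        {
            split("", seen)              # portable way to empty the per-line set
            for (i = 1; i <= NF; i++)
                if (($i in a) && !($i in seen)) {
                    seen[$i] = 1
                    a[$i]++
                }
        }
        END { for (s in a) printf "%6d\t%s\n", a[s], s }
    ' "$1" "$2"
}

# Sort on the string column because for (s in a) has no defined order.
count_lines_exact "$dir/test.uniq.txt" "$dir/test.txt" | sort -k2
```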
Another way might be to use grep + sort + uniq:
grep -o -w -F -f uniq.txt test.txt | sort | uniq -c
It's a pipeline, but a short one.
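uniq -c puts the count before the string, and strings that never occur are simply absent from the output. If the question's "string count" layout is wanted, a small awk step at the end of the pipeline can swap the two columns. A minimal sketch using the question's sample data in a temporary directory; count_matches is a hypothetical name:

```shell
#!/bin/sh
# Sample data from the question, written to a temporary directory.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'test1 test test\ntest1 test test\ntest2 test test\n' > "$dir/test.txt"
printf 'test1\ntest2\n' > "$dir/uniq.txt"

# count_matches UNIQ TEXT (hypothetical helper):
# grep -o prints each whole-word, fixed-string match on its own line,
# sort | uniq -c counts them, and awk swaps the columns to "string count".
count_matches() {
    grep -o -w -F -f "$1" "$2" | sort | uniq -c | awk '{ print $2, $1 }'
}

count_matches "$dir/uniq.txt" "$dir/test.txt"
```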
From man grep:
-F, --fixed-strings, --fixed-regexp
    Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched. (-F is specified by POSIX; --fixed-regexp is an obsoleted alias, please do not use it in new scripts.)

-f FILE, --file=FILE
    Obtain patterns from FILE, one per line. The empty file contains zero patterns, and therefore matches nothing. (-f is specified by POSIX.)

-o, --only-matching
    Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.

-w, --word-regexp
    Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and the underscore.