
awk Print Line Issue

I'm experiencing some issues with an awk command right now. The original script was developed using awk on macOS and was then ported to Linux, where awk behaves differently.

What I want to do is count the occurrences of each string listed in /tmp/test.uniq.txt within the file /tmp/test.txt.

awk '{print $1, system("cat /tmp/test.txt | grep -o -c " $1)}' /tmp/test.uniq.txt

The Mac delivers the expected output:

  test1 2 
  test2 1

Each result is on one line: the string and the number of occurrences, separated by whitespace.

Linux delivers output like:

2
test1 1
test2 

The output is not on one line, and the output of the system command is printed first.

Sample input: test.txt looks like:

test1 test test 
test1 test test
test2 test test

test.uniq.txt looks like:

test1
test2

As the comments suggested, calling grep, cat, etc. through the system function is not recommended, because awk is a complete language that can perform most of these tasks itself.
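For background: awk's system() returns the command's exit status, not its output, and the command writes straight to standard output on its own. That is a plausible reason the text interleaves differently on the two platforms, though I have not verified it on both. If you do want to keep an external grep, a rough sketch is to capture its output with `cmd | getline` instead. The helper name count_with_getline and the temporary demo directory are made up for illustration, and the quoting assumes the search strings contain no single quotes.

```shell
# Hypothetical helper: count matching lines per search string, capturing
# grep's output with getline instead of letting system() print it directly.
count_with_getline() {
  # $1 = file with one search string per line, $2 = file to search
  awk -v target="$2" '{
    # \047 is a single quote; this assumes $1 contains no single quotes
    cmd = "grep -c -F -- \047" $1 "\047 \047" target "\047"
    n = 0
    cmd | getline n   # read the first line of grep output into n
    close(cmd)        # close the pipe before the next input line
    print $1, n
  }' "$1"
}

# Demo with the sample data from the question, in a throwaway directory.
demo_dir="${TMPDIR:-/tmp}/awk_getline_demo.$$"
mkdir -p "$demo_dir"
printf 'test1 test test\ntest1 test test\ntest2 test test\n' > "$demo_dir/test.txt"
printf 'test1\ntest2\n' > "$demo_dir/test.uniq.txt"
count_with_getline "$demo_dir/test.uniq.txt" "$demo_dir/test.txt"
```

With the sample data this should print `test1 2` and `test2 1`, each string and its count on the same line. It still starts one grep process per search string, so the pure awk approach is usually the better choice.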

You can use the following awk command to replace your cat | grep functionality:

awk 'FNR == NR {a[$1]=0; next} {for (i=1; i<=NF; i++) if ($i in a) a[$i]++} 
END { for (i in a) print i, a[i] }' uniq.txt test.txt

test1 2
test2 1

Note that this output doesn't match the count of 5 that your question states, probably because your sample data is different.
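One caveat with the command above: the order in which `for (i in a)` visits array elements is unspecified in awk, so the result lines may come out in a different order on another implementation. A minimal sketch that keeps the order of the strings as they appear in the first file could look like this. The function name count_fields_in_order and the arrays order and count are my own naming, not part of the original answer.

```shell
# Hypothetical helper: same counting logic, but remembers the order in
# which the search strings were read so the output order is stable.
count_fields_in_order() {
  # $1 = file with one search string per line, $2 = file to scan
  awk '
    FNR == NR {                      # first file: register each string once
      if (!($1 in count)) {
        order[++n] = $1
        count[$1] = 0
      }
      next
    }
    {                                # second file: count exact field matches
      for (i = 1; i <= NF; i++)
        if ($i in count)
          count[$i]++
    }
    END {                            # print in the order of the first file
      for (k = 1; k <= n; k++)
        print order[k], count[order[k]]
    }
  ' "$1" "$2"
}

# Demo with the sample data from the question, in a throwaway directory.
demo_dir="${TMPDIR:-/tmp}/awk_order_demo.$$"
mkdir -p "$demo_dir"
printf 'test1 test test\ntest1 test test\ntest2 test test\n' > "$demo_dir/test.txt"
printf 'test1\ntest2\n' > "$demo_dir/test.uniq.txt"
count_fields_in_order "$demo_dir/test.uniq.txt" "$demo_dir/test.txt"
```

Because the fields are looked up as array keys, this is an exact string comparison per field, not a substring or regex match.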



It looks to me as if you're trying to count the number of lines containing each unique string in the uniq file. But the way you're doing it is... awkward, and as you've demonstrated, inconsistent between versions of awk.

The following might work a little better:

$ awk '
  NR==FNR {
    a[$1]
    next
  }
  {
    for (i in a) {
      if ($1~i) {
        a[i]++
      }
    }
  }
  END {
    for (i in a)
      printf "%6d\t%s\n",a[i],i
  }
' test.uniq.txt test.txt
         2  test1
         1  test2

This loads your uniq file into an array, then, for every line in your text file, steps through the array to count the matches.

Note that these are compared as regular expressions, without word boundaries, so test1 will also be counted when it appears as part of test12.
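To make the regex caveat concrete, here is a small sketch that contrasts the regex comparison with an exact string comparison of the first field. The function names and the extra test12 line in the demo data are invented for illustration. Both variants pipe through sort only to get a stable output order.

```shell
# Regex comparison of the first field, as in the answer:
# "test1" also matches a first field of "test12".
count_first_field_regex() {
  awk 'NR == FNR { a[$1]; next }
       { for (i in a) if ($1 ~ i) a[i]++ }
       END { for (i in a) print i, a[i] + 0 }' "$1" "$2" | sort
}

# Exact comparison: the first field must equal the search string.
count_first_field_exact() {
  awk 'NR == FNR { a[$1] = 0; next }
       ($1 in a) { a[$1]++ }
       END { for (i in a) print i, a[i] }' "$1" "$2" | sort
}

# Demo data: note the hypothetical "test12" line.
demo_dir="${TMPDIR:-/tmp}/awk_exact_demo.$$"
mkdir -p "$demo_dir"
printf 'test12 test test\ntest1 test test\ntest2 test test\n' > "$demo_dir/test.txt"
printf 'test1\ntest2\n' > "$demo_dir/test.uniq.txt"

echo "regex:"
count_first_field_regex "$demo_dir/test.uniq.txt" "$demo_dir/test.txt"
echo "exact:"
count_first_field_exact "$demo_dir/test.uniq.txt" "$demo_dir/test.txt"
```

On this data the regex variant should report test1 with a count of 2 (it also counts the test12 line), while the exact variant reports 1. The exact variant also avoids surprises when a search string contains regex metacharacters such as a dot.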

Another way might be to use grep + sort + uniq:

grep -o -w -F -f uniq.txt test.txt | sort | uniq -c

It's a pipeline, but a short one.

From man grep:

  • -F, --fixed-strings, --fixed-regexp: Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched. (-F is specified by POSIX; --fixed-regexp is an obsoleted alias, please do not use it in new scripts.)
  • -f FILE, --file=FILE: Obtain patterns from FILE, one per line. The empty file contains zero patterns and therefore matches nothing. (-f is specified by POSIX.)
  • -o, --only-matching: Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
  • -w, --word-regexp: Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and the underscore.
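Putting those options together, a sketch of the pipeline on the sample data might look like the following. The wrapper name count_with_grep and the trailing awk step (which only swaps the columns so the string comes first, as in the question's expected output) are my additions. It assumes the search strings contain no whitespace.

```shell
# Hypothetical wrapper around the pipeline above.
count_with_grep() {
  # $1 = file with one search string per line, $2 = file to search
  grep -o -w -F -f "$1" "$2" |   # one output line per whole-word match
    sort |                       # group identical matches together
    uniq -c |                    # prefix each distinct match with its count
    awk '{ print $2, $1 }'       # reorder to "string count"
}

# Demo with the sample data from the question, in a throwaway directory.
demo_dir="${TMPDIR:-/tmp}/grep_count_demo.$$"
mkdir -p "$demo_dir"
printf 'test1 test test\ntest1 test test\ntest2 test test\n' > "$demo_dir/test.txt"
printf 'test1\ntest2\n' > "$demo_dir/test.uniq.txt"
count_with_grep "$demo_dir/test.uniq.txt" "$demo_dir/test.txt"
```

One difference from the awk versions: a string that never occurs produces no grep output at all, so it will be missing from the result rather than listed with a count of 0.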

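Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please cite this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.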

 