
Using awk to count the number of occurrences of patterns from another file

I am trying to take a file containing a list and count how many times each item in that list occurs in a target file. Something like:

list.txt
blonde
red
black

target.txt
bob blonde male
sam blonde female

desired_output.txt
blonde 2
red 0
black 0

I have adapted the following code to count the values that are present in target.txt:

awk '{count[$2]++} END {for (word in count) print word, count[word]}' target.txt

But the output does not include the desired items that are in list.txt but not in target.txt:

current_output.txt
blonde 2

I have tried a few things to get this working, including:

awk '{word[$1]++;next;count[$2]++} END {for (word in count) print word, count[word]}' list.txt target.txt

However, I have had no success.

Could anyone help me make this awk statement read the list.txt file as well? Any explanation of the code would also be much appreciated. Thanks!

awk '
  NR==FNR{a[$0]; next}
  {
    for(i=1; i<=NF; i++){
      if ($i in a){ a[$i]++ }
    }
  }
  END{
    for(key in a){ printf "%s %d\n", key, a[key] }
  }
' list.txt target.txt
  • NR==FNR{a[$0]; next} The condition NR==FNR is true only while awk reads the first file, so the keys of array a are the lines of list.txt. The next statement then skips the remaining rules for those lines.

  • for(i=1; i<=NF; i++) For the second file, this loops over all fields of each line.

    • if ($i in a){ a[$i]++ } This checks whether the field $i is present as a key in the array a. If so, the value associated with that key (initially unset, which awk treats as zero in a numeric context) is incremented.
  • At the END, we print each key followed by its number of occurrences a[key] and a newline (\n).
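
To see why NR==FNR singles out the first file, it can help to print both counters for every input line. The sketch below is only an illustration: it recreates the sample files from the question in a temporary directory (the `mktemp -d` location and the variable names `tmp` and `trace` are assumptions made here, not part of the answer).

```shell
# Recreate the sample files from the question in a scratch directory.
tmp=$(mktemp -d)
printf 'blonde\nred\nblack\n' > "$tmp/list.txt"
printf 'bob blonde male\nsam blonde female\n' > "$tmp/target.txt"

# NR counts lines across all files; FNR restarts at 1 for each file.
# The two are equal only while the first file is being read.
trace=$(cd "$tmp" && awk '
  { print FILENAME, NR, FNR, (NR==FNR ? "first" : "later") }
' list.txt target.txt)

printf '%s\n' "$trace"
rm -rf "$tmp"
```

With the sample data, the three lines of list.txt are tagged "first" and the two lines of target.txt are tagged "later". One caveat worth knowing: if the first file is empty, NR==FNR also holds while the second file is read, so the idiom assumes a non-empty list.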

Output:

blonde 2
red 0
black 0
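
Note that awk does not specify the order in which for (key in a) visits the keys, so the lines may come out in a different order than in list.txt. If the order matters, one possible variation (a sketch, not part of the answer above) records the keys in a second array as they are read. The names `count`, `order`, `n`, `tmp` and `result` are introduced here for illustration.

```shell
# Recreate the sample files from the question in a scratch directory.
tmp=$(mktemp -d)
printf 'blonde\nred\nblack\n' > "$tmp/list.txt"
printf 'bob blonde male\nsam blonde female\n' > "$tmp/target.txt"

result=$(awk '
  # First file: start each count at 0 and remember the line order.
  NR==FNR { count[$0] = 0; order[++n] = $0; next }
  # Second file: count every field that is a known key.
  { for (i = 1; i <= NF; i++) if ($i in count) count[$i]++ }
  # Print in the order the keys appeared in the list.
  END { for (j = 1; j <= n; j++) print order[j], count[order[j]] }
' "$tmp/list.txt" "$tmp/target.txt")

printf '%s\n' "$result"
rm -rf "$tmp"
```

A duplicated line in list.txt would be printed twice with this sketch, since `order` stores every line it sees.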

Notes:

  1. Because of %d, the printf statement forces the conversion of a[key] to an integer in case it is still unset. The whole statement could be replaced by the simpler print key, a[key]+0. I missed that when writing the answer, but now you know two ways of doing the same thing. ;)

  2. In your attempt you were, for some reason, only addressing field 2 ($2), ignoring the other columns.
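
Both notes can be combined into a variant that stays closer to the original attempt. In that attempt, the unconditional next also meant that count[$2]++ was never reached for any line. The sketch below assumes that only the second column is meant to be matched, which is a guess about the intent, and uses print key, a[key]+0 instead of printf. The trailing `sort` and the names `tmp` and `counts` are additions made here so the output order is predictable.

```shell
# Recreate the sample files from the question in a scratch directory.
tmp=$(mktemp -d)
printf 'blonde\nred\nblack\n' > "$tmp/list.txt"
printf 'bob blonde male\nsam blonde female\n' > "$tmp/target.txt"

counts=$(awk '
  # First file only: register each line as a key, then skip other rules.
  NR==FNR { a[$0]; next }
  # Second file: count column 2 only, and only if it is a listed key.
  $2 in a { a[$2]++ }
  # Adding 0 turns a still-unset value into the number 0.
  END { for (key in a) print key, a[key]+0 }
' "$tmp/list.txt" "$tmp/target.txt" | sort)

printf '%s\n' "$counts"
rm -rf "$tmp"
```

For the sample data this gives the same counts as the answer, because every match happens to sit in column 2.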
