How to count occurrences of a string in a column of each file and output the filename and count, using awk

Question

How do I count the number of occurrences of a string in a column in each file, and output the filename and the count, using awk?

I have these two files:

> cat file.csv
col1,col2,col3,col4
col1,col2,col3
col1,col2
col1
col1,col2,col3,col4,col5


> cat fild.csv
col1,col2,col3,col4
col1,col2,col3
col1,col2
col1
col1,col2,col3,col4,col5

How do I get this output? Basically, I want to count the number of occurrences of a string (e.g. "col1") in a column in each file:

file.csv,5
fild.csv,5

Below are my attempts, for reference:

Output column/field 1:

> awk -F, '$1 =="col1" {print $1}' file.csv
col1
col1
col1
col1
col1

Output the filename and column/field 1. How do I add a comma as a separator?

> awk -F, '$1 =="col1" {print FILENAME $1}' file.csv
file.csvcol1
file.csvcol1
file.csvcol1
file.csvcol1
file.csvcol1

The output I'd like:

file.csv,5

Attempt on 2 files:

> awk -F, '$1 =="col1" {print FILENAME $1}' fil*.csv
fild.csvcol1
fild.csvcol1
fild.csvcol1
fild.csvcol1
fild.csvcol1
file.csvcol1
file.csvcol1
file.csvcol1
file.csvcol1
file.csvcol1

But the output I'd like is this:

file.csv,5
fild.csv,5

Answer

This works for me:

awk 'BEGIN{FS=OFS=","} $1 == "col1" {cnt++} ENDFILE{print FILENAME, (cnt>0&&cnt?cnt:"0"); cnt=0}' fil*.csv
fild.csv,5
file1.csv,0
file.csv,5

Answer 1

If you're using GNU awk, another potential solution is to use the ENDFILE special pattern, e.g. using @markp-fuso's example data:

cat filb.csv            # empty
cat filc.csv
col1

cat fild.csv
col1,col2,col3,col4
col1,col2,col3
col1,col2
col1
col1,col2,col3,col4,col5

cat file.csv
col1,col2,col3,col4
col1,col2,col3
col1,col2
col1
col1,col2,col3,col4,col5


awk 'BEGIN{FS=OFS=","} $1 == "col1" {cnt++} ENDFILE{print FILENAME, (cnt>0&&cnt?cnt:"0"); cnt=0}' fil*.csv
filb.csv,0
filc.csv,1
fild.csv,5
file.csv,5

# the 'cnt>0&&cnt?cnt:"0"' is to handle files with no matches (including empty files):
# if cnt has been incremented, print cnt; otherwise cnt is unset
# and would print as an empty string, so print "0" instead
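
To see why the guard is needed, here is a minimal sketch (not part of the original answer) showing how awk treats a variable that was never assigned: it prints as an empty string in string context and as 0 once it is used in arithmetic.

```shell
# cnt is never assigned here, so the first print shows an empty value,
# while adding 0 forces numeric context and yields 0
awk 'BEGIN { print "[" cnt "]"; print "[" (cnt + 0) "]" }'
```

The first line of output is an empty pair of brackets and the second contains 0, which is the value wanted for a file with no matches.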

Edit

As commented by @EdMorten, cnt+0 can be used instead of cnt>0&&cnt?cnt:"0" to handle empty files (much easier to remember), e.g.

awk 'BEGIN{FS=OFS=","} $1 == "col1" {cnt++} ENDFILE{print FILENAME, cnt+0; cnt=0}' fil*.csv
filb.csv,0
filc.csv,1
fild.csv,5
file.csv,5
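
ENDFILE is a GNU awk extension, so the commands above will not work in other awks. As a rough portable sketch, assuming it is acceptable to start one awk process per file, a shell loop lets the ordinary END block play the role of ENDFILE. The function name count_first_col and the variable names are made up for illustration.

```shell
# Portable sketch: run awk once per file so END fires once per file.
# The filename is passed in with -v because FILENAME is not reliable
# in END across awk implementations when the file is empty.
count_first_col() {            # hypothetical helper name
    str=$1
    shift
    for f in "$@"; do
        awk -F, -v f="$f" -v str="$str" '
            $1 == str { cnt++ }              # count matches in field 1
            END       { print f "," cnt+0 }  # cnt+0 prints 0 when nothing matched
        ' "$f"
    done
}
```

It would be called as count_first_col col1 fil*.csv. Note that awk processes backslash escapes in -v values, so this sketch assumes filenames without backslashes.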

Answer 2

Adding a couple more files to the mix:

$ cat filb.csv            # empty
$ cat filc.csv
col2

One awk approach:

awk -v str='col1' '                       # pass in string to search for
BEGIN { FS=OFS=","
        for (i=1;i<ARGC;i++)
            count[ARGV[i]]=0              # initialize counter for all files; addresses issue where a file may not have any matches or file is empty (ie, count==0)
      }
      { for (i=1;i<=NF;i++)               # loop through fields looking for a match and ...
            if ($i==str)                  # if found then ...
               count[FILENAME]++          # increment our counter
      }
END   { for (fname in count)
            print fname,count[fname]
      }
' fil?.csv

This generates:

file.csv,5
filb.csv,0
fild.csv,5
filc.csv,0

NOTES:

$i==str - assumes we're looking for an exact match on the value in the field (as opposed to a substring of the field's value)
assumes we need to match str on any field/column in the file, otherwise we'll need to add an additional input variable to designate which column(s) to search
output ordering is not guaranteed; OP can pipe the results to sort, or add some code to allow awk to sort the output before printing to stdout
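
The last two notes could be addressed along these lines. This is only a sketch: the col variable and the count_in_column wrapper are hypothetical names added for illustration, and it assumes a single column number is enough.

```shell
# Sketch: restrict the match to one column and sort the result.
count_in_column() {            # hypothetical helper name
    str=$1
    col=$2
    shift 2
    awk -v str="$str" -v col="$col" '
        BEGIN { FS = OFS = ","
                for (i = 1; i < ARGC; i++)
                    count[ARGV[i]] = 0        # files with no match still print 0
              }
        $col == str { count[FILENAME]++ }     # compare only the chosen column
        END   { for (fname in count)
                    print fname, count[fname]
              }
    ' "$@" | sort                             # for-in order is unspecified, so sort
}
```

For the question as asked, this would be run as count_in_column col1 1 fil?.csv.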

An alternative grep|tr idea:

$ grep -oc "$str" fil?.csv | tr ':' ','
filb.csv,0
filc.csv,0
fild.csv,5
file.csv,5
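
Two caveats on the grep idea: -c counts matching lines rather than individual occurrences, and the pattern matches anywhere on the line, including as a substring (col10 would count). A hedged variant that anchors the match to the first column might look like the following. The function name is hypothetical, str is assumed to contain no regex metacharacters, and at least two files are assumed so that grep prints filenames.

```shell
# Sketch: count lines whose first comma-separated field is exactly str.
count_first_col_grep() {       # hypothetical helper name
    str=$1
    shift
    # ^str followed by a comma or end of line, so col10 does not match col1
    grep -c -E "^${str}(,|\$)" "$@" | tr ':' ','
}
```

As with the original one-liner, tr replaces every colon, so filenames containing a colon would be altered.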
