
Counting grep result won't work in bash script

My question is not easy to ask, so I will try to explain the problem with the following example:

/home/luther/tipical_surnames.txt

Smith
Johnson
Williams
Jones
Brown
#Davis
Miller
Wilson
#Moore
Taylor
Anderson

/home/luther/employers.txt

2000    Johnson     A lot-of details / BJC3000,6000, i550                0
2101    Smith       A lot-of details / BJC3000,6000, i550                0
2102    Smith       A lot-of details / BJC3000,6000, i550                0
2103    Jones       A lot-of details / BJC3000,6000, i550                0
2104    Johnson     A lot-of details / BJC3000,6000, i550                0
2100    Smith       A lot-of details / BJC3000,6000, i550                0

I have one list with the favorite surnames and another with the names of the employees (employers.txt). Let's check on the console how many people in the company have the most popular surname:

grep -v "#" /home/luther/tipical_surnames.txt | sed -n 1'p' | cut -f 1
Smith
grep Smith /home/luther/employers.txt | wc -l
230

That works perfectly. Now let's check the first 5 most popular surnames using a simple bash script:

#!/bin/bash
counter=1
while [ $counter -le 5 ]
 do
  surname=`grep -v "#" /home/luther/tipical_surnames.txt | sed -n "$counter"'p' | cut -f 1`
  qty=`grep "$surname" /home/luther/employers.txt | wc -l`
  echo $surname
  echo $qty
  counter=$(( $counter + 1 ))
 done

And the result is as follows:

Smith
0
Johnson
0
Williams
0
Jones
0
Brown
0

What's wrong?

Update: As I wrote, I tested the script on another computer and everything works fine there. After that I tried the following on the problematic machine:

root@problematic:/var/www# cat testfile.bash
#!/bin/bash
for (( c=1; c<=5; c++ ))
{
echo $c
}

root@problematic:/var/www# bash testfile.bash
testfile.bash: line 2: syntax error near unexpected token `$'\r''
'estfile.bash: line 2: `for (( c=1; c<=5; c++ ))
root@problematic:/var/www# echo $BASH_VERSION
4.2.37(1)-release
root@problematic:/var/www#

Of course, on the other computer this simple script works as expected, without error.
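
The `$'\r'` token in the error message above suggests that files on the problematic machine were saved with Windows-style CRLF line endings. If tipical_surnames.txt is affected as well (an assumption, since the question does not show its raw bytes), every surname read from it would carry an invisible trailing carriage return, which would explain why it prints normally but matches nothing. A minimal sketch that reproduces the symptom on temporary stand-in files and then strips the carriage returns with tr:

```shell
#!/bin/bash
# Sketch: reproduce the zero-count symptom with a CRLF surname list,
# then strip the carriage returns. The files below are temporary
# stand-ins, not the real files from the question.
tmpdir=$(mktemp -d)
printf 'Smith\r\nJohnson\r\n' > "$tmpdir/surnames_crlf.txt"
printf '2101    Smith       details\n2102    Smith       details\n2000    Johnson     details\n' > "$tmpdir/employees.txt"

# Reading the first surname as-is keeps the invisible trailing \r.
surname=$(sed -n '1p' "$tmpdir/surnames_crlf.txt")
# grep exits non-zero when nothing matches, hence the "|| true".
count_crlf=$(grep -c "$surname" "$tmpdir/employees.txt" || true)

# Deleting every \r first gives the expected match.
surname_clean=$(tr -d '\r' < "$tmpdir/surnames_crlf.txt" | sed -n '1p')
count_clean=$(grep -c "$surname_clean" "$tmpdir/employees.txt")

echo "with CR: $count_crlf, without CR: $count_clean"
rm -rf "$tmpdir"
```

On the real files, `od -c testfile.bash | head` would show `\r \n` pairs if carriage returns are present.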

This is obviously untested since you haven't posted sample input, but this is the kind of approach you should use:

awk '
NR==FNR { if (!/#/) cnt[$1]=0; next }
{ cnt[$WHATEVER]++ }
END {
    PROCINFO["sorted_in"] = "@val_num_desc"
    for (name in cnt) {
        print name, cnt[name]
        if (++c == 5) {
            break
        }
    }
}
' /home/luther/tipical_surnames.txt /home/luther/employers.txt

Replace "WHATEVER" with the number of the field where employee surnames are stored in employers.txt.

The above uses GNU awk for sorted_in. With other awks I'd just remove the PROCINFO line and the counter from the output loop and pipe the output to sort and then head, e.g.:

awk '
NR==FNR { if (!/#/) cnt[$1]=0; next }
{ cnt[$WHATEVER]++ }
END {
    for (name in cnt) {
        print name, cnt[name]
    }
}
' /home/luther/tipical_surnames.txt /home/luther/employers.txt | sort -k2,2nr | head -n 5

or whatever the right sort options are.
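
Judging by the sample data in the question, the surname appears to be the second whitespace-separated field of employers.txt, so "WHATEVER" would be 2 there. Here is a minimal sketch of the portable pipeline on a temporary copy of that sample. The temp file names are stand-ins, and the `($2 in cnt)` guard is an added variation that counts only surnames present in the list:

```shell
#!/bin/bash
# Sketch: the awk | sort | head pipeline on the question's sample data.
# Temporary files stand in for tipical_surnames.txt and employers.txt.
tmpdir=$(mktemp -d)
printf '%s\n' Smith Johnson Williams Jones Brown '#Davis' Miller > "$tmpdir/surnames.txt"
cat > "$tmpdir/employees.txt" <<'EOF'
2000    Johnson     A lot-of details / BJC3000,6000, i550                0
2101    Smith       A lot-of details / BJC3000,6000, i550                0
2102    Smith       A lot-of details / BJC3000,6000, i550                0
2103    Jones       A lot-of details / BJC3000,6000, i550                0
2104    Johnson     A lot-of details / BJC3000,6000, i550                0
2100    Smith       A lot-of details / BJC3000,6000, i550                0
EOF

top=$(awk '
NR==FNR { if (!/#/) cnt[$1]=0; next }   # first file: register uncommented surnames
($2 in cnt) { cnt[$2]++ }               # second file: count listed surnames (assumes field 2)
END { for (name in cnt) print name, cnt[name] }
' "$tmpdir/surnames.txt" "$tmpdir/employees.txt" | sort -k2,2nr -k1,1 | head -n 3)

echo "$top"
rm -rf "$tmpdir"
```

The extra `-k1,1` sort key only makes the order of equal counts deterministic. On this sample the three lines printed are Smith 3, Johnson 2 and Jones 1.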

I'm actually not quite sure. I tested your script by copying and pasting it, with made-up data ( /usr/share/dict/words ), and it seems to work as expected. I wonder if there is a difference between the script you posted and the script you're running?

While at it, I took the liberty of making it run a bit smoother. Notice how, in the loop, you read the entire surnames file in each iteration? Also, grep + wc -l may be replaced by grep -c . I'm also adding -F to the first invocation of grep since the pattern ( # ) is a fixed string. The grep into the employee file uses \<$name\> to make sure we only get the Johns and no Johnssons when $name is John .

#!/bin/bash

employees_in="/usr/share/dict/words"
names_in="/usr/share/dict/words"

grep -v -F "#" "$names_in" | head -n 5 | cut -f 1 |
while read -r name; do
    count="$( grep -c "\<$name\>" "$employees_in" )"
    printf "name: %-10s\tcount: %d\n" "$name" "$count"
done

Testing it:

$ bash script.sh
name: A             count: 1
name: a             count: 1
name: aa            count: 1
name: aal           count: 1
name: aalii         count: 1

Note: I get only ones in the count because the dictionary (not surprisingly) contains only unique words.
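
To illustrate the two changes described above, here is a small sketch on made-up data (the file contents and the name are hypothetical). grep -c prints the number of matching lines directly, and the \< and \> anchors, as supported by GNU grep, keep John from matching Johnsson:

```shell
#!/bin/bash
# Sketch with made-up data: substring match vs. whole-word match.
tmpfile=$(mktemp)
printf '%s\n' 'John' 'Johnsson' 'John Smith' > "$tmpfile"

name="John"
# Plain pattern: matches any line containing "John", including "Johnsson".
substring_count=$(grep -c "$name" "$tmpfile")
# Word-boundary anchors: matches only lines where "John" is a whole word.
word_count=$(grep -c "\<$name\>" "$tmpfile")

echo "substring: $substring_count, whole word: $word_count"
rm -f "$tmpfile"
```

Note that grep -c counts matching lines, not individual occurrences. GNU grep's -w option is a similar way to request whole-word matching.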
