Counting grep results doesn't work in a bash script

This question is not easy to phrase, so I'll try to explain the problem with the following example:

/home/luther/tipical_surnames.txt

Smith
Johnson
Williams
Jones
Brown
#Davis
Miller
Wilson
#Moore
Taylor
Anderson

/home/luther/employers.txt

2000    Johnson     A lot-of details / BJC3000,6000, i550                0
2101    Smith       A lot-of details / BJC3000,6000, i550                0
2102    Smith       A lot-of details / BJC3000,6000, i550                0
2103    Jones       A lot-of details / BJC3000,6000, i550                0
2104    Johnson     A lot-of details / BJC3000,6000, i550                0
2100    Smith       A lot-of details / BJC3000,6000, i550                0

I have one list of the most popular surnames and another list of the company's employees. Let's check from the console how many people in the company have the most popular surname:

grep -v "#" /home/luther/tipical_surnames.txt | sed -n 1'p' | cut -f 1
Smith
grep Smith /home/luther/employers.txt | wc -l
230

It works perfectly. Now let's check the five most popular surnames with a simple bash script:

#!/bin/bash
counter=1
while [ $counter -le 5 ]
 do
  surname=`grep -v "#" /home/luther/tipical_surnames.txt | sed -n "$counter"'p' | cut -f 1`
  qty=`grep "$surname" /home/luther/employers.txt | wc -l`
  echo $surname
  echo $qty
  counter=$(( $counter + 1 ))
 done

And the result is as follows:

Smith
0
Johnson
0
Williams
0
Jones
0
Brown
0

What's wrong?

Update: As I mentioned, I tested the script on another computer and everything works fine there. Then I tried the following:

root@problematic:/var/www# cat testfile.bash
#!/bin/bash
for (( c=1; c<=5; c++ ))
{
echo $c
}

root@problematic:/var/www# bash testfile.bash
testfile.bash: line 2: syntax error near unexpected token `$'\r''
'estfile.bash: line 2: `for (( c=1; c<=5; c++ ))
root@problematic:/var/www# echo $BASH_VERSION
4.2.37(1)-release
root@problematic:/var/www#

On the other computer, of course, this simple script works as expected, without any error.
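The $'\r' in that error is a carriage return, which usually means the file was saved with Windows-style CRLF line endings. Assuming tipical_surnames.txt has CRLF endings as well (the question doesn't show this, so it is only a guess), each $surname read from it would end in an invisible \r, and grep would then find no matching lines in employers.txt, which fits the zeros in the output. A minimal sketch that reproduces this with made-up files in a temporary directory:

```shell
#!/bin/bash
# Hedged sketch: reproduce the CRLF problem with hypothetical files in a temp dir.
tmpdir=$(mktemp -d)
printf 'Smith\r\nJohnson\r\n' > "$tmpdir/surnames_crlf.txt"   # CRLF line endings
printf '2101    Smith       details    0\n' > "$tmpdir/employees.txt"

# Count lines that contain a carriage return; non-zero means CRLF endings.
cr=$(printf '\r')
cr_lines=$(grep -c "$cr" "$tmpdir/surnames_crlf.txt")
echo "lines with CR: $cr_lines"

# $(...) strips trailing newlines but not \r, so the name keeps a hidden \r
# and grep finds no matching line.
name=$(head -n 1 "$tmpdir/surnames_crlf.txt")
before=$(grep -c "$name" "$tmpdir/employees.txt")

# Strip the carriage returns and try again.
tr -d '\r' < "$tmpdir/surnames_crlf.txt" > "$tmpdir/surnames_unix.txt"
name=$(head -n 1 "$tmpdir/surnames_unix.txt")
after=$(grep -c "$name" "$tmpdir/employees.txt")

echo "matches before: $before, after: $after"
rm -r "$tmpdir"
```

If this is the cause, converting the files fixes it: dos2unix does that if it is installed, and GNU sed can do it in place with sed -i 's/\r$//' filename.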

This is obviously untested since you haven't posted sample input, but this is the kind of approach you should use:

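awk '
NR==FNR { if (!/#/) cnt[$1]=0; next }     # first file: listed surnames start at 0
($WHATEVER in cnt) { cnt[$WHATEVER]++ }   # second file: count listed surnames only
END {
    PROCINFO["sorted_in"] = "@val_num_desc"   # GNU awk: iterate by count, highest first
    for (name in cnt) {
        print name, cnt[name]
        if (++c == 5) {
            break
        }
    }
}
' /home/luther/tipical_surnames.txt /home/luther/employers.txt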

Replace "WHATEVER" with the number of the field that holds the employee surname in employers.txt (field 2 in the sample rows from the question).

The above uses GNU awk for sorted_in. With other awks, I'd just remove the PROCINFO line and the counter check from the output loop, then pipe the output to sort and head, e.g.:

awk '
NR==FNR { if (!/#/) cnt[$1]=0; next }
($WHATEVER in cnt) { cnt[$WHATEVER]++ }
END {
    for (name in cnt) {
        print name, cnt[name]
    }
}
' /home/luther/tipical_surnames.txt /home/luther/employers.txt | sort -k2,2nr | head -5   # numeric, descending on the count column

or whatever the right sort options are.
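A quick way to check the portable variant is to run it on the sample data from the question. This sketch assumes the surname is field 2, as in the sample employers.txt rows, and writes copies of both files to a temporary directory (the file names there are hypothetical). The ($2 in cnt) guard keeps commented-out or unlisted surnames from being counted, and -k1,1 breaks ties between equal counts alphabetically.

```shell
#!/bin/bash
# Hedged sketch: portable awk + sort + head on the question's sample data,
# assuming the surname is field 2 (WHATEVER = 2). File names are hypothetical.
tmpdir=$(mktemp -d)
printf '%s\n' Smith Johnson Williams Jones Brown '#Davis' Miller Wilson '#Moore' Taylor Anderson \
    > "$tmpdir/surnames.txt"
cat > "$tmpdir/employers.txt" <<'EOF'
2000    Johnson     A lot-of details / BJC3000,6000, i550                0
2101    Smith       A lot-of details / BJC3000,6000, i550                0
2102    Smith       A lot-of details / BJC3000,6000, i550                0
2103    Jones       A lot-of details / BJC3000,6000, i550                0
2104    Johnson     A lot-of details / BJC3000,6000, i550                0
2100    Smith       A lot-of details / BJC3000,6000, i550                0
EOF

result=$(awk '
NR==FNR { if (!/#/) cnt[$1]=0; next }   # surnames file: skip commented names
($2 in cnt) { cnt[$2]++ }                # employers file: count listed surnames
END { for (name in cnt) print name, cnt[name] }
' "$tmpdir/surnames.txt" "$tmpdir/employers.txt" | sort -k2,2nr -k1,1 | head -5)

echo "$result"
rm -r "$tmpdir"
```

With the sample rows this should print Smith 3, Johnson 2 and Jones 1, followed by Anderson 0 and Brown 0 (the zero-count ties, sorted alphabetically).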

I'm actually not quite sure. I tested your script by copying and pasting it, with made-up data (/usr/share/dict/words), and it seems to work as expected. I wonder if there is a difference between the script you posted and the script you're running?

While at it, I took the liberty of making it run a bit more smoothly. Notice how, in the loop, you read the entire surnames file on each iteration? Also, grep + wc -l can be replaced by grep -c. I'm also adding -F to the first grep invocation, since the pattern (#) is a fixed string. The grep on the employee file uses \<$name\> to make sure we only get the Johns and not the Johnssons when $name is John.

#!/bin/bash

employees_in="/usr/share/dict/words"
names_in="/usr/share/dict/words"

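# Read the surnames file once: skip commented lines, keep the first five
grep -v -F "#" "$names_in" | head -n 5 | cut -f 1 |
while read -r name; do
    # \< and \> anchor the name as a whole word, so John does not match Johnsson
    count="$( grep -c "\<$name\>" "$employees_in" )"
    printf "name: %-10s\tcount: %d\n" "$name" "$count"
done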

Testing it:

$ bash script.sh
name: A             count: 1
name: a             count: 1
name: aa            count: 1
name: aal           count: 1
name: aalii         count: 1

Note: I only get counts of 1 because the dictionary (not surprisingly) contains only unique words.
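The two grep changes in that script can be checked on their own. This sketch uses a throwaway file with made-up names: grep | wc -l and grep -c should give the same count, while the \< and \> word anchors (supported by GNU grep) only match John as a whole word. grep -w gives a similar whole-word match.

```shell
#!/bin/bash
# Hedged sketch: grep | wc -l versus grep -c, and a plain pattern versus
# whole-word anchors. The sample names are made up for illustration.
tmpfile=$(mktemp)
printf '%s\n' 'John' 'Johnsson' 'John Smith' > "$tmpfile"

name=John
piped=$(grep "$name" "$tmpfile" | wc -l)   # every line containing "John"
counted=$(grep -c "$name" "$tmpfile")      # same count, one process fewer
whole=$(grep -c "\<$name\>" "$tmpfile")    # only "John" as a whole word

echo "wc -l: $piped, grep -c: $counted, whole word: $whole"
rm "$tmpfile"
```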
