My question is not easy to ask, so I will try to explain the problem with the following example:
/home/luther/tipical_surnames.txt
Smith
Johnson
Williams
Jones
Brown
#Davis
Miller
Wilson
#Moore
Taylor
Anderson
/home/luther/employers.txt
2000 Johnson A lot-of details / BJC3000,6000, i550 0
2101 Smith A lot-of details / BJC3000,6000, i550 0
2102 Smith A lot-of details / BJC3000,6000, i550 0
2103 Jones A lot-of details / BJC3000,6000, i550 0
2104 Johnson A lot-of details / BJC3000,6000, i550 0
2100 Smith A lot-of details / BJC3000,6000, i550 0
I have a list of the most popular surnames and another list with the names of the employees. Let's check how many people in the company have the most popular surname, using the console:
grep -v "#" /home/luther/tipical_surnames.txt | sed -n 1'p' | cut -f 1
Smith
grep Smith /home/luther/employers.txt | wc -l
230
That works perfectly. Now let's check the five most popular surnames using a simple bash script:
#!/bin/bash
counter=1
while [ $counter -le 5 ]
do
surname=`grep -v "#" /home/luther/tipical_surnames.txt | sed -n "$counter"'p' | cut -f 1`
qty=`grep "$surname" /home/luther/employers.txt | wc -l`
echo $surname
echo $qty
counter=$(( $counter + 1 ))
done
And the result is as follows:
Smith
0
Johnson
0
Williams
0
Jones
0
Brown
0
What's wrong?
Update: As I wrote, I tested the script on another computer and everything works fine. Then I tried the following:
root@problematic:/var/www# cat testfile.bash
#!/bin/bash
for (( c=1; c<=5; c++ ))
{
echo $c
}
root@problematic:/var/www# bash testfile.bash
testfile.bash: line 2: syntax error near unexpected token `$'\r''
'estfile.bash: line 2: `for (( c=1; c<=5; c++ ))
root@problematic:/var/www# echo $BASH_VERSION
4.2.37(1)-release
root@problematic:/var/www#
Of course, on the other computer this simple script works as expected, without errors.
This is obviously untested since you haven't posted sample input but this is the kind of approach you should use:
awk '
NR==FNR { if (!/#/) cnt[$1]=0; next }
{ cnt[$WHATEVER]++ }
END {
PROCINFO["sorted_in"] = "@val_num_desc"
for (name in cnt) {
print name, cnt[name]
if (++c == 5) {
break
}
}
}
' /home/luther/tipical_surnames.txt /home/luther/employers.txt
Replace "WHATEVER" with the field number where employee surnames are stored in employers.txt.
The above uses GNU awk for sorted_in; with other awks I'd just remove the PROCINFO line and the counter check from the output loop, then pipe the output to sort and head, e.g.:
awk '
NR==FNR { if (!/#/) cnt[$1]=0; next }
{ cnt[$WHATEVER]++ }
END {
for (name in cnt) {
print name, cnt[name]
}
}
' /home/luther/tipical_surnames.txt /home/luther/employers.txt | sort -k2,2nr | head -5
or whatever the right sort options are.
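As a rough illustration of the portable variant, here is a sketch run against a few made-up lines modeled on the question's sample data. It assumes the surname is the second whitespace-separated field of the employee file (so WHATEVER becomes 2), and it differs from the snippet above in one respect: it only counts surnames that appear in the surname list. The temporary file names are hypothetical.

```shell
#!/bin/bash
# Hypothetical sample files modeled on the data shown in the question.
tmpdir=$(mktemp -d)
surnames="$tmpdir/surnames.txt"
employees="$tmpdir/employees.txt"

printf '%s\n' Smith Johnson Williams Jones Brown '#Davis' Miller > "$surnames"
printf '%s\n' \
  '2000 Johnson details' \
  '2101 Smith details' \
  '2102 Smith details' \
  '2103 Jones details' \
  '2104 Johnson details' \
  '2100 Smith details' > "$employees"

# First file: register every non-commented surname with a count of 0.
# Second file: increment the count only when field 2 is a registered surname.
# sort orders by count (descending), then by name to break ties.
top=$(awk '
    NR == FNR { if (!/#/) cnt[$1] = 0; next }
    $2 in cnt { cnt[$2]++ }
    END { for (name in cnt) print name, cnt[name] }
' "$surnames" "$employees" | sort -k2,2nr -k1,1 | head -n 3)

echo "$top"
rm -rf "$tmpdir"
```

With this sample input the three lines printed are Smith 3, Johnson 2 and Jones 1.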
I'm actually not quite sure. I tested your script, by copying and pasting it, with imagined data (/usr/share/dict/words), and it seems to work as expected. I wonder if there is a difference between the script you posted and the script you're running?
While at it, I took the liberty of making it run a bit smoother. Notice how, in the loop, you read the entirety of the surnames file in each iteration? Also, grep + wc -l may be replaced by grep -c. I'm also adding -F to the first invocation of grep since the pattern (#) is a fixed string. The grep into the employee file uses \<$name\> to make sure we only get the Johns and no Johnssons when $name is John.
#!/bin/bash
employees_in="/usr/share/dict/words"
names_in="/usr/share/dict/words"
grep -v -F "#" "$names_in" | head -n 5 | cut -f 1 |
while read -r name; do
count="$( grep -c "\<$name\>" "$employees_in" )"
printf "name: %-10s\tcount: %d\n" "$name" "$count"
done
Testing it:
$ bash script.sh
name: A count: 1
name: a count: 1
name: aa count: 1
name: aal count: 1
name: aalii count: 1
Note: I get only ones in the count because the dictionary (not surprisingly) contains only unique words.
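One more thing worth checking: the $'\r' in the error message from the update suggests that files on the problematic machine have Windows-style (CRLF) line endings. If the surname list has them too, each surname would carry a trailing carriage return and grep would find no match, which would fit the zeros you see. I can't confirm that from here, so treat the following as a sketch for detecting and stripping carriage returns; the file names in it are made up for the demonstration.

```shell
#!/bin/bash
# Build a hypothetical script with CRLF line endings to reproduce the symptom.
tmpdir=$(mktemp -d)
script="$tmpdir/crlf_script.sh"
printf 'surname=Smith\r\necho "$surname"\r\n' > "$script"

# Count lines that end in a carriage return; anything above 0 means CRLF endings.
cr=$'\r'
crlf_lines=$(grep -c "${cr}\$" "$script")
echo "lines ending in CR before cleanup: $crlf_lines"

# Write a cleaned copy with every carriage return removed.
fixed="$tmpdir/fixed_script.sh"
tr -d '\r' < "$script" > "$fixed"
fixed_lines=$(grep -c "${cr}\$" "$fixed")
echo "lines ending in CR after cleanup: $fixed_lines"

rm -rf "$tmpdir"
```

The same tr -d '\r' step can be applied to the surname list and to the script itself; tools such as dos2unix do the same job where they are installed.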