I have a list of records as follows:
Source:
a,yes
a,yes
b,No
c,N/A
c,N/A
c,N/A
d,xyz
d,abc
d,abc
Output:
a, Yes 2
b, No 1
c, N/A 3
d, xyz 1
d, abc 2
c, N/A "File is not correct"
Here 'Yes' and 'No' are the acceptable words. If the count of any other word is greater than the 'Yes' or 'No' word count for an individual $1 value, then we have to issue a statement like "file is not good".
I have tried the script below:
awk -F, '{a[$1]++;}END{for (i in a)print i, a[i];}' filetest.txt
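The script above keys the array on $1 alone, so the d,xyz and d,abc lines are merged into a single count of 3 for d. A minimal sketch that keeps the same approach but keys on both fields (count_pairs is a hypothetical helper name; the order of the output lines is unspecified):

```shell
# Hypothetical helper: count each distinct ($1, $2) pair instead of $1 alone.
# Reads the files given as arguments, or stdin if none are given.
count_pairs() {
    awk -F, '
        { cnt[$1 ", " $2]++ }                  # composite key, e.g. "d, abc"
        END { for (k in cnt) print k, cnt[k] } # iteration order is unspecified
    ' "$@"
}
```

Usage: count_pairs filetest.txt prints lines such as "d, abc 2" and "d, xyz 1" as separate entries.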
If you are not worried about the output order (i.e. it need not match the order of Input_file), then the following may help you.
awk -F, '{array[$1", "$2]++;} /yes/{y++;next} /No/{n++;next} /N\/A/{count++;next} END{;for(i in array){printf("%s %s%s\n",i,array[i],(count>y && count>n) && i ~ /N\/A/?RS i" File is not correct":"")}}' Input_file
EDIT: Adding a non-one-liner form of the solution as well.
awk -F, '{
    array[$1", "$2]++;
}
/yes/{
    y++;
    next
}
/No/{
    n++;
    next
}
/N\/A/{
    count++;
    next
}
END{
    for(i in array){
        printf("%s %s%s\n",i,array[i],(count>y && count>n) && i ~ /N\/A/?RS i" File is not correct":"")
    }
}' Input_file
EDIT2: As per the OP, N/A shouldn't be hardcoded, so the following code counts the string yes, the string No, and each of the remaining second-field values. It then compares the highest of those remaining counts with the yes and No counts and, based on that, prints the lines as the OP requested.
awk -F, '{
    array[$1", "$2]++;
}
/yes/{
    y++;
    next
}
/No/{
    n++;
    next
}
{
    count[$2]++;
}
END{
    for(i in count){
        val=val>count[i]?val:count[i]
    };
    for(i in array){
        printf("%s %s%s\n",i,array[i],(val>y && val>n) &&(i !~ /yes/ && i !~ /No/)?RS i" File is not correct":"")
    }
}' Input_file
After running the code above, I get the following:
./script.ksh
d, xyz 1
d, xyz File is not correct
c, N/A 3
c, N/A File is not correct
b, No 1
a, yes 2
d, abc 2
d, abc File is not correct
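In the output above, d, xyz and d, abc are flagged too, because the EDIT2 code compares the largest non-yes/No count (val, which is 3 here) against y and n and then flags every non-yes/No key. A sketch that instead compares each pair's own count against the larger of the whole-file yes and no totals, which reproduces the expected output in the question (check_pairs is a hypothetical name, and the case-insensitive match is an assumption):

```shell
# Hypothetical helper: flag each (key, value) pair whose own count exceeds the
# larger of the whole-file yes and no totals. Yes/no are matched
# case-insensitively, which is an assumption; drop tolower() if case matters.
check_pairs() {
    awk -F, '
        BEGIN { y = 0; n = 0 }
        {
            k = $1 ", " $2
            cnt[k]++
            v = tolower($2)
            if (v == "yes")     y++
            else if (v == "no") n++
            else                bad[k] = 1   # any value other than yes/no
        }
        END {
            good = (y > n ? y : n)           # larger of the yes and no totals
            for (k in cnt) {
                print k, cnt[k]
                if ((k in bad) && cnt[k] > good)
                    print k, "File is not correct"
            }
        }
    ' "$@"
}
```

With the sample data, the yes total is 2, so only c, N/A (count 3) is flagged; d, abc (count 2) is not.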
With GNU awk for true multi-dimensional arrays:
$ cat tst.awk
BEGIN { FS=","; OFS=", " }
{ cnt[$1][$2]++ }
END {
    for (key in cnt) {
        for (val in cnt[key]) {
            cur = cnt[key][val]
            print key, val " " cur
            if (tolower(val) ~ /^(yes|no)$/) {
                maxGood = (maxGood > cur ? maxGood : cur)
            }
            else {
                badCnt[key][val] = cur
            }
        }
    }
    print ""
    for (key in badCnt) {
        for (val in badCnt[key]) {
            if (badCnt[key][val] > maxGood) {
                print key, val " File is not correct"
            }
        }
    }
}
$ awk -f tst.awk file
a, yes 2
b, No 1
c, N/A 3
d, abc 2
d, xyz 1

c, N/A File is not correct
Use tolower() in other places, or remove it, as appropriate, depending on whether your $2 data really can be upper or lower case (or that's just a mistake in your example) and whether you want that treated as an error or not.
The output will be in an unspecified order courtesy of the "for (key in ...)" loops; that's easily changed to any other order if you care.
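For example, a sketch of the tst.awk logic above that prints pairs in the order they first appear in the input. It also avoids gawk's arrays of arrays by using "key, value" string keys, so it should run in any POSIX awk (check_in_order, order and val are hypothetical names):

```shell
# Hypothetical helper: a POSIX-awk take on the tst.awk logic. It uses
# "key, value" string keys instead of gawk arrays of arrays and prints pairs in
# the order they first appear in the input.
check_in_order() {
    awk -F, '
        BEGIN { maxGood = 0 }
        {
            k = $1 ", " $2
            if (!(k in cnt)) { order[++n] = k; val[k] = $2 }  # first sighting
            cnt[k]++
        }
        END {
            for (i = 1; i <= n; i++) {       # pass 1: counts and max yes/no count
                k = order[i]
                print k, cnt[k]
                if (tolower(val[k]) ~ /^(yes|no)$/ && cnt[k] > maxGood)
                    maxGood = cnt[k]
            }
            print ""
            for (i = 1; i <= n; i++) {       # pass 2: flag other values above it
                k = order[i]
                if (tolower(val[k]) !~ /^(yes|no)$/ && cnt[k] > maxGood)
                    print k, "File is not correct"
            }
        }
    ' "$@"
}
```

Recording first-seen order in an indexed array costs one extra array but keeps the output aligned with Input_file, which the first answer above does not attempt.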
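#!/bin/sh
FILE=1.txt
# Loop over each distinct line; read -r avoids word-splitting on spaces.
sort -u "$FILE" | while IFS= read -r r; do
    # -F: literal match, -x: whole line only, -c: print the match count
    count=$(grep -Fxc -- "$r" "$FILE")
    echo "$r $count"
done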
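The loop above runs grep over the whole file once per distinct line. A sketch of the usual one-pass alternative: sort piped into uniq -c, with awk only moving uniq's left-padded count to the end of the line (count_lines is a hypothetical name):

```shell
# Hypothetical helper: count identical lines in one pass with sort + uniq -c,
# then move the count after the line so output reads "a,yes 2" like the loop.
count_lines() {
    sort "$@" | uniq -c | awk '{ c = $1; sub(/^ *[0-9]+ /, ""); print $0, c }'
}
```

Like the loop, this only counts identical lines; it does not perform the "File is not correct" check.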