How to validate counts against grouped data in Unix

I have a list of records as follows:

Source:

a,yes
a,yes
b,No
c,N/A
c,N/A
c,N/A
d,xyz
d,abc
d,abc

Output:

a, Yes 2
b, No 1
c, N/A 3
d, xyz 1
d, abc 2

c, N/A "File is not correct"

Here 'Yes' and 'No' are the acceptable words. If the count of any other word is greater than the 'Yes' or 'No' count for an individual $1 value, then we have to issue a statement like "file is not good".

I have tried the script below:

awk -F, '{a[$1]++;}END{for (i in a)print i, a[i];}' filetest.txt

If you are not worried about the output order (that is, it does not need to match the order of Input_file), then the following may help.

awk -F, '{array[$1", "$2]++;} /yes/{y++;next} /No/{n++;next} /N\/A/{count++;next} END{;for(i in array){printf("%s %s%s\n",i,array[i],(count>y && count>n) && i ~ /N\/A/?RS i" File is not correct":"")}}'  Input_file

EDIT: Adding a non-one-liner form of the solution too.

awk -F, '
{
  array[$1", "$2]++
}
/yes/ {
  y++
  next
}
/No/ {
  n++
  next
}
/N\/A/ {
  count++
  next
}
END {
  for (i in array) {
    printf("%s %s%s\n",i,array[i],(count>y && count>n) && i ~ /N\/A/?RS i" File is not correct":"")
  }
}' Input_file

EDIT2: As per the OP, N/A shouldn't be hardcoded, so the following code counts the yes lines, the No lines, and each of the remaining second-field values. It then compares the largest of the remaining counts with the yes and No counts and prints the lines accordingly.

awk -F, '
{
  array[$1", "$2]++
}
/yes/ {
  y++
  next
}
/No/ {
  n++
  next
}
{
  count[$2]++
}
END {
  for (i in count) {
    val=val>count[i]?val:count[i]
  }
  for (i in array) {
    printf("%s %s%s\n",i,array[i],(val>y && val>n) &&(i !~ /yes/ && i !~ /No/)?RS i" File is not correct":"")
  }
}' Input_file

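After running the above code I get the following. Note that every row other than yes/No is flagged, because the test uses val, the largest of those counts, rather than each row's own count.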

./script.ksh
d, xyz 1
d, xyz File is not correct
c, N/A 3
c, N/A File is not correct
b, No 1
a, yes 2
d, abc 2
d, abc File is not correct
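
A possible variant, sketched here under the assumption that a row should be flagged only when its own count exceeds both the yes count and the No count for the whole file. The function name flag_rows is hypothetical, the matching is case-sensitive, and the for-in loop still prints in unspecified order. It uses only POSIX awk features.

```shell
#!/bin/sh
# flag_rows is a hypothetical helper name; it reads "key,value" lines
# from the named files or from standard input.
flag_rows() {
    awk -F, '
    {
        key = $1 ", " $2
        cnt[key]++          # count per (first field, second field) pair
        word[key] = $2      # remember the second field for the END check
    }
    $2 == "yes" { y++ }     # file-wide count of yes lines
    $2 == "No"  { n++ }     # file-wide count of No lines
    END {
        for (k in cnt) {
            print k, cnt[k]
            # Flag a row only if its own count beats both accepted words.
            if (word[k] != "yes" && word[k] != "No" && cnt[k] > y + 0 && cnt[k] > n + 0)
                print k, "File is not correct"
        }
    }' "$@"
}
```

Called as flag_rows filetest.txt on the sample data, this flags only the c, N/A row, since d, abc has a count of 2, which is not greater than the yes count.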

With GNU awk for true multi-dimensional arrays:

$ cat tst.awk
BEGIN { FS=","; OFS=", " }
{ cnt[$1][$2]++ }
END {
    for (key in cnt) {
        for (val in cnt[key]) {
            cur = cnt[key][val]
            print key, val " " cur
            if (tolower(val) ~ /^(yes|no)$/) {
                maxGood = (maxGood > cur ? maxGood : cur)
            }
            else {
                badCnt[key][val] = cur
            }
        }
    }

    print ""
    for (key in badCnt) {
        for (val in badCnt[key]) {
            if (badCnt[key][val] > maxGood) {
                print key, val " File is not correct"
            }
        }
    }
}

$ awk -f tst.awk file
a, yes 2
b, No 1
c, N/A 3
d, abc 2
d, xyz 1

c, N/A File is not correct

Use tolower() in other places, or remove it, as appropriate. That depends on whether your $2 data really can be upper or lower case or whether that's just a mistake in your example, and on whether you want a case mismatch treated as an error.
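
As a small sketch of the first option, assuming Yes and yes should be counted as the same word, the second field can be lower-cased before it is used as part of the array index. The function name count_normalized is hypothetical, and the code works in any POSIX awk.

```shell
#!/bin/sh
# count_normalized is a hypothetical helper name; it reads "key,value"
# lines from the named files or from standard input.
count_normalized() {
    awk -F, '
    { cnt[$1 ", " tolower($2)]++ }   # Yes, YES and yes share one bucket
    END { for (k in cnt) print k, cnt[k] }' "$@"
}
```

With this, the input lines a,Yes and a,yes are reported together as a, yes 2.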

The output will be in random order courtesy of the in operator - that's easily changed to any other order if you care.
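
One way to get a fixed order, sketched here for the counting part only, is to remember each pair the first time it appears and loop over that list instead of using the in operator. This assumes input order is the order you want. The function name count_in_input_order is hypothetical, and the code works in any POSIX awk. In GNU awk, setting PROCINFO["sorted_in"] (for example to "@ind_str_asc") is another option.

```shell
#!/bin/sh
# count_in_input_order is a hypothetical helper name; it reads
# "key,value" lines from the named files or from standard input.
count_in_input_order() {
    awk -F, '
    {
        key = $1 ", " $2
        if (!(key in cnt)) order[++n] = key   # first time this pair is seen
        cnt[key]++
    }
    END {
        # Walk the pairs in first-seen order rather than hash order.
        for (i = 1; i <= n; i++) print order[i], cnt[order[i]]
    }' "$@"
}
```

On the sample data this prints the pairs in the same sequence as the Output block in the question.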

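#!/bin/sh

FILE=1.txt

# Read the distinct lines one at a time so lines containing spaces survive.
sort "$FILE" | uniq | while IFS= read -r r; do
    # -F: fixed string, -x: match the whole line, -c: print the match count
    count=$(grep -cFx -- "$r" "$FILE")
    echo "$r $count"
done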
