
AWK: create an array of data in a BEGIN {} block

I have a spreadsheet in which each column represents a day of the week. Each cell in a column holds the name of an animal on the farm that was fed that day, like this:

Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday
cow, cow, cow, cow, cow, cow, cow,
goat, goat, goat, goat, goat, goat, 
horse, horse, , horse, horse, horse, horse
 , pig, , , pig, , ,
duck, duck, duck, duck, duck, goose, duck
 , , , , , , goat

Notice that the cow was fed every day; the goat was also fed every day, but it was recorded on two separate rows; the horse was not fed on Wednesday; the pig was fed only on Tuesday and Friday; and on Saturday they fed the goose instead of the duck, but recorded it on the duck row.

What I want to do now is construct an AWK script that will tell me which animals were fed every day of the week.

What I think I want to do is loop through the data once and build an associative array of every unique value in field $7 (Sunday), the idea being that an animal that wasn't fed on Sunday wasn't fed every day of the week.

Then I want to loop through the file again and, for each day an animal is found, increment that animal's entry in the array. Finally, I want to print the name of every animal that was fed every day.

Here is the pseudo-code I've got so far:

awk -F "," 'FNR > 1 BEGIN {
    [SOMEHOW MAGICALLY CONSTRUCT AN ARRAY HOLDING THE VALUES OF FIELD $7]
    }
    {
        for (i=1; i <= NR; i++) {
            if ($i in animals) {
                animals[$i]++
            }
            else {
                 animals[$i]=0
            }
         }
     }
     END {
         for (animal in animals) {
             if (animals[animal]==7) {
                 print $animal[animal]
             }
          }
     }
}

I know this AWK code is probably wrong on a lot of levels, but I've been bashing my head against this problem all day, despite having read O'Reilly's "sed & awk" book and consulted both it and The Googles throughout.

Any help would be greatly appreciated.
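The two-pass plan from the question can work with one change: a BEGIN block runs before awk reads any input, so it cannot build the array from field $7. A common workaround is to name the same file twice on the command line and test NR == FNR, which is true only while the first copy is being read. This is a minimal sketch, assuming the sample data is saved as farmdata (the file name used in the answer); the shell function name fed_every_day_two_pass is hypothetical.

```shell
# Sample data from the question, saved as farmdata
cat > farmdata <<'EOF'
Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday
cow, cow, cow, cow, cow, cow, cow,
goat, goat, goat, goat, goat, goat, 
horse, horse, , horse, horse, horse, horse
 , pig, , , pig, , ,
duck, duck, duck, duck, duck, goose, duck
 , , , , , , goat
EOF

# Hypothetical helper: reads the file named in $1 twice.
fed_every_day_two_pass() {
    awk -F'[[:space:]]*,[[:space:]]*' '
        # Pass 1: NR == FNR holds only while the first copy is read.
        # Each non-empty Sunday value ($7) becomes a candidate with count 0.
        NR == FNR { if (FNR > 1 && $7 != "") animals[$7] = 0; next }

        # Pass 2: on every data line, add one for each candidate found.
        FNR > 1 {
            for (i = 1; i <= 7; i++)
                if ($i in animals) animals[$i]++
        }

        # Report candidates counted exactly seven times.
        END {
            for (a in animals)
                if (animals[a] == 7) print a
        }
    ' "$1" "$1"
}

# for-in order is unspecified, so sort for stable output
fed_every_day_two_pass farmdata | sort
```

Like the one-liner in the answer, this counts occurrences rather than distinct days; the first pass only narrows the candidates to animals that appear in the Sunday column.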

What I want to do now is construct an AWK script that will tell me which animals were fed every day of the week.

Only the goat and cow were fed every day:

$ awk -F'[[:space:]]*,[[:space:]]*' 'NR>1{for (i=1;i<=7;i++) if ($i) fed[$i]+=1} END{for (a in fed) if (fed[a]==7) print a}' farmdata
goat
cow

How it works

awk implicitly loops over each record (line) in the file. This script uses one array, called fed, to keep track of how many times each animal was fed.
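To see what the fed array holds once every line has been read, the same counting rule can print each key with its count instead of filtering for 7. A minimal sketch, again assuming the sample data is in farmdata; count_feedings is a hypothetical name.

```shell
cat > farmdata <<'EOF'
Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday
cow, cow, cow, cow, cow, cow, cow,
goat, goat, goat, goat, goat, goat, 
horse, horse, , horse, horse, horse, horse
 , pig, , , pig, , ,
duck, duck, duck, duck, duck, goose, duck
 , , , , , , goat
EOF

# Hypothetical helper: same counting rule as the answer, but dump the whole
# fed array (animal name and count) instead of filtering for 7.
count_feedings() {
    awk -F'[[:space:]]*,[[:space:]]*' '
        NR > 1 { for (i = 1; i <= 7; i++) if ($i) fed[$i] += 1 }
        END    { for (a in fed) print a, fed[a] }
    ' "$1"
}

count_feedings farmdata | LC_ALL=C sort
```

For the sample data this lists cow and goat at 7, duck and horse at 6, pig at 2 and goose at 1, which is why only cow and goat pass the fed[a]==7 test.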

  • -F'[[:space:]]*,[[:space:]]*'

    This sets the field separator to a comma together with any whitespace next to it, so each field holds either a bare animal name or an empty string.

  • NR>1{for (i=1;i<=7;i++) if ($i) fed[$i]+=1}

    For every line after the header (NR>1), loop over the seven day columns and, if a field is non-empty, add one to the count for the name in that field.

  • END{for (a in fed) if (fed[a]==7) print a}

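    After the end of the file is reached, print every animal that was counted seven times. The order in which for (a in fed) visits the names is unspecified, so goat and cow may come out in either order.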
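The effect of the regex field separator is easiest to see on one of the sparse rows. This sketch prints each field of the pig row in brackets so that empty fields stay visible; show_fields is a hypothetical helper name.

```shell
# Hypothetical helper: print each field of every input line in brackets,
# using the answer's field separator, so empty fields stay visible.
show_fields() {
    awk -F'[[:space:]]*,[[:space:]]*' '{
        for (i = 1; i <= NF; i++) printf "[%s]", $i
        print ""
    }'
}

# The pig row from the sample data
printf ' , pig, , , pig, , ,\n' | show_fields
```

This prints [][pig][][][pig][][][]: eight fields (the row ends with a comma), each either empty or exactly pig with no surrounding spaces, which is what lets if ($i) skip the blanks and fed[$i] use clean names as keys.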

Multiple lines

For those who prefer their code spread over multiple lines:

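awk -F'[[:space:]]*,[[:space:]]*' '
    # Skip the header line, then count every non-empty field in columns 1 to 7
    NR>1{
        for (i=1;i<=7;i++)
           if ($i) fed[$i]+=1
    }

    # At end of input, print each animal that was counted exactly seven times
    END{
        for (a in fed)
           if (fed[a]==7) print a
    }
    ' farmdata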
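One caveat: the script above counts occurrences, not distinct days, so an animal listed twice in the same column (for example on an extra row) could reach 7 despite missing a day. A variant that counts each animal at most once per column could look like the sketch below, which uses awk's (key1, key2) in array membership test. It assumes the sample data is in farmdata, and fed_on_all_days is a hypothetical name.

```shell
cat > farmdata <<'EOF'
Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday
cow, cow, cow, cow, cow, cow, cow,
goat, goat, goat, goat, goat, goat, 
horse, horse, , horse, horse, horse, horse
 , pig, , , pig, , ,
duck, duck, duck, duck, duck, goose, duck
 , , , , , , goat
EOF

# Hypothetical helper: count each (animal, day) pair at most once, so an
# animal listed twice in one column cannot stand in for a missed day.
fed_on_all_days() {
    awk -F'[[:space:]]*,[[:space:]]*' '
        NR > 1 {
            for (i = 1; i <= 7; i++)
                if ($i != "" && !(($i, i) in seen)) {
                    seen[$i, i] = 1   # animal $i was fed on day i
                    days[$i]++        # distinct days seen for this animal
                }
        }
        END {
            for (a in days)
                if (days[a] == 7) print a
        }
    ' "$1"
}

fed_on_all_days farmdata | sort
```

For the sample data this prints the same two animals as the answer. The difference shows only when a name repeats within a column: appending a row such as "horse, , , , , , " would push the answer's count for horse to 7 even though the horse still missed Wednesday, while this version still reports only cow and goat.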
