
How to count the number of hotels in every country using awk?

I have a dataset hotels.csv with columns: doc_id, hotel_name, hotel_url, street, city, state, country, zip, class, price, num_reviews, CLEANLINESS, ROOM, SERVICE, LOCATION, VALUE, COMFORT, overall_ratingsource

I want to count the number of hotels in every country. How can I do it using awk? I can already count the hotels for a single country, such as China or the USA:

cat /home/data/hotels.csv | awk -F, '$7=="China"{n+=1} END {print n}'

But how to do it for every country?

Parsing CSV with awk is usually not a good idea: if some fields contain commas (for instance, a quoted hotel name), splitting with -F, shifts the columns and the counts come out wrong. That said, associative arrays are convenient for this kind of task:

awk -F, '{num[$7]++} END{for(country in num) print country, num[country]}' /home/data/hotels.csv
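The one-liner above splits on every comma, so a quoted field such as "Hotel, Spa" pushes the country out of column 7. As a minimal sketch, assuming a header row, the country in column 7, and no line breaks inside quoted fields, a hypothetical shell function (count_by_country_quoted, not from either answer) can do its own quote-aware splitting in portable awk. If GNU awk is available, its FPAT variable is another way to define fields by content.

```shell
# count_by_country_quoted: hypothetical helper that counts rows per value of
# column 7 while honouring double-quoted fields, so "Hotel, Spa" stays one field.
# Assumptions: first line is a header, no newlines inside quoted fields.
# Reads the files given as arguments, or stdin if none are given.
count_by_country_quoted() {
    awk '
    # Split line s into arr[1..n]; a doubled "" inside quotes becomes one ".
    function split_csv(s, arr,    n, i, c, field, inq, len) {
        split("", arr)                       # portable way to empty the array
        n = 0; field = ""; inq = 0; len = length(s)
        for (i = 1; i <= len; i++) {
            c = substr(s, i, 1)
            if (inq) {
                if (c == "\"") {
                    if (substr(s, i + 1, 1) == "\"") {
                        field = field "\""   # escaped quote inside a quoted field
                        i++
                    } else {
                        inq = 0              # closing quote
                    }
                } else {
                    field = field c
                }
            } else if (c == "\"") {
                inq = 1                      # opening quote
            } else if (c == ",") {
                arr[++n] = field             # a bare comma ends the field
                field = ""
            } else {
                field = field c
            }
        }
        arr[++n] = field
        return n
    }
    BEGIN { split("", f) }                   # make f an array in every awk
    FNR > 1 {                                # skip the header line of each file
        split_csv($0, f)
        count[f[7]]++
    }
    END { for (k in count) print k, count[k] }
    ' "$@" | LC_ALL=C sort                   # for-in order is unspecified, so sort
}
```

Usage: count_by_country_quoted /home/data/hotels.csv. For files with multi-line quoted fields, a real CSV parser is the safer choice.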

Note: cat file | awk ... is useless. Simply pass the file to awk as an argument.

If the column names are in the first row, start processing from the second row (NR > 1), use the country name as the array key, and increment the value each time the same key appears:

awk -F, 'NR > 1 {
    ary[$7]++
} 
END {
    for(item in ary) print item, ary[item]
}
' /home/data/hotels.csv
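Since the first row holds the column names anyway, it can also be used to locate the country column instead of hard-coding $7. The following is a sketch under stated assumptions: count_by_column is a hypothetical helper, the header cell must match the given name exactly, and plain -F, splitting is kept, so quoted commas are still not handled. Output is piped through sort because awk's for-in order is unspecified.

```shell
# count_by_column: hypothetical helper that finds the column whose header
# equals the first argument (for example "country") and counts rows per value.
# Plain -F, splitting: quoted fields containing commas are NOT handled.
# Reads the files given after the column name, or stdin if none are given.
count_by_column() {
    col_name=$1
    shift
    awk -F, -v want="$col_name" '
    FNR == 1 {
        # Look up the requested column in this file header row.
        col = 0
        for (i = 1; i <= NF; i++) if ($i == want) col = i
        if (col == 0) { missing = 1; exit }
        next
    }
    { count[$col]++ }
    END {
        if (missing) { print "column not found: " want | "cat 1>&2"; exit 1 }
        for (k in count) print k, count[k]
    }
    ' "$@" | LC_ALL=C sort
}
```

Usage: count_by_column country /home/data/hotels.csv. Looking the column up by name keeps the command correct if the column order of hotels.csv ever changes.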

