How to count the number of hotels in every country using awk?

Question

I have a dataset hotels.csv with columns: doc_id, hotel_name, hotel_url, street, city, state, country, zip, class, price, num_reviews, CLEANLINESS, ROOM, SERVICE, LOCATION, VALUE, COMFORT, overall_ratingsource

I want to count the number of hotels in every country. How can I do this using awk? I can already count the hotels for a single country, such as China or USA:

cat /home/data/hotels.csv | awk -F, '$7=="China"{n+=1} END {print n}'

But how do I do it for every country?

Answer 1

Parsing CSV with awk is usually not a good idea: if any field contains a comma (inside a quoted value, for instance), splitting on -F, will not work as expected. That said, associative arrays are convenient for this kind of task:

awk -F, '{num[$7]++} END{for(country in num) print country, num[country]}' /home/data/hotels.csv

Note: cat file | awk ... is a useless use of cat. Simply pass the file name to awk as an argument.
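
As a rough sketch of how this behaves, the snippet below runs the same one-liner on a tiny made-up sample (the rows and the temporary file are hypothetical, not taken from hotels.csv). Because awk does not guarantee any particular order for for (key in array), the output is piped through sort to make it stable:

```shell
# Hypothetical sample data; only column 7 (country) matters here.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
1,Hotel A,url,street,Beijing,,China,100000
2,Hotel B,url,street,Boston,MA,USA,02108
3,Hotel C,url,street,Shanghai,,China,200000
EOF

# Count rows per country (field 7), then sort by country name,
# since the iteration order of "for (country in num)" is unspecified.
counts=$(awk -F, '{num[$7]++} END{for(country in num) print country, num[country]}' "$tmp" | sort)
printf '%s\n' "$counts"

rm -f "$tmp"
```

With this sample, China should be reported with a count of 2 and USA with a count of 1.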

Answer 2

If the first row holds the column names, skip it by processing only from the second row onward (NR > 1). Use the country name as the array key and increment its value each time the same key appears.

awk -F, 'NR > 1 {
    ary[$7]++
} 
END {
    for(item in ary) print item, ary[item]
}
' /home/data/hotels.csv
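
A minimal sketch of the same idea on made-up data with a header row (the sample rows and temporary file are assumptions for illustration). It prints a tab between country and count, which is a small change from the command above, so that country names containing spaces do not confuse the later sort by count:

```shell
# Hypothetical sample with a header row; column 7 is the country.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
doc_id,hotel_name,hotel_url,street,city,state,country,zip
1,Hotel A,url,street,Beijing,,China,100000
2,Hotel B,url,street,Boston,MA,USA,02108
3,Hotel C,url,street,Shanghai,,China,200000
EOF

tab=$(printf '\t')

# NR > 1 skips the header, so the literal word "country" is never counted.
# Output is tab-separated, then sorted numerically by count, largest first.
ranked=$(awk -F, 'NR > 1 { ary[$7]++ }
END { for (item in ary) print item "\t" ary[item] }' "$tmp" | sort -t "$tab" -k2,2nr)
printf '%s\n' "$ranked"

rm -f "$tmp"
```

Without the NR > 1 condition, the header line would add an extra "country" entry with a count of 1 to the result.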
