AWK count occurrences of column A based on uniqueness of column B

Question

I have a file with several columns, and for each value in one column I want to count how many distinct values of a second column appear with it. Example:

column 10            column 15
orange               New York
green                New York
blue                 New York
gold                 New York
orange               Amsterdam
blue                 New York
green                New York
orange               Sweden
blue                 Tokyo
gold                 New York

I am fairly new to using commands like awk and am looking to gain more practical knowledge.

I've tried some different variations of

awk '{A[$10 OFS $15]++} END {for (k in A) print k, A[k]}' myfile

but I don't fully understand the code, and the output was not what I expected.

I am expecting this output:

orange     3
blue       2
green      1
gold       1

Answer 1

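With GNU awk (nothing in the awk part is GNU-specific, so other awks should work too). I assume tab is your field separator. Note that the $'\t' passed to cut is bash syntax.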

awk '{count[$10 FS $15]++}END{for(j in count) print j}' FS='\t' file | cut -d $'\t' -f 1 | sort | uniq -c | sort -nr

Output:

      3 orange
      2 blue
      1 green
      1 gold

I suppose it could be more elegant.
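
One possibly tidier variant, offered as a sketch rather than a tested replacement: the `!seen[...]++` idiom lets awk count each (column A, column B) pair only the first time it appears, so the cut and uniq steps are not needed. The sample file, the `count_distinct` function name, and the use of fields 1 and 2 with a header row are all assumptions made for illustration; swap in $10 and $15 for the real data.

```shell
# Hypothetical sample: tab-separated, one header row,
# colour in field 1 and city in field 2.
sample=$(mktemp)
printf '%s\t%s\n' \
  'column 10' 'column 15' \
  orange 'New York' \
  green  'New York' \
  blue   'New York' \
  gold   'New York' \
  orange Amsterdam \
  blue   'New York' \
  green  'New York' \
  orange Sweden \
  blue   Tokyo \
  gold   'New York' > "$sample"

# count_distinct FILE (hypothetical helper): for each value of field 1,
# count how many distinct values of field 2 occur with it.
count_distinct() {
  awk -F'\t' '
    # seen[] is 0 the first time a pair shows up, so the action runs once per pair
    NR > 1 && !seen[$1, $2]++ { count[$1]++ }
    END { for (k in count) print count[k] "\t" k }
  ' "$1" | sort -t "$(printf '\t')" -k1,1nr -k2,2
}

count_distinct "$sample"
```

On the sample above this should print 3 orange, 2 blue, 1 gold, 1 green (count first, tab-separated). The final sort is only there because plain awk does not guarantee any order for `for (k in count)`.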

Answer 2

Single GNU awk invocation version (works with non-GNU awk too; it just doesn't sort the output):

$ gawk 'BEGIN{ OFS=FS="\t" }
        NR>1 { names[$2,$1]=$1 }
        END { for (n in names) colors[names[n]]++;
              PROCINFO["sorted_in"] = "@val_num_desc";
              for (c in colors) print c, colors[c] }' input.tsv
orange  3
blue    2
gold    1
green   1

Adjust column numbers as needed to match real data.
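
To avoid editing the script each time, the column numbers can be passed in with -v and used as field references. This is a minimal sketch under assumptions: the `count_by_columns` name, the made-up 15-column sample, and the header row are all hypothetical, and it uses a `seen` array instead of the two-array approach above.

```shell
# Hypothetical 15-column tab-separated file: a header row, then rows whose
# 10th and 15th fields carry the data; every other field is the filler "x".
wide=$(mktemp)
printf '%s\t%s\n' color city orange 'New York' orange Amsterdam \
                  orange 'New York' blue Tokyo |
  awk -F'\t' '{
    line = ""
    for (i = 1; i <= 15; i++) {
      field = "x"
      if (i == 10) field = $1
      if (i == 15) field = $2
      line = line (i > 1 ? "\t" : "") field
    }
    print line
  }' > "$wide"

# count_by_columns KEYCOL VALCOL FILE (hypothetical helper):
# count distinct VALCOL values per KEYCOL value, skipping the header row.
count_by_columns() {
  awk -F'\t' -v key="$1" -v val="$2" '
    # $key and $val are the fields whose numbers were passed in with -v
    NR > 1 && !seen[$key, $val]++ { count[$key]++ }
    END { for (k in count) print k "\t" count[k] }
  ' "$3"
}

count_by_columns 10 15 "$wide" | sort
```

With this sample the duplicate orange/New York row is counted once, so the result is blue 1 and orange 2.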

Bonus solution that uses sqlite3:

$ sqlite3 -batch -noheader <<EOF
.mode tabs
.import input.tsv names
SELECT "column 10", count(DISTINCT "column 15") AS total
FROM names
GROUP BY "column 10"
ORDER BY total DESC, "column 10";
EOF
orange  3
blue    2
gold    1
green   1
