AWK count occurrences of column A based on uniqueness of column B
I have a file with several columns, and I want to count the occurrences of each value in one column, counting only the distinct values it is paired with in a second column. Example:
column 10 column 15
orange New York
green New York
blue New York
gold New York
orange Amsterdam
blue New York
green New York
orange Sweden
blue Tokyo
gold New York
I am fairly new to using commands like awk and am looking to gain more practical knowledge.
I've tried some different variations of
awk '{A[$10 OFS $15]++} END {for (k in A) print k, A[k]}' myfile
but, not quite understanding the code, the output was not what I expected.
I am expecting output of
orange 3
blue 2
green 1
gold 1
With GNU awk. I assume tab is your field separator.
awk '{count[$10 FS $15]++}END{for(j in count) print j}' FS='\t' file | cut -d $'\t' -f 1 | sort | uniq -c | sort -nr
Output:
3 orange
2 blue
1 green
1 gold
I suppose it could be more elegant.
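One possible way to shorten the pipeline, offered as a sketch rather than the answer's own method: the common `!seen[...]++` idiom lets awk count a color only the first time a given color/city pair appears, so only the final sort is left to the shell. The sample file, the `distinct_counts` function name, and the use of fields 1 and 2 (stand-ins for columns 10 and 15) are assumptions made for illustration.

```shell
# Sample data: tab-separated, color in field 1 and city in field 2
# (stand-ins for columns 10 and 15 of the real file).
sample=$(mktemp)
printf '%s\t%s\n' \
  orange 'New York' green 'New York' blue 'New York' gold 'New York' \
  orange Amsterdam blue 'New York' green 'New York' orange Sweden \
  blue Tokyo gold 'New York' > "$sample"

# distinct_counts FILE (hypothetical helper name)
# Count a color only the first time a given (color, city) pair is seen,
# then sort by count (descending) and by name to break ties.
distinct_counts() {
  awk -F '\t' '!seen[$1, $2]++ { count[$1]++ }
       END { for (c in count) print c "\t" count[c] }' "$1" |
    sort -t "$(printf '\t')" -k2,2nr -k1,1
}

distinct_counts "$sample"
rm -f "$sample"
```

Because `seen[$1, $2]++` is 0 (false) only on the first occurrence of a pair, repeated rows such as the second `blue New York` are skipped without a separate deduplication pass.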
Single GNU awk invocation version (works with non-GNU awk too, it just doesn't sort the output):
$ gawk 'BEGIN{ OFS=FS="\t" }
NR>1 { names[$2,$1]=$1 }
END { for (n in names) colors[names[n]]++;
PROCINFO["sorted_in"] = "@val_num_desc";
for (c in colors) print c, colors[c] }' input.tsv
orange 3
blue 2
gold 1
green 1
Adjust column numbers as needed to match real data.
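To make that adjustment without editing the script each time, the column numbers could be passed in as awk variables. The following is a minimal sketch, not part of the original answer: the function name `count_distinct` and the variables `keycol` and `valcol` are made up for illustration, and it assumes tab-separated input whose first line is a header, like `input.tsv` above. It uses only portable awk, so the output order is unspecified until it is piped through `sort`.

```shell
# count_distinct FILE KEYCOL VALCOL   (hypothetical helper)
# Prints each KEYCOL value with the number of distinct VALCOL values seen with it.
count_distinct() {
  awk -F '\t' -v keycol="$2" -v valcol="$3" '
    # NR > 1 skips the header row; drop that test if the file has no header.
    NR > 1 && !seen[$keycol, $valcol]++ { count[$keycol]++ }
    END { for (k in count) print k "\t" count[k] }
  ' "$1"
}

# Demo on a small made-up file where the color is field 2 and the city is field 3.
demo=$(mktemp)
printf 'id\tcolor\tcity\n1\torange\tNew York\n2\torange\tSweden\n3\tblue\tTokyo\n4\torange\tSweden\n' > "$demo"
count_distinct "$demo" 2 3 | sort -t "$(printf '\t')" -k2,2nr -k1,1
rm -f "$demo"
```

For the file in the question, the call would presumably be `count_distinct myfile 10 15`, provided the real data is tab-separated.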
Bonus solution that uses sqlite3:
$ sqlite3 -batch -noheader <<EOF
.mode tabs
.import input.tsv names
SELECT "column 10", count(DISTINCT "column 15") AS total
FROM names
GROUP BY "column 10"
ORDER BY total DESC, "column 10";
EOF
orange 3
blue 2
gold 1
green 1