Sum multiple lines in a CSV when fields match within the file

I have a file that I've trimmed down to look like the following:

"Reno","40.00"
"Reno","40.00"
"Reno","80.00"
"Reno","60.00"
"Lakewood","150.00"
"Altamonte Springs","50.25"
"Altamonte Springs","25.00"
"Altamonte Springs","25.00"
"Sandpoint","50.00"
"Lenoir City","987.00"

etc.

What I want to end up with is the total amount per city. That is:

"Reno","220.00"
"Lakewood","150.00"
"Altamonte Springs","100.25"

Etc.

Fair warning: the data set is not necessarily contiguous. That is, a city may appear once here, once a thousand lines down, and three more times at the end.

I've been trying to use the following awk script:

awk -F "," '{array[$1]+=$2} END { for (i in array) {print i"," array[i]}}' test1.csv > test6.csv

The results I'm getting look like this:

"Matawan",0
"Bay Side",0
"Pataskala",0
"Dorothy",0
"Haymarket",0
"Myrtle Point",0

Etc. All zeros in the second column, and no quotes around the values.

I'm obviously missing something, but I don't know what, or where else to look. What am I missing?

Thanks.

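Your script fails because of the double quotes. With -F "," the second field is "40.00" with the quotes still attached, and awk treats a string that does not start with a number as 0 when it is used in arithmetic.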
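To see the effect in isolation, here is a minimal sketch that feeds a single row from the question through awk with and without the quotes. The variable names with_quotes and without_quotes are only for illustration.

```shell
# With -F ',', the second field still carries its quotes: "40.00".
# awk converts a string that does not begin with a number to 0 in arithmetic.
with_quotes=$(printf '"Reno","40.00"\n' | awk -F ',' '{ print $2 + 0 }')

# Deleting the quotes first lets the same expression see a real number.
without_quotes=$(printf '"Reno","40.00"\n' | tr -d '"' | awk -F ',' '{ print $2 + 0 }')

echo "with quotes: $with_quotes"
echo "without quotes: $without_quotes"
```

The first value comes out as 0, which matches the zeros in the question's output.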

Strip the quotes first, then add them back when printing:

sed 's/"//g' file.csv | awk -F "," '{array[$1]+=$2}END{for(i in array) {print "\""  i "\""  ","  "\"" array[i] "\"" }}' 

"Lenoir City","987"
"Reno","220"
"Lakewood","150"
"Sandpoint","50"
"Altamonte Springs","100.25"

This awk one-liner gives exactly the output you want, formatting included:

awk -F'","' '{a[$1]+=$2*1}END{for (x in a)printf "%s\",\"%.2f\"\n", x,a[x]}' file

Test with your data:

kent$  cat f
"Reno","40.00"
"Reno","40.00"
"Reno","80.00"
"Reno","60.00"
"Lakewood","150.00"
"Altamonte Springs","50.25"
"Altamonte Springs","25.00"
"Altamonte Springs","25.00"
"Sandpoint","50.00"
"Lenoir City","987.00"

kent$  awk -F'","' '{a[$1]+=$2*1}END{for (x in a)printf "%s\",\"%.2f\"\n", x,a[x]}' f
"Lakewood","150.00"
"Reno","220.00"
"Lenoir City","987.00"
"Sandpoint","50.00"
"Altamonte Springs","100.25"
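Note that the rows above come out in no particular order, because awk visits the keys of "for (x in a)" in an unspecified order. If a reproducible order matters, one option (a sketch, not part of the original answer) is to pipe the result through sort. The data variable below holds a few rows borrowed from the question.

```shell
# A few sample rows borrowed from the question.
data='"Reno","40.00"
"Lakewood","150.00"
"Reno","80.00"
"Altamonte Springs","50.25"
"Altamonte Springs","25.00"'

# -F'","' splits each row into "Reno and 40.00" (one stray quote on each side).
# awk uses the leading numeric part of 40.00" when adding, so the sum is correct.
# sort then puts the rows in a reproducible order.
sorted=$(printf '%s\n' "$data" |
  awk -F'","' '{ a[$1] += $2 } END { for (x in a) printf "%s\",\"%.2f\"\n", x, a[x] }' |
  sort)

echo "$sorted"
```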

The double quotes in your input are causing the problem. First remove them with sed, then add them back with printf inside awk.

Try the following:

sed 's/"//g' input.csv | awk -F "," '{array[$1]+=$2} END { for (i in array) {printf "\"%s\",\"%.2f\"\n", i, array[i]}}' > output.csv

Jumbled input

"Reno","40.00"
"Reno","60.00"
"Lakewood","150.00"
"Altamonte Springs","50.25"
"Altamonte Springs","25.00"
"Reno","80.00"
"Sandpoint","50.00"
"Reno","40.00"
"Lenoir City","987.00"
"Altamonte Springs","25.00"

Output

"Reno","220.00"
"Altamonte Springs","100.25"
"Lakewood","150.00"
"Lenoir City","987.00"
"Sandpoint","50.00"
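One caveat with stripping the quotes and then splitting on commas: it assumes no city name contains a comma. The sketch below uses a hypothetical row (it is not in the question's data) to show what happens to the key in that case, compared with splitting on the quote character instead.

```shell
# Hypothetical row whose city name contains a comma.
row='"Washington, D.C.","10.00"'

# Stripping quotes and splitting on commas yields three fields;
# $1 is then only the first part of the name.
naive_key=$(printf '%s\n' "$row" | sed 's/"//g' | awk -F ',' '{ print $1 }')

# Splitting on the quote character keeps the whole name in $2.
quote_key=$(printf '%s\n' "$row" | awk -F '"' '{ print $2 }')

echo "comma split: $naive_key"
echo "quote split: $quote_key"
```

If the real data never has commas inside a field, either approach works.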

You don't need pre-processing or nasty escaping:

$ awk -F'"' '{a[$2]+=$4}END{for(k in a)printf "%s,%s\n",FS k FS,FS a[k] FS}' file
"Lenoir City","987"
"Reno","220"
"Lakewood","150"
"Sandpoint","50"
"Altamonte Springs","100.25"
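The output above drops the trailing zeros ("220" rather than "220.00") because the sums are printed with %s. If the two-decimal format from the question is wanted, a small variation on the same idea (a sketch, with sample rows based on the question) is to print the sum with %.2f while still using FS for the quotes:

```shell
# With -F'"', a row like "Reno","40.00" splits into:
#   $1 = ""   $2 = Reno   $3 = ,   $4 = 40.00   $5 = ""
# so $2 is the city and $4 is the amount, both without quotes.
formatted=$(printf '"Reno","40.00"\n"Reno","180.00"\n"Lenoir City","987.00"\n' |
  awk -F'"' '{ a[$2] += $4 }
             END { for (k in a) printf "%s%s%s,%s%.2f%s\n", FS, k, FS, FS, a[k], FS }' |
  sort)

echo "$formatted"
```

The trailing sort is optional; it only makes the row order reproducible.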
