How to sum rows of two or more csv files that have the same value in column 1?
I have two csv files which look like this:
csv1.csv:
COL1 COL2
Daniel 120
Max 340
Sabrina 5
csv2.csv:
COL1 COL2
Max 120
Sabrina 40
Daniel 50
Sarah 580
And I basically want to merge them so it looks like this:
COL1 COL2
Sarah 580
Max 460
Daniel 170
Sabrina 45
Is it possible to achieve this in Python?
I only found similar questions dealing with a single csv file, so help would be greatly appreciated.
You can try merge. Here df1 is the DataFrame from csv1 and df2 is the DataFrame from csv2.
import pandas as pd # pip install pandas
# setting up the dataframes from your example
d1 = [['Daniel' , 120],
['Max' , 340],
['Sabrina' , 5]]
df1 = pd.DataFrame(d1, columns=['col1', 'col2'])
d2 = [['Max' , 120],
['Sabrina' , 40],
['Daniel' , 50],
['Sarah' , 580]]
df2 = pd.DataFrame(d2, columns=['col1', 'col2'])
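# here comes the part to calculate:
# an outer merge keeps names that appear in only one file (e.g. Sarah)
df_out = df1.merge(df2, on='col1', how='outer').fillna(0)
# the NaN -> 0 fill leaves floats behind, so cast the sum back to int
df_out['col2'] = (df_out['col2_x'] + df_out['col2_y']).astype(int)
# remove the unnecessary columns
df_out.drop(columns=['col2_x', 'col2_y'], inplace=True)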
print(df_out)
col1 col2
0 Daniel 170
1 Max 460
2 Sabrina 45
3 Sarah 580
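The merge above works for exactly two DataFrames. For "two or more" files, one possible alternative is to stack all files with `pd.concat` and sum per name with `groupby`. The following is a minimal sketch under some assumptions: the files are space-delimited as shown in the question, `sum_by_key` is a hypothetical helper name, and `io.StringIO` objects stand in for the real files (file paths such as `'csv1.csv'` can be passed to `pd.read_csv` in the same way).

```python
import io
import pandas as pd

# Stand-ins for csv1.csv and csv2.csv (assumed to be space-delimited).
csv1 = io.StringIO("COL1 COL2\nDaniel 120\nMax 340\nSabrina 5\n")
csv2 = io.StringIO("COL1 COL2\nMax 120\nSabrina 40\nDaniel 50\nSarah 580\n")

def sum_by_key(sources, key="COL1", value="COL2"):
    """Hypothetical helper: stack all inputs, then sum `value` per distinct `key`."""
    frames = [pd.read_csv(src, sep=" ") for src in sources]
    combined = pd.concat(frames, ignore_index=True)
    totals = combined.groupby(key, as_index=False)[value].sum()
    # Largest total first, as in the desired output of the question.
    return totals.sort_values(value, ascending=False, ignore_index=True)

result = sum_by_key([csv1, csv2])
print(result)
```

Because no missing values are introduced by stacking, the sums stay integers and no `fillna` step is needed. Adding a third file only means adding another entry to the list.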
Add values in a dictionary, something like this:
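import csv

with open('csv1.csv') as f, open('csv2.csv') as f2:
    r = csv.reader(f, delimiter=' ')
    next(r)  # skip the header row
    dict3 = {row[0]: int(row[1]) for row in r}
    r2 = csv.reader(f2, delimiter=' ')
    next(r2)  # skip the header row
    for row in r2:
        # .get() covers names that only appear in csv2.csv (e.g. Sarah)
        dict3[row[0]] = dict3.get(row[0], 0) + int(row[1])
print(dict3)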
Now you just write dict3 to an output file.
with open('output.csv', 'w', newline='') as f3:
    writer = csv.writer(f3, delimiter=' ')
    writer.writerow(['COL1', 'COL2'])  # same header as the input files
    for k, v in dict3.items():
        writer.writerow([k, v])
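The same dictionary idea can be extended to any number of files with `collections.Counter`, which treats missing names as 0 and can return the totals sorted from largest to smallest. This is only a sketch under stated assumptions: `sum_csv_rows` is a hypothetical helper name, the files are space-delimited with one header row, and `io.StringIO` objects stand in for the opened files and for the output file.

```python
import csv
import io
from collections import Counter

def sum_csv_rows(files, delimiter=" "):
    """Hypothetical helper: sum column 2 per column-1 name across open CSV files."""
    totals = Counter()
    for f in files:
        reader = csv.reader(f, delimiter=delimiter)
        next(reader, None)  # skip the header row
        for row in reader:
            if row:  # ignore blank lines
                totals[row[0]] += int(row[1])
    return totals

# Stand-ins for open('csv1.csv') and open('csv2.csv').
csv1 = io.StringIO("COL1 COL2\nDaniel 120\nMax 340\nSabrina 5\n")
csv2 = io.StringIO("COL1 COL2\nMax 120\nSabrina 40\nDaniel 50\nSarah 580\n")
totals = sum_csv_rows([csv1, csv2])

# Stand-in for open('output.csv', 'w', newline='').
out = io.StringIO()
writer = csv.writer(out, delimiter=" ", lineterminator="\n")
writer.writerow(["COL1", "COL2"])
writer.writerows(totals.most_common())  # largest total first
print(out.getvalue())
```

With real files, the `io.StringIO` objects would be replaced by file handles opened in a `with` block; the helper itself does not change.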