How to combine two csv files together
I already looked at "How to combine 2 csv files with common column value" (but my two files have different numbers of lines) and "Merging two CSV files using Python", but neither gave the output I need.
I have two csv files containing the following data.
The first file is data1.csv:
Name,Dept,Company
John Smith,candy,lead
Diana Princ,candy,lead
Perry Plat,wood,lead
Jerry Springer,clothes,lead
Calvin Klein,clothes,lead
Lincoln Tun,warehouse,lead
Oliver Twist,kitchen,lead
The second file is data2.csv:
Name,Dept,Company
John Smith,candy,lead
Tyler Perry,candy,lead
Perry Plat,wood,lead
Mary Poppins,clothes,lead
Calvin Klein,clothes,lead
Lincoln Tun,warehouse,lead
Herman Sherman,kitchen,lead
Jerry Springer,clothes,lead
Ivan Evans,clothes,lead
I want to combine them into a single file called newdata.csv, grouped by the Dept column and with the Company column removed. The final output should look like this:
Name,Dept
John Smith,candy
Diana Princ,candy
Tyler Perry,candy
Perry Plat,wood
Jerry Springer,clothes
Calvin Klein,clothes
Mary Poppins,clothes
Ivan Evans,clothes
Lincoln Tun,warehouse
Oliver Twist,kitchen
Herman Sherman,kitchen
I tried using the merge function, but the output is not what I need.
Here is my code so far:
import pandas as pd
csvPath1 = 'data1.csv'
csvPath2 = 'data2.csv'
csvDest = 'newdata.csv'
df1 = pd.read_csv(csvPath1)
df2 = pd.read_csv(csvPath2)
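df1 = df1.drop(columns='Company')
df2 = df2.drop(columns='Company')
# merge() is an inner join on the shared columns (Name, Dept),
# so only rows present in both files survive
merged = df1.merge(df2)
merged = merged.sort_values('Dept')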
merged.to_csv(csvDest, index=False)
merge is the SQL equivalent of a join.
The function you need is concat:
merged = pd.concat([df1, df2], axis=0, ignore_index=True)
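To make the difference concrete, here is a minimal sketch on a few of the rows above; the two small inline frames are illustrative stand-ins for data1.csv and data2.csv with Company already dropped. With its default `how='inner'`, `merge` keeps only the rows whose Name and Dept appear in both frames, while `concat` stacks every row, including the one present in both.

```python
import pandas as pd

# A few rows from data1.csv and data2.csv, with Company already dropped
df1 = pd.DataFrame({"Name": ["John Smith", "Diana Princ"],
                    "Dept": ["candy", "candy"]})
df2 = pd.DataFrame({"Name": ["John Smith", "Tyler Perry"],
                    "Dept": ["candy", "candy"]})

# merge(): inner join on the shared columns (Name, Dept) -> only rows found in both
joined = df1.merge(df2)
print(joined)

# concat(): stacks the rows of both frames -> every row, duplicates included
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)
print(stacked)
```

Because `concat` keeps the row that occurs in both files (John Smith here), chaining `.drop_duplicates()` onto its result removes the repeat.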
I eventually found the answer to my own question. After some digging, this is what worked for me:
merged = pd.concat([df1, df2])  # originally df1.append(df2); DataFrame.append was removed in pandas 2.0
merged = merged.sort_values('Dept')
So my final code is:
import pandas as pd
csvPath1 = 'data1.csv'
csvPath2 = 'data2.csv'
csvDest = 'newdata.csv'
df1 = pd.read_csv(csvPath1)
df2 = pd.read_csv(csvPath2)
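df1 = df1.drop(columns='Company')
df2 = df2.drop(columns='Company')
# Stack the rows of both files (DataFrame.append was removed in pandas 2.0).
# Rows present in both files appear twice; chain .drop_duplicates() to remove them.
merged = pd.concat([df1, df2])
# Sorts departments alphabetically; kind='mergesort' keeps file order within each Dept
merged = merged.sort_values('Dept', kind='mergesort')
merged.to_csv(csvDest, index=False)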
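The final code keeps rows that appear in both files and sorts departments alphabetically, whereas the desired output lists each person once and keeps departments in the order they first appear (candy, wood, clothes, warehouse, kitchen). Assuming that first-appearance order is the intended grouping rule, a minimal sketch that reproduces the desired output adds `drop_duplicates()` and a stable sort on each department's first-appearance rank. The inline `DATA1`/`DATA2` strings and the `combine` helper are hypothetical stand-ins for reading the real files; the `key` argument of `sort_values` needs pandas 1.1 or later.

```python
import io

import pandas as pd

# Inline copies of data1.csv and data2.csv so the sketch runs without files;
# in practice, pass the file paths to pd.read_csv instead.
DATA1 = """Name,Dept,Company
John Smith,candy,lead
Diana Princ,candy,lead
Perry Plat,wood,lead
Jerry Springer,clothes,lead
Calvin Klein,clothes,lead
Lincoln Tun,warehouse,lead
Oliver Twist,kitchen,lead
"""
DATA2 = """Name,Dept,Company
John Smith,candy,lead
Tyler Perry,candy,lead
Perry Plat,wood,lead
Mary Poppins,clothes,lead
Calvin Klein,clothes,lead
Lincoln Tun,warehouse,lead
Herman Sherman,kitchen,lead
Jerry Springer,clothes,lead
Ivan Evans,clothes,lead
"""


def combine(frames):
    # Stack all rows and drop the Company column
    merged = pd.concat(frames, ignore_index=True).drop(columns="Company")
    # Keep only the first occurrence of rows found in more than one file
    merged = merged.drop_duplicates()
    # Rank each Dept by the order in which it first appears
    order = {dept: i for i, dept in enumerate(merged["Dept"].unique())}
    # Stable sort on that rank: groups rows by Dept, keeps file order within a group
    return merged.sort_values("Dept", key=lambda s: s.map(order), kind="mergesort")


result = combine([pd.read_csv(io.StringIO(DATA1)), pd.read_csv(io.StringIO(DATA2))])
print(result.to_csv(index=False))  # use result.to_csv('newdata.csv', index=False) for a file
```

A stable sort is what keeps John Smith, Diana Princ and Tyler Perry in their original file order inside the candy group; the default quicksort makes no such guarantee.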