Merging two csv files with a common column but uneven lengths
I have two CSV files. CSV file 1 contains the following:
California,C1,G1,K1,Dine-In,B,25
California,C2,G2,K1,Dine-In,A,8
Hawaii,H1,J1,L1,Dine-In,A,22
Hawaii,H2,J2,L2,Dine-In,A,20
CSV file 2 contains:
Hawaii,10
California,20
I want the output to be:
California,C1,G1,K1,Dine-In,B,25,20
California,C2,G2,K1,Dine-In,A,8,20
Hawaii,H1,J1,L1,Dine-In,A,22,10
Hawaii,H2,J2,L2,Dine-In,A,20,10
I have written this code:
import csv
from collections import OrderedDict

with open(r'file 1.csv', 'r') as f:
    r = csv.reader(f)
    dict2 = {row[0]: row[1:] for row in r}
with open(r'file 2.csv','r') as f:
    r = csv.reader(f)
    dict1 = OrderedDict((row[0], row[1:]) for row in r)
result = OrderedDict()
for d in (dict1, dict2):
    for key, value in d.iteritems():
        result.setdefault(key, []).extend(value)
with open('combined data.csv', 'wb') as f:
    w = csv.writer(f)
    for key, value in result.iteritems():
        w.writerow([key] + value)
But the output it gives me is:
California,C1,G1,K1,Dine-In,B,25
California,C2,G2,K1,Dine-In,A,8
Hawaii,H1,J1,L1,Dine-In,A,22
Hawaii,H2,J2,L2,Dine-In,A,20
Hawaii,10
California,20
Any ideas?
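One likely problem with the code above: the dict comprehension that reads file 1 is keyed on the state name, so each later row for the same state overwrites the earlier one, and the two California and two Hawaii rows cannot all survive. A minimal demonstration on in-memory rows (the sample list is copied from file 1; by_state is an illustrative name):

```python
# Rows from file 1, as csv.reader would yield them
rows = [
    ["California", "C1", "G1", "K1", "Dine-In", "B", "25"],
    ["California", "C2", "G2", "K1", "Dine-In", "A", "8"],
    ["Hawaii", "H1", "J1", "L1", "Dine-In", "A", "22"],
    ["Hawaii", "H2", "J2", "L2", "Dine-In", "A", "20"],
]

# Same pattern as in the question: key each row on its first column
by_state = {row[0]: row[1:] for row in rows}

# Duplicate keys overwrite each other, so only the last row per state remains
print(len(by_state))           # 2, not 4
print(by_state["California"])  # ['C2', 'G2', 'K1', 'Dine-In', 'A', '8']
```

Keying the lookup on file 2 instead, where each state appears once, avoids this loss.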
You just need to load file 2.csv as a dictionary and then, while reading file 1.csv, append the matching values to each row, like this:
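import csv

# Python 3: open CSV files in text mode with newline=''
with open(r'file 2.csv', newline='') as f_file2:
    # Map each state to the rest of its row, e.g. {'Hawaii': ['10'], 'California': ['20']}
    dict2 = {row[0]: row[1:] for row in csv.reader(f_file2)}

with open(r'file 1.csv', newline='') as f_file1, open('combined data.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    # Append the matching file 2 values to every row of file 1
    for row in csv.reader(f_file1):
        csv_output.writerow(row + dict2[row[0]])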
This gives you:
California,C1,G1,K1,Dine-In,B,25,20
California,C2,G2,K1,Dine-In,A,8,20
Hawaii,H1,J1,L1,Dine-In,A,22,10
Hawaii,H2,J2,L2,Dine-In,A,20,10
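The same lookup-and-append approach can be sketched as a self-contained function. In this sketch io.StringIO stands in for the two files and append_lookup is a hypothetical helper name. It also uses dict.get with an empty-list default, so a row of file 1 whose key is missing from file 2 is written unchanged instead of raising KeyError, which dict2[row[0]] in the answer above would do.

```python
import csv
import io


def append_lookup(main_lines, lookup_lines):
    """Append the lookup values for each row's first column (hypothetical helper)."""
    # Build {key: [remaining columns]} from the lookup file (file 2)
    lookup = {row[0]: row[1:] for row in csv.reader(lookup_lines)}
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(main_lines):
        # .get(..., []) keeps rows whose key is absent from the lookup file
        writer.writerow(row + lookup.get(row[0], []))
    return out.getvalue()


file1 = io.StringIO(
    "California,C1,G1,K1,Dine-In,B,25\n"
    "California,C2,G2,K1,Dine-In,A,8\n"
    "Hawaii,H1,J1,L1,Dine-In,A,22\n"
    "Hawaii,H2,J2,L2,Dine-In,A,20\n"
)
file2 = io.StringIO("Hawaii,10\nCalifornia,20\n")

result = append_lookup(file1, file2)
print(result, end="")
```

File objects opened with newline='' should work in place of the StringIO objects.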
import pandas as pd

df1 = pd.read_csv('file1.csv', header=None)
df2 = pd.read_csv('file2.csv', header=None)
# Inner join on column 0 (the state name)
res = pd.merge(df1, df2, on=0)
res.to_csv('combined.csv', header=False, index=False)
combined.csv:
California,C1,G1,K1,Dine-In,B,25,20
California,C2,G2,K1,Dine-In,A,8,20
Hawaii,H1,J1,L1,Dine-In,A,22,10
Hawaii,H2,J2,L2,Dine-In,A,20,10
Read the first file into a DataFrame:
df1 = pd.read_csv('file1.csv', header=None)
It looks like this:
            0   1   2   3        4  5   6
0  California  C1  G1  K1  Dine-In  B  25
1  California  C2  G2  K1  Dine-In  A   8
2      Hawaii  H1  J1  L1  Dine-In  A  22
3      Hawaii  H2  J2  L2  Dine-In  A  20
Do the same for the second file:
df2 = pd.read_csv('file2.csv', header=None)
The result is:
            0   1
0      Hawaii  10
1  California  20
Merge on column 0:
res = pd.merge(df1, df2, on=0)
Now res looks like this:
            0  1_x   2   3        4  5   6  1_y
0  California   C1  G1  K1  Dine-In  B  25   20
1  California   C2  G2  K1  Dine-In  A   8   20
2      Hawaii   H1  J1  L1  Dine-In  A  22   10
3      Hawaii   H2  J2  L2  Dine-In  A  20   10
Finally, write the result to a CSV file without the header and the index:
res.to_csv('combined.csv', header=False, index=False)
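Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.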