Merging one-by-many CSV files in Python
I have the output of a series of stochastic simulations in the form of a .csv file, which looks like this:
Run,ID,Var
1,1,7
1,2,9
1,3,4
2,1,3
2,2,4
2,3,8
and so on.
In addition, I have another data file, also a .csv, in the following format:
ID, Var2, Var3
1,0.89,0.10
2,0.45,0.98
3,0.27,0.05
4,0.98,0.24
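Note: the data file contains some values that do not appear in the simulation file. I want these to be ignored.
What I want to do is write a script that takes each ID value from the first .csv file, looks up the matching Var2 and Var3, and puts them together, so that I end up with something like this: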
Run, ID, Var, Var2, Var3
1,1,7,0.89,0.10
1,2,9,0.45,0.98
1,3,4,0.27,0.05
2,1,3,0.89,0.10
2,2,4,0.45,0.98
2,3,8,0.27,0.05
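Any suggestions on how to approach this? I admit this is at the limit of my understanding of data handling in Python. I have a fair idea of how to do this in SAS, but I would prefer to keep it a single-language task so that everything can be handled in one script.
output.csv: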
Run, ID, Var
1, 1, 7
1, 2, 9
1, 3, 4
2, 1, 3
2, 2, 4
2, 3, 8
data.csv:
ID, Var2, Var3
1, 0.89, 0.10
2, 0.45, 0.98
3, 0.27, 0.05
8, 0.4, 0.5
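Note that even though data.csv has entries that are not in output.csv, this does not affect the final result, because while parsing output.csv we only look up the IDs that appear in output.csv. The reverse is not true: at a minimum, data.csv must contain every ID in output.csv, although that case can easily be handled if needed.
Code: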
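import csv
from pprint import pprint

# Index data.csv by ID so each simulation row can be matched in one lookup.
with open('data.csv', newline='') as f:
    data = {row['ID']: row for row in csv.DictReader(f, skipinitialspace=True)}

# Merge the matching Var2/Var3 values into every row of output.csv.
values = []
with open('output.csv', newline='') as f:
    for row in csv.DictReader(f, skipinitialspace=True):
        row.update(data[row['ID']])
        values.append(row)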
>>> pprint(values)
[{'ID': '1', 'Run': '1', 'Var': '7', 'Var2': '0.89', 'Var3': '0.10'},
 {'ID': '2', 'Run': '1', 'Var': '9', 'Var2': '0.45', 'Var3': '0.98'},
 {'ID': '3', 'Run': '1', 'Var': '4', 'Var2': '0.27', 'Var3': '0.05'},
 {'ID': '1', 'Run': '2', 'Var': '3', 'Var2': '0.89', 'Var3': '0.10'},
 {'ID': '2', 'Run': '2', 'Var': '4', 'Var2': '0.45', 'Var3': '0.98'},
 {'ID': '3', 'Run': '2', 'Var': '8', 'Var2': '0.27', 'Var3': '0.05'}]
>>>
Now to save the result back to a csv file.
fieldnames = ['Run', 'ID', 'Var', 'Var2', 'Var3']
with open('combined.csv', 'w', newline='') as f:
    csvwriter = csv.DictWriter(f, fieldnames=fieldnames)
    csvwriter.writeheader()  # available since Python 2.7
    csvwriter.writerows(values)
$ cat combined.csv
Run,ID,Var,Var2,Var3
1,1,7,0.89,0.10
1,2,9,0.45,0.98
1,3,4,0.27,0.05
2,1,3,0.89,0.10
2,2,4,0.45,0.98
2,3,8,0.27,0.05
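The answer notes that data.csv must contain every ID found in output.csv, and that the opposite case can be handled if needed. One way to do that might look like the sketch below, assuming that a missing ID should simply get empty Var2/Var3 values. The helper name `merge_rows` and its `fill` parameter are hypothetical, and in-memory strings stand in for the two files.

```python
import csv
import io


def merge_rows(sim_file, data_file, fill=''):
    """Left-join simulation rows with per-ID data; missing IDs get `fill`."""
    data_reader = csv.DictReader(data_file, skipinitialspace=True)
    # Columns contributed by the data file, excluding the join key.
    extra_fields = [fn for fn in data_reader.fieldnames if fn != 'ID']
    data = {row['ID']: row for row in data_reader}
    blank = dict.fromkeys(extra_fields, fill)

    merged = []
    for row in csv.DictReader(sim_file, skipinitialspace=True):
        combined = dict(row)
        # dict.get falls back to the blank values instead of raising KeyError.
        combined.update(data.get(row['ID'], blank))
        merged.append(combined)
    return merged


# Assumed sample input: ID 9 is absent from the data file, ID 8 is unused.
sim = io.StringIO("Run, ID, Var\n1, 1, 7\n1, 9, 4\n")
data = io.StringIO("ID, Var2, Var3\n1, 0.89, 0.10\n8, 0.4, 0.5\n")
rows = merge_rows(sim, data)
```

Whether an empty string is the right placeholder depends on what reads the combined file later; skipping such rows or raising a clearer error are equally reasonable choices.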
I hope this helps.
A solution that does not use the csv module:
with open('data.txt') as f1, open('data1.txt') as f2, open('data3.txt', 'w') as f3:
    header1 = f1.readline().strip().split(',')      # header from file 1, i.e. Run,ID,Var
    header2 = f2.readline().strip().split(',')[1:]  # header from file 2, i.e. Var2, Var3
    # use a dict to store the data from file 2, keyed by ID
    dic = {x.strip().split(',')[0]: x.strip().split(',')[1:] for x in f2 if x.strip()}
    f3.write(','.join(header1 + header2) + '\n')    # write the new header (header1 + header2) to file 3
    for x in f1:
        # fetch the values from dic for the ID on the current line of data.txt
        f3.write(x.strip() + ',' + ','.join(dic[x.split(',')[1]]) + '\n')
Output: data3.txt contains
Run,ID,Var, Var2, Var3
1,1,7,0.89,0.10
1,2,9,0.45,0.98
1,3,4,0.27,0.05
2,1,3,0.89,0.10
2,2,4,0.45,0.98
2,3,8,0.27,0.05
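The lookup above matches IDs as raw strings, so it relies on the data lines having no spaces after the commas. With the "1, 1, 7" layout shown in the output.csv listing earlier in the thread, the key would be ' 1' and the lookup would raise a KeyError. A sketch of the same approach that strips every field first, assuming that tolerance is wanted (the helpers `split_fields` and `merge_plain` are hypothetical, and in-memory strings stand in for the files):

```python
import io


def split_fields(line):
    """Split a comma-separated line and trim whitespace around each field."""
    return [field.strip() for field in line.strip().split(',')]


def merge_plain(f1, f2, out):
    header1 = split_fields(f1.readline())      # e.g. Run, ID, Var
    header2 = split_fields(f2.readline())[1:]  # e.g. Var2, Var3 (ID dropped)

    # Map each ID in the second file to its remaining fields.
    lookup = {}
    for line in f2:
        if line.strip():
            fields = split_fields(line)
            lookup[fields[0]] = fields[1:]

    out.write(','.join(header1 + header2) + '\n')
    for line in f1:
        if line.strip():
            fields = split_fields(line)
            # fields[1] is the ID column of the first file.
            out.write(','.join(fields + lookup[fields[1]]) + '\n')


# Assumed sample input with spaces after the commas.
f1 = io.StringIO("Run, ID, Var\n1, 1, 7\n2, 3, 8\n")
f2 = io.StringIO("ID, Var2, Var3\n1, 0.89, 0.10\n3, 0.27, 0.05\n4, 0.98, 0.24\n")
out = io.StringIO()
merge_plain(f1, f2, out)
result = out.getvalue()
```

Note that plain splitting on commas still breaks on quoted fields that contain commas; the csv module handles that case.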
Simple and easy:
with open('one.csv', 'r') as f:
    one = f.read()
with open('two.csv', 'r') as f:
    two = f.read()

# drop the header line; splitlines() also copes with a missing trailing newline
one = one.splitlines()[1:]
two = two.splitlines()[1:]

output = 'Run, ID, Var, Var2, Var3\n'
for o in one:
    for t in two:
        row = t.split(',')
        if o.split(',')[1] == row[0]:
            output += '%s,%s,%s\n' % (o, row[1], row[2])

# or save it to a file
print(output)