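python parallel compare 2 csv files
I am trying to compare 2 CSV files, each containing 100000 rows and 10 columns. The code below works, but it uses only one CPU thread even though I have 8 cores. I want it to use all CPU threads. After searching, I found the idea of running in parallel, but when I tried to apply it to the for loop in this Python code, it did not work. How can I parallelize this code? Thanks in advance for your help!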
import csv

# read csv files
f1 = open('host.csv', 'r')
f2 = open('master.csv', 'r')
f3 = open('results.csv', 'w')
c1 = csv.reader(f1)
c2 = csv.reader(f2)
next(c2, None)  # skip the master header row
c3 = csv.writer(f3)

# for loop: compare each row in the host csv file
master_list = list(c2)
for row in c1:
    found = False
    results_row = row
    colA = str(row[0])  # protocol
    colB = str(row[11])
    colC = str(row[12])
    colD = str(row[13])
    colE = str(row[14])
    # loop over each row of the master csv file
    for master_row in master_list:
        colBf2 = str(master_row[4])
        colCf2 = str(master_row[5])
        colDf2 = str(master_row[6])
        colEf2 = str(master_row[7])
        colFf2 = str(master_row[3])
        # check condition
        if colA == 'icmp':
            # sub condition: icmp rows match on B and D only
            if colB == colBf2 and colD == colDf2:
                results_row.append(colFf2)
                found = True
                break
        else:
            # all other protocols must also match on E
            if colB == colBf2 and colD == colDf2 and colE == colEf2:
                results_row.append(colFf2)
                found = True
                break
    if not found:
        results_row.append('Not Match')
    c3.writerow(results_row)

f1.close()
f2.close()
f3.close()
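The expensive task is the inner loop, which rescans the master table for every host row. Because of CPython's global interpreter lock (search for "python GIL"), only one thread executes Python bytecode at a time, so multiple threads do not speed up CPU-bound operations. You can spawn subprocesses, but then you have to weigh the cost of getting the data to the worker processes against the speed gain.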
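For illustration only, the subprocess route could look roughly like the sketch below. It assumes the host and master rows are already loaded into lists; `match_chunk`, `run_parallel`, `workers` and `chunk_size` are hypothetical names that do not appear in the question, and the column positions are copied from the question's code.

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial

# Column positions copied from the question's code (host vs. master file).
H_PROTO, H_B, H_D, H_E = 0, 11, 13, 14
M_F, M_B, M_D, M_E = 3, 4, 6, 7


def match_chunk(master_rows, host_chunk):
    """Run the question's nested scan for one chunk of host rows."""
    out = []
    for row in host_chunk:
        label = 'Not Match'
        for m in master_rows:
            same = row[H_B] == m[M_B] and row[H_D] == m[M_D]
            if row[H_PROTO] != 'icmp':
                # non-icmp rows must also match on column E
                same = same and row[H_E] == m[M_E]
            if same:
                label = m[M_F]
                break
        out.append(row + [label])
    return out


def run_parallel(host_rows, master_rows, workers=4, chunk_size=10000):
    """Hypothetical helper: one task per chunk of host rows."""
    chunks = [host_rows[i:i + chunk_size]
              for i in range(0, len(host_rows), chunk_size)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # master_rows is pickled and shipped with every task; this is
        # the data-transfer cost that has to be weighed against the gain.
        parts = pool.map(partial(match_chunk, master_rows), chunks)
        return [row for part in parts for row in part]
```

On platforms that start workers with "spawn" (Windows, recent macOS), the call to `run_parallel` has to sit under an `if __name__ == "__main__":` guard. Each worker still performs the full scan for its chunk, so the total amount of work is unchanged; it is only spread over more cores.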
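Or, optimize your code. Rather than running in parallel, index the master file. That trades an expensive scan of 100000 records for a fast dictionary lookup.
I added with clauses to your code to save a few lines, and skipped colA etc. (using named indexes instead) to keep the code small.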
import csv

# columns of interest; the host and master files use different positions
# (taken from the question's code)
H_A, H_B, H_D, H_E = 0, 11, 13, 14
M_F, M_B, M_D, M_E = 3, 4, 6, 7

# read and index column F in master by (B,D) and (B,D,E), discarding
# duplicates for those keys
col_index = {}
with open('master.csv', newline='') as master:
    reader = csv.reader(master)
    next(reader, None)  # skip the header row
    for row in reader:
        for key in ((row[M_B], row[M_D]), (row[M_B], row[M_D], row[M_E])):
            # setdefault keeps the first value seen, like the original scan
            col_index.setdefault(key, row[M_F])

# read csv files
with open('host.csv', newline='') as f1, open('results.csv', 'w', newline='') as f3:
    c1 = csv.reader(f1)
    c3 = csv.writer(f3)
    for row in c1:
        if row[H_A] == "icmp":
            indexer = (row[H_B], row[H_D])
        else:
            indexer = (row[H_B], row[H_D], row[H_E])
        row.append(col_index.get(indexer, 'Not Match'))
        c3.writerow(row)
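To get a feel for the scan-versus-lookup trade described above, here is a minimal timing sketch on synthetic data. The table contents, its size, and the names `scan` and `lookup` are made up for illustration; actual timings depend on the machine, so none are quoted.

```python
import timeit

# Synthetic stand-in for the master table: two key fields and a value.
master = [(f"10.0.{i // 256}.{i % 256}", f"10.1.{i // 256}.{i % 256}", f"policy{i}")
          for i in range(10000)]


def scan(b, d):
    """Linear scan, as in the question's inner loop."""
    for mb, md, mf in master:
        if mb == b and md == d:
            return mf
    return 'Not Match'


# Build the index once; setdefault keeps the first value seen for a key.
index = {}
for mb, md, mf in master:
    index.setdefault((mb, md), mf)


def lookup(b, d):
    """Dictionary lookup, as in the answer."""
    return index.get((b, d), 'Not Match')


target = master[-1][:2]  # worst case for the scan: the last row
print('scan  :', timeit.timeit(lambda: scan(*target), number=100))
print('lookup:', timeit.timeit(lambda: lookup(*target), number=100))
```

Both functions return the same value for any key. The scan touches up to every master row per call, while the lookup costs roughly the same regardless of table size.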