Handling very large files with openpyxl in Python
I have a spreadsheet with 11,000 rows and 10 columns. I am trying to copy selected columns from each row, add additional information to each line, and write the result to a text file.
Unfortunately, I am running into serious performance problems: processing starts to slow down after about 100 rows and pegs my processor. Is there a way to speed this up or a better approach?
I am already using read_only=True and data_only=True.
The most memory-intensive part is iterating through each cell:
for i in range(probeStart, lastRow+1):
    dataRow = ""
    for j in range(1, col+2):
        dataRow = dataRow + str(sheet.cell(row=i, column=j).value) + "\t"
    sigP = db.get(str(sheet.cell(row=i, column=1).value), "notfound")  # my additional information
    a = str(sheet.cell(row=i, column=max_column-1).value) + "\t"
    b = str(sheet.cell(row=i, column=max_column).value) + "\t"
    string1 = dataRow + a + b + sigP + "\n"
    w.write(string1)
Question: Is there a way to speed this up or use a better methodology?
Try the following to see if it improves performance.
Note: I don't know the values of col and max_column!
My example uses 4 columns and skips column C.
Data:
['A1', 'B1', 'C1', 'D1'],
['A2', 'B2', 'C2', 'D2']
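To make the column selection concrete before involving openpyxl, here is a minimal sketch on plain Python lists. The helper name select_columns and the empty db dict are hypothetical, introduced only for illustration; the slices are the same ones the code that follows applies to the cells of each row.

```python
# Hypothetical helper: keep columns A-B and D onward, skipping column C.
def select_columns(row):
    # row[:2] -> columns A and B; row[3:] -> column D to the end
    return row[:2] + row[3:]


# Sample data from the example above
rows = [
    ['A1', 'B1', 'C1', 'D1'],
    ['A2', 'B2', 'C2', 'D2'],
]

# Assumed stand-in for the asker's lookup dict; it is empty, so every key misses
db = {}

lines = []
for row in rows:
    data = select_columns(row)
    # Append the extra information looked up by the column A value
    data.append(db.get(row[0], "notfound"))
    # str() guards against non-string values such as numbers or None
    lines.append('\t'.join(str(value) for value in data))

for line in lines:
    print(line)
```

The code that follows does the same thing with the cell objects yielded by ws.iter_rows, reading cell.value from each one.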
from openpyxl.utils import range_boundaries

# ws is the worksheet and db the lookup dict from the question
min_col, min_row, max_col, max_row = range_boundaries('A1:D2')
for row_cells in ws.iter_rows(min_col=min_col, min_row=min_row,
                              max_col=max_col, max_row=max_row):
    # Slice column values up to B
    data = [cell.value for cell in row_cells[:2]]
    # Extend the list with the column values from D to the end
    data.extend([cell.value for cell in row_cells[3:]])
    # Append db.get(column A value)
    data.append(db.get(row_cells[0].value, "notfound"))
    # Join all values with \t; str() guards against non-string cell values
    print('\t'.join(str(value) for value in data))
    # Write to the output file
    # w.write('\t'.join(str(value) for value in data) + '\n')
Output:
A1 B1 D1 notfound
A2 B2 D2 notfound
Tested with Python 3.4.2 and openpyxl 2.4.1.
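Putting the pieces together, a streaming export to a text file might look like the sketch below. It assumes the workbook is opened with read_only=True and data_only=True as in the question, that the data is on the active sheet, and that db is a plain dict keyed by the string form of column A. The names format_row and export_sheet, their parameters, and the sample db contents are hypothetical. In read-only mode, iterating with iter_rows reads each row once in order, whereas repeated sheet.cell(row=..., column=...) lookups are a likely cause of the slowdown described in the question.

```python
def format_row(values, db, skip_index=2, missing="notfound"):
    """Build one tab-separated output line from a row of cell values."""
    # Keep everything except the column at skip_index (column C by default)
    kept = list(values[:skip_index]) + list(values[skip_index + 1:])
    # Look up the extra information by the first column, as in the question
    kept.append(db.get(str(values[0]), missing))
    # One join per row instead of repeated string concatenation
    return '\t'.join(str(value) for value in kept) + '\n'


def export_sheet(xlsx_path, txt_path, db):
    """Stream rows from the active sheet of xlsx_path into txt_path."""
    # Imported here so format_row can be tried without openpyxl installed
    from openpyxl import load_workbook

    wb = load_workbook(xlsx_path, read_only=True, data_only=True)
    ws = wb.active
    with open(txt_path, 'w') as out:
        # iter_rows yields each row once, in order
        for row_cells in ws.iter_rows():
            out.write(format_row([cell.value for cell in row_cells], db))
    # Assumes an openpyxl version that provides Workbook.close()
    wb.close()


# Trying the formatting step alone, with an assumed sample lookup dict
sample_db = {'A1': 'sig1'}
print(format_row(['A1', 'B1', 'C1', 'D1'], sample_db), end='')
print(format_row(['A2', 'B2', 'C2', 'D2'], sample_db), end='')
```

Keeping the formatting in its own function makes it easy to check the output on a few hand-written rows before running the export over all 11,000 rows. Note that empty cells come through as None and are written as the text "None", which matches the str() calls in the question.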