[英]MemoryError while converting txt to xlsx
Related questions: 1. Error in converting txt to xlsx using python相关问题:1. 使用python将txt转换为xlsx时出错
My code is我的代码是
import csv
import openpyxl
import sys
def convert(input_path, output_path):
"""
Read a csv file (with no quoting), and save its contents in an excel file.
"""
wb = openpyxl.Workbook()
ws = wb.worksheets[0]
with open(input_path) as f:
reader = csv.reader(f, delimiter='\t', quoting=csv.QUOTE_NONE)
for row_index, row in enumerate(reader, 1):
for col_index, value in enumerate(row, 1):
ws.cell(row=row_index, column=col_index).value = value
print 'hello world'
wb.save(output_path)
print 'hello world2'
def main():
try:
input_path, output_path = sys.argv[1:]
except ValueError:
print 'Usage: python %s input_path output_path' % (sys.argv[0],)
else:
convert(input_path, output_path)
if __name__ == '__main__':
main()
This code works, except for some input files.此代码有效,但某些输入文件除外。 I couldn't find what the difference is between the input txt that causes this problem and input txt that doesn't.
我找不到导致此问题的输入 txt 和没有输入的 txt 之间的区别。
My first guess was encoding.我的第一个猜测是编码。 I tried changing the encoding of the input file to UTF-8 and UTF-8 with BOM.
我尝试使用 BOM 将输入文件的编码更改为 UTF-8 和 UTF-8。 But this failed.
但这失败了。
My second guess was it used literally too much memory.我的第二个猜测是它使用了太多的内存。 But my computer has SSD with 32 GB RAM.
但我的电脑有 32 GB RAM 的 SSD。
So perhaps this code is not fully utilizing the capacity of this RAM?那么也许这段代码没有充分利用这个 RAM 的容量?
How can I fix this?我怎样才能解决这个问题?
Edit: I added that line print 'hello world' and print 'hello world2' to check if all the parts before 'hello world' are run correctly.
编辑:我添加了那行 print 'hello world' 并打印 'hello world2' 以检查 'hello world' 之前的所有部分是否都正确运行。
I checked the code prints 'hello world', but not 'hello world2'我检查了代码打印“你好世界”,但不是“你好世界2”
So, it really seems likely that wb.save(output_path)所以,似乎 wb.save(output_path)
is causing the problem.导致问题。
openpyxl has optimised modes for reading and writing large files. openpyxl 优化了读取和写入大文件的模式。
wb = Workbook(write_only=True)
will enable this. wb = Workbook(write_only=True)
将启用此功能。
I'd also recommend that you install lxml for speed.我还建议您安装 lxml 以提高速度。 This is all covered in the documentation.
这一切都包含在文档中。
Below are three alternatives:下面是三个备选方案:
RANGE FOR LOOP循环范围
Possibly, the two enumerate()
calls may have a memory footprint as indexing must occur in a nested for loop.可能,两个
enumerate()
调用可能有内存占用,因为索引必须发生在嵌套的 for 循环中。 Consider passing csv.reader content into a list (being subscriptable) and use range()
.考虑将 csv.reader 内容传递到列表(可下标)并使用
range()
。 Though admittedly even this may not be efficient as starting in Python 3 each range()
call (compared to deprecated xrange
) generates its own list in memory as well.诚然,即使这可能效率不高,因为从 Python 3 开始,每个
range()
调用(与已弃用的xrange
相比)也在内存中生成自己的列表。
with open(input_path) as f:
reader = csv.reader(f)
row = []
for data in reader:
row.append(data)
for i in range(len(row)):
for j in range(len(row[0])):
ws.cell(row=i, column=j).value = row[i][j]
OPTIMIZED WRITER优化写入器
OpenPyXL even warns that scrolling through cells even without assigning values will retain them in memory. OpenPyXL甚至警告说,即使在没有赋值的情况下滚动单元格也会将它们保留在内存中。 As a solution, you can use the Optimized Writer using above
row
list produced from csv.reader.作为解决方案,您可以使用Optimized Writer使用上面从 csv.reader 生成的
row
列表。 This route appends entire rows in a write-only workbook instance:此路由将整行附加到只写工作簿实例中:
from openpyxl import Workbook
wb = Workbook(write_only=True)
ws = wb.create_sheet()
i = 0
for irow in row:
ws.append(['%s' % j for j in row[j]])
i += 1
wb.save('C:\Path\To\Outputfile.xlsx')
WIN32COM LIBRARY WIN32COM库
Finally, consider using the built-in win32com library where you open the csv in Excel and save as an xlsx or xls workbook .最后,考虑使用内置的 win32com 库,您可以在其中在 Excel 中打开 csv 并另存为xlsx 或 xls 工作簿。 Do note this package is only for Python Windows installations.
请注意,此软件包仅适用于 Python Windows 安装。
import win32com.client as win32
excel = win32.Dispatch('Excel.Application')
# OPEN CSV DIRECTLY INSIDE EXCEL
wb = excel.Workbooks.Open(input_path)
excel.Visible = False
outxl=r'C:\Path\To\Outputfile.xlsx'
# SAVE EXCEL AS xlOpenXMLWorkbook TYPE (51)
wb.SaveAs(outxl, FileFormat=51)
wb.Close(False)
excel.Quit()
Here are fews points you can consider:您可以考虑以下几点:
/tmp
folder, default folder where tmp files for created;/tmp
文件夹,创建tmp文件的默认文件夹;tmp
file path while creating workbook;tmp
文件路径; Below is my code:下面是我的代码:
#!/usr/bin/python
import os
import csv
import io
import sys
import traceback
from xlsxwriter.workbook import Workbook
fileNames=sys.argv[1]
try:
f=open(fileNames, mode='r')
workbook = Workbook(fileNames[:-4] + '.xlsx',{'in_memory': True})
worksheet = workbook.add_worksheet()
workbook.use_zip64()
rowCnt=0
#Create the bold style for the header row
for line in f:
rowCnt = rowCnt + 1
row = line.split("\001")
for j in range(len(row)):
worksheet.write(rowCnt, j, row[j].strip())
f.close()
workbook.close()
print ('success')
except ValueError:
print ('failure')
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.