Python: Reading large Excel worksheets with openpyxl

Question

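I have an Excel file with about 400 worksheets, 375 of which I need to save as CSV files. I've tried a VBA solution, but Excel has trouble just opening this workbook.

I've written a Python script to do this. However, it quickly consumes all available memory and pretty much stops working after exporting 25 sheets. Does anyone have a suggestion for how I can improve this code?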

import openpyxl
import csv
import time

print(time.ctime())

importedfile = openpyxl.load_workbook(filename = "C:/Users/User/Desktop/Giant Workbook.xlsm", data_only = True, keep_vba = False)

tabnames = importedfile.get_sheet_names()

substring = "Keyword"

for num in tabnames:
    if num.find(substring) > -1:
        sheet = importedfile.get_sheet_by_name(num)
        name = "C:/Users/User/Desktop/Test/" + num + ".csv"
        with open(name, 'w', newline='') as file:
            savefile = csv.writer(file)
            for i in sheet.rows:
                savefile.writerow([cell.value for cell in i])
        file.close()

print(time.ctime())

Any help would be greatly appreciated.

Thanks

Edit: I'm using Windows 7 and Python 3.4.3. I'm also open to solutions in R, VBA, or SPSS.

Answer 1

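Try passing the read_only=True argument to the load_workbook() function. The worksheets you get back are then of type IterableWorksheet, which means you can only iterate over them: you cannot access cell values directly by column/row number. According to the documentation, this provides near constant memory consumption.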

Also, you don't need to close the file yourself; the with statement handles that for you.
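To see the point about with in isolation, here is a tiny sketch that uses only the standard library. The path `out_path` and the variable names are made up for the demo (it writes into a temporary directory, not the folder from the question).

```python
import csv
import os
import tempfile

# Hypothetical output path in a temporary directory, used only for this demo.
out_path = os.path.join(tempfile.mkdtemp(), "demo.csv")

with open(out_path, "w", newline="") as handle:
    csv.writer(handle).writerow(["a", 1])
    inside = handle.closed  # the file is still open inside the block

after = handle.closed  # the file was closed automatically on leaving the block
print(inside, after)
```

So the extra file.close() after the with block in the question is harmless but redundant.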

Example:

import openpyxl
import csv
import time

print(time.ctime())

# read_only=True makes openpyxl stream rows instead of loading whole sheets into memory
importedfile = openpyxl.load_workbook(filename = "C:/Users/User/Desktop/Giant Workbook.xlsm", read_only = True, keep_vba = False)

tabnames = importedfile.get_sheet_names()

substring = "Keyword"

for num in tabnames:
    if num.find(substring) > -1:
        sheet = importedfile.get_sheet_by_name(num)
        name = "C:/Users/User/Desktop/Test/" + num + ".csv"
        # the with block closes the file, so no explicit file.close() is needed
        with open(name, 'w', newline='') as file:
            savefile = csv.writer(file)
            for i in sheet.rows:
                savefile.writerow([cell.value for cell in i])

print(time.ctime())
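If you are on a newer openpyxl, the same idea might look roughly like the sketch below. It assumes openpyxl 2.6 or later, where `sheetnames`, `workbook[name]` and `iter_rows(values_only=True)` are available and the older `get_sheet_names()` / `get_sheet_by_name()` calls are deprecated. The function names `export_matching_sheets` and `convert` are made up for illustration, and `data_only=True` is carried over from the question's code rather than from the answer above.

```python
import csv
from pathlib import Path


def export_matching_sheets(workbook, keyword, out_dir):
    """Write every sheet whose name contains `keyword` to <out_dir>/<sheet>.csv.

    `workbook` is expected to behave like an openpyxl Workbook: it needs
    `sheetnames`, item access by sheet name, and worksheets with `iter_rows`.
    Returns the list of CSV paths that were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name in workbook.sheetnames:
        if keyword not in name:
            continue
        target = out_dir / (name + ".csv")
        with open(target, "w", newline="") as handle:
            writer = csv.writer(handle)
            # values_only=True yields plain tuples of values, not cell objects.
            for row in workbook[name].iter_rows(values_only=True):
                writer.writerow(row)
        written.append(target)
    return written


def convert(xlsx_path, keyword, out_dir):
    """Open `xlsx_path` in read-only mode and export the matching sheets."""
    import openpyxl  # imported here so the helper above has no hard dependency

    workbook = openpyxl.load_workbook(xlsx_path, read_only=True, data_only=True)
    try:
        return export_matching_sheets(workbook, keyword, out_dir)
    finally:
        workbook.close()  # a read-only workbook keeps the file open until closed
```

With the paths from the question, the call would be something like `convert("C:/Users/User/Desktop/Giant Workbook.xlsm", "Keyword", "C:/Users/User/Desktop/Test")`. Rows are written one at a time as they are read, so no sheet has to be held in memory as a whole.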

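From the documentation:

Sometimes, you will need to open or write extremely large XLSX files, and the common routines in openpyxl won't be able to handle that load. Fortunately, there are two modes that enable you to read and write unlimited amounts of data with (near) constant memory consumption.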

Answer 1 | 6 | Accepted | 2015-07-02 16:29:44
