
Read in xlsx in Pandas with openpyxl

From what I've read online, the Pandas read_excel function has dropped support for xlsx files, but it's supposed to be easy to read them just by using the openpyxl engine.

When I run the following, I get an error that says "unexpected keyword argument synchVertical". Here's my code:

pd.read_excel( path.join(data_dir,"opto_data.xlsx"), engine = 'openpyxl' )

And here are the dependencies I have installed:

pandas-1.2.4
openpyxl-3.0.7

I just realized it might be the new version of VS Code that broke it.

Try this:

X = pd.ExcelFile("filename.xlsx")
df = X.parse("sheet name here")

It works with both the xlrd and openpyxl engines. You can also install xlrd (note that xlrd 2.0 and later only reads .xls files, so .xlsx files still need openpyxl):

pip install xlrd

An .xlsx workbook is a zipped archive of xml files. According to the API reference, the xml file for a worksheet can contain a 'worksheet properties' element ( <sheetPr> ) with an attribute syncVertical (without an h). However, opening a workbook with syncVertical in Excel causes an error, while synchVertical works fine. Other software seems to have followed Excel in creating workbooks with the 'wrong' spelling, while openpyxl only accepts syncVertical, as per the specs.
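To check whether a given workbook is affected, something like the following could be used. It is only a sketch: the helper name find_synch_vertical is made up for illustration, and it assumes the worksheet parts live under xl/worksheets/ inside the archive, which is the usual layout.

```python
from zipfile import ZipFile


def find_synch_vertical(xlsx_path):
    """Return the names of worksheet xml files that mention synchVertical.

    xlsx_path may be a file name or a binary file-like object.
    """
    hits = []
    with ZipFile(xlsx_path) as zf:
        for name in zf.namelist():
            # Only look at the worksheet parts, not the rest of the archive.
            if name.startswith("xl/worksheets/") and name.endswith(".xml"):
                if b"synchVertical" in zf.read(name):
                    hits.append(name)
    return hits
```

An empty list would suggest the error comes from somewhere else.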

Hopefully openpyxl will follow other software in accepting the misspelt attribute. In the meantime, a fix is to remove the attribute. This can be done manually by opening the workbook in Excel and saving it again, which seems to remove the attribute. Alternatively, we can adapt another answer to edit the zip archive directly. Unfortunately this is a bit slow, as it has to read and write the whole archive just to remove this one attribute. As a quick, hacky solution we use a simple find/replace to take out the unwanted property. A better but slower solution would be to parse the xml files properly.

import tempfile
from zipfile import ZipFile
import shutil
import os
from fnmatch import fnmatch

def change_in_zip(file_name, name_filter, change):
    tempdir = tempfile.mkdtemp()
    try:
        tempname = os.path.join(tempdir, 'new.zip')
        with ZipFile(file_name, 'r') as r, ZipFile(tempname, 'w') as w:
            for item in r.infolist():
                data = r.read(item.filename)
                if fnmatch(item.filename, name_filter):
                    data = change(data)
                w.writestr(item, data)
        shutil.move(tempname, file_name)
    finally:
        shutil.rmtree(tempdir)

change_in_zip("opto_data.xlsx", 
              name_filter='xl/worksheets/*.xml', # the problematic property is found in the worksheet xml files
              change=lambda d: d.replace(b' synchVertical="1"', b' '))
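The lambda above only matches the exact text synchVertical="1". If the attribute might appear with a different value or quoting, a slightly more tolerant replacement could be sketched with a regular expression. This is still a find/replace rather than proper xml parsing, and the name strip_synch_vertical is just an illustration:

```python
import re

# Matches the attribute with any quoted value, e.g. synchVertical="1" or
# synchVertical='0', including the whitespace in front of it.
_SYNCH_VERTICAL = re.compile(
    rb"""\s+synchVertical\s*=\s*(?:"[^"]*"|'[^']*')"""
)


def strip_synch_vertical(data):
    """Remove every synchVertical attribute from raw xml bytes."""
    return _SYNCH_VERTICAL.sub(b"", data)
```

It can be passed to change_in_zip in place of the lambda, i.e. change=strip_synch_vertical.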
