What is the right way to export a Pandas dataframe to an Excel multi-sheet file?

I need to output two cleaned and recalculated dataframes to an Excel file as separate sheets. This code works, but opening the resulting file in Excel produces a "file corrupted" message. Excel repairs the file and it opens fine afterwards, but this is annoying.

The code runs in an Azure Jupyter Notebook with Python 3.6. I download the Excel file and open it in Excel 365 on Windows 10.

# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('PR_weatherGDDid.xlsx', engine='xlsxwriter') 

# Write each dataframe to a different worksheet.
df.to_excel(writer, sheet_name='Daily', index=False)     
doystats.to_excel(writer, sheet_name='stats')    

# Close the Pandas Excel writer and output the Excel file.
writer.save()

So: the Excel file gets created, but Excel has a problem opening it.

Here is the correct way: use pd.ExcelWriter as a context manager, so the file is saved and closed when the block exits.

>>> with pd.ExcelWriter('PR_weatherGDDid.xlsx') as writer: 
...     df.to_excel(writer, sheet_name='Daily')
...     doystats.to_excel(writer, sheet_name='stats')
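A self-contained sketch of the same pattern, for anyone who wants to try it outside the original notebook. The two small dataframes and the temporary path are hypothetical stand-ins, not the asker's data, and the sketch assumes an Excel engine such as XlsxWriter or openpyxl is installed. Because an .xlsx file is a zip archive, the standard zipfile module is used afterwards to confirm that both sheet names were written.

```python
import os
import tempfile
import zipfile

import pandas as pd

# Hypothetical stand-ins for the asker's two dataframes.
df = pd.DataFrame({"Serial": ["AMN987", "BBB987"], "Status": ["Ok", "Error"]})
doystats = pd.DataFrame({"Date": ["02/08/19", "02/08/19"], "Day": [4, 8]})

# Write into a throwaway directory so nothing in the working folder is touched.
path = os.path.join(tempfile.mkdtemp(), "PR_weatherGDDid.xlsx")

# Leaving the with-block saves and closes the workbook.
with pd.ExcelWriter(path) as writer:
    df.to_excel(writer, sheet_name="Daily", index=False)
    doystats.to_excel(writer, sheet_name="stats")

# Inspect the archive: xl/workbook.xml lists the sheet names, and
# testzip() returns None when every member passes its CRC check.
with zipfile.ZipFile(path) as archive:
    workbook_xml = archive.read("xl/workbook.xml").decode("utf-8")
    bad_member = archive.testzip()

print("Daily" in workbook_xml, "stats" in workbook_xml, bad_member)
```

A clean result here only shows that the archive is readable by Python; it does not guarantee that Excel will open the file without a repair prompt.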

This is my code, and I can open the Excel file all right:

import pandas as pd

# Create a Pandas Excel writer (no engine is given, so pandas picks its default).
writer = pd.ExcelWriter('PR_weatherGDDid.xlsx')

data = [['AMN987','Ok'],['AMN987','Ok'],['AMN987','Error'], ['BBB987','Ok'],['BBB987','Ok'],['CCC','Error']]
df = pd.DataFrame(data, columns=['Serial', 'Status'])

days_to = [['02/08/19',4],['02/08/19',8],['02/08/19',3], ['02/08/19',6],['02/08/19',0],['02/08/19',9]]
doystats = pd.DataFrame(days_to, columns=['Date', 'Day'])

# Write each dataframe to a different worksheet.
df.to_excel(writer, sheet_name='Daily', index=False)     
doystats.to_excel(writer, sheet_name='stats')    

# Close the Pandas Excel writer, which also saves the Excel file.
writer.close()

The output looks like this:

[two images in the original post]

The problem of Excel only opening the created file after "repairs" seems to stem from the fact that the file was created in an online Azure Jupyter notebook. All three code variants (mine and those suggested by @atlas and @sharif) produced a file needing "repairs" in the online environment, but made a normal Excel file when I ran them through locally installed Jupyter Notebooks (Anaconda).

As Larisa Golovko noted, this appears to be an issue only with XlsxWriter on Azure Notebooks. It doesn't happen with XlsxWriter, Pandas or Jupyter in offline environments.

I dug into it a bit more, and it looks like there is a zipfile compression error on the .rels files in the xlsx archive. Currently I don't know what is causing that, but it appears to be related to the standard Python zipfile library in that environment. I'll try to put together a simpler test case without XlsxWriter.
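For readers who want to look inside a suspect file themselves, here is a minimal sketch using only the standard zipfile module. The helper name check_xlsx_archive and the tiny in-memory archive are assumptions made for illustration; the archive merely stands in for a real workbook. It is not the test case mentioned above, and a clean result does not rule out the problems Excel reports.

```python
import io
import zipfile


def check_xlsx_archive(source):
    """Return (member_names, first_bad_member) for an xlsx path or file object.

    first_bad_member is None when every member passes its CRC check.
    """
    with zipfile.ZipFile(source) as archive:
        names = archive.namelist()
        bad = archive.testzip()  # reads every member and verifies its CRC
    return names, bad


# Hypothetical stand-in for a real workbook: a tiny zip with a .rels member.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("_rels/.rels", "<Relationships/>")
    archive.writestr("xl/workbook.xml", "<workbook/>")

buffer.seek(0)
names, bad = check_xlsx_archive(buffer)
print(names, bad)
```

Passing the path of a downloaded .xlsx file instead of the buffer would list its members, including the .rels files, and name the first member that fails the check.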

A workaround is to use the XlsxWriter in_memory constructor option:

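import pandas as pd
import xlsxwriter

workbook = xlsxwriter.Workbook('hello_world.xlsx', {'in_memory': True})

# Or, through pandas. Older pandas versions accept options=... as shown;
# pandas 1.3+ expects engine_kwargs={'options': {'in_memory': True}} instead.

writer = pd.ExcelWriter('pandas_example.xlsx',
                        engine='xlsxwriter',
                        options={'in_memory': True})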
