
How to make pandas.read_excel with engine='openpyxl' behave like it did with xlrd, not showing nanoseconds by default?

We have a process that reads data from an Excel .xlsx spreadsheet into a pandas DataFrame. While trying to upgrade to the latest version of pandas (1.2.1), I saw the following in the documentation for the engine argument of the read_excel function:

  • “openpyxl” supports newer Excel file formats.

Changed in version 1.2.0: The engine xlrd now only supports old-style .xls files.

So I added engine='openpyxl' to my read_excel call and started to see strange new behavior: datetime values now showed fractional seconds by default, which wasn't the case with xlrd. On top of that, the datetimes were off from the expected values shown in Excel by a few microseconds. I saw the same thing with pandas 1.2.1 and also 1.1.4.

For the following Excel data (the raw values show as 44098.0416666667 for the 9/24 date and 44083.6847222222 for both 9/9 dates)

[Image: the Excel data]

I'm seeing the following behavior:

>>> import pandas as pd
>>> pd.read_excel('~/testDatetimeNanos.xlsx')
             TestDate
0 2020-09-24 01:00:00
1 2020-09-09 16:26:00
2 2020-09-09 16:26:00
>>> pd.read_excel('~/testDatetimeNanos.xlsx', engine='openpyxl')
                    TestDate
0 2020-09-24 01:00:00.000003
1 2020-09-09 16:25:59.999998
2 2020-09-09 16:26:00.000000

Is there a way to use the new openpyxl engine so that its behavior is consistent with the old xlrd engine?
I'm also wondering whether I have stumbled onto a bug (update: I submitted a bug report).

As of openpyxl 3.0.7, the bug has been fixed (Aug 2021).
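For context, the offsets in the question are what you get when the raw serial values quoted there are converted to datetimes without any rounding. The sketch below is not openpyxl's actual code; it is a plain-Python illustration that assumes the workbook uses Excel's 1900 date system (day 0 taken as 1899-12-30), and `excel_serial_to_datetime` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

# Assumption: the workbook uses the 1900 date system, where serial 0
# corresponds to 1899-12-30 for modern dates.
EXCEL_EPOCH = datetime(1899, 12, 30)

def excel_serial_to_datetime(serial: float) -> datetime:
    """Hypothetical helper: convert an Excel serial day number to a datetime.

    timedelta keeps microsecond resolution, so a serial that is stored with
    only ten decimal places lands slightly off the intended second.
    """
    return EXCEL_EPOCH + timedelta(days=serial)

# Raw values quoted in the question
first = excel_serial_to_datetime(44098.0416666667)
second = excel_serial_to_datetime(44083.6847222222)

print(first)   # 2020-09-24 01:00:00.000003
print(second)  # 2020-09-09 16:25:59.999998
```

The two results match the first two rows of the engine='openpyxl' output in the question, which suggests the difference from xlrd comes down to whether the converted value is rounded to a whole second.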

Regardless of the engine and the version of openpyxl, you can strip the fractional seconds like this:

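# Convert the datetimes to text and keep only 'YYYY-MM-DD HH:MM:SS' (the first 19 characters).
# Note: this truncates rather than rounds, and the column ends up holding strings.
df['TestDate'] = df['TestDate'].astype(str).str[:19]

#               TestDate
# 0  2020-09-24 01:00:00
# 1  2020-09-09 16:25:59
# 2  2020-09-09 16:26:00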
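Note that truncating leaves the second row at 16:25:59, whereas the xlrd output in the question shows 16:26:00. If the goal is to reproduce the xlrd values and keep a datetime column, rounding to the nearest second is one option. This is a minimal sketch, assuming the column has already been parsed as datetime64; the DataFrame below is built by hand from the values in the question rather than read from the original file.

```python
import pandas as pd

# Stand-in for the result of pd.read_excel(..., engine='openpyxl'),
# using the values shown in the question.
df = pd.DataFrame({
    "TestDate": pd.to_datetime([
        "2020-09-24 01:00:00.000003",
        "2020-09-09 16:25:59.999998",
        "2020-09-09 16:26:00.000000",
    ])
})

# Round each timestamp to the nearest whole second; the dtype stays datetime64.
df["TestDate"] = df["TestDate"].dt.round("s")

print(df)
```

With these inputs all three rows come out on whole seconds (01:00:00, 16:26:00, 16:26:00), which matches what the question shows for the default engine.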
