How to make pandas.read_excel with engine='openpyxl' behave like it did with xlrd, not showing nanoseconds by default?
We have a process that reads data from an Excel .xlsx spreadsheet into a pandas DataFrame. While trying to upgrade to the latest version (1.2.1) of pandas, I saw the following in the docs for the engine argument of the pandas read_excel function:
- “openpyxl” supports newer Excel file formats.
Changed in version 1.2.0: The engine xlrd now only supports old-style .xls files.
So, I added engine='openpyxl' to my read_excel function call and started to see strange new behavior: datetime values now showed nanoseconds by default, which wasn't the case with xlrd. On top of that, the datetimes were off from the expected values seen in Excel by a few nanoseconds. I saw the same thing with pandas 1.2.1 and also 1.1.4.
For the following Excel data (the raw values show as 44098.0416666667 for the 9/24 date and 44083.6847222222 for both 9/9 dates), I'm seeing the following behavior:
>>> import pandas as pd
>>> pd.read_excel('~/testDatetimeNanos.xlsx')
             TestDate
0 2020-09-24 01:00:00
1 2020-09-09 16:26:00
2 2020-09-09 16:26:00
>>> pd.read_excel('~/testDatetimeNanos.xlsx', engine='openpyxl')
                    TestDate
0 2020-09-24 01:00:00.000003
1 2020-09-09 16:25:59.999998
2 2020-09-09 16:26:00.000000
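The stray digits can be traced back to the raw serial values themselves: a fractional day stored as a float does not land exactly on a whole second. The following is only a minimal sketch of that arithmetic, assuming the workbook uses Excel's default 1900 date system (serial 0 = 1899-12-30). The helpers excel_serial_to_datetime and round_to_second are hypothetical names for illustration, not pandas, xlrd, or openpyxl APIs, and the sketch does not claim to reproduce what either engine does internally.

```python
from datetime import datetime, timedelta

# Assumption: default 1900 date system, where serial 0 maps to 1899-12-30
# for any date after 1900-03-01.
EXCEL_EPOCH = datetime(1899, 12, 30)


def excel_serial_to_datetime(serial):
    """Convert an Excel serial day number to a naive datetime (hypothetical helper)."""
    return EXCEL_EPOCH + timedelta(days=serial)


def round_to_second(dt):
    """Round a datetime to the nearest whole second (hypothetical helper)."""
    # Shift by half a second, then drop the sub-second part.
    return (dt + timedelta(microseconds=500_000)).replace(microsecond=0)


# Raw values as displayed by Excel for the 9/24 and 9/9 cells.
for serial in (44098.0416666667, 44083.6847222222):
    raw = excel_serial_to_datetime(serial)
    print(serial, "->", raw, "| rounded:", round_to_second(raw))
```

The unrounded results sit a few microseconds away from the whole second, while rounding to the nearest second gives the times Excel displays.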
I'm wondering if there's a way to use the new openpyxl engine so that its behavior is consistent with the old xlrd engine...?
Also, wondering if I may have stumbled onto a bug (update: submitted bug report).
As of openpyxl ≥3.0.7, the bug has been fixed (Aug 2021).
Regardless of the engine and the version of openpyxl, you can simply drop the fractional seconds after reading:
df['TestDate'] = df['TestDate'].dt.floor('s')
#              TestDate
# 0 2020-09-24 01:00:00
# 1 2020-09-09 16:25:59
# 2 2020-09-09 16:26:00
Flooring truncates, so row 1 lands on 16:25:59. Use .dt.round('s') instead to get the 16:26:00 that Excel displays.
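If the goal is to match what Excel shows rather than just hide the extra digits, rounding to the nearest second is safer than truncating, because a value like 16:25:59.999998 should become 16:26:00. Below is a minimal sketch that applies this to every datetime column of a DataFrame. The helper name round_datetime_columns is hypothetical, and the sample frame is built by hand from the values in the question instead of being read from the spreadsheet.

```python
import pandas as pd


def round_datetime_columns(df, freq="s"):
    """Return a copy of df with every datetime column rounded to freq.

    Hypothetical helper for illustration; not part of pandas.
    """
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_datetime64_any_dtype(out[col]):
            out[col] = out[col].dt.round(freq)
    return out


# Stand-in for pd.read_excel(..., engine='openpyxl') on an affected version.
df = pd.DataFrame({
    "TestDate": pd.to_datetime([
        "2020-09-24 01:00:00.000003",
        "2020-09-09 16:25:59.999998",
        "2020-09-09 16:26:00.000000",
    ])
})

cleaned = round_datetime_columns(df)
print(cleaned)
```

In real use the same call would wrap the read, for example round_datetime_columns(pd.read_excel(path, engine='openpyxl')). Rounding to seconds assumes the spreadsheet never holds meaningful sub-second precision.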