How can I use engine='openpyxl' so that pandas.read_excel behaves like xlrd and does not show nanoseconds by default?

Question

We have a process that reads data from an Excel .xlsx spreadsheet into a pandas DataFrame. While trying to upgrade to the latest version of pandas (1.2.1), I saw the following in the documentation of the pandas read_excel function for the engine argument:

“openpyxl” supports newer Excel file formats.

Changed in version 1.2.0: The engine xlrd now only supports old-style .xls files.

So I added engine='openpyxl' to my read_excel call and started to see strange new behavior: datetime values now showed nanoseconds by default, which wasn't the case with xlrd. On top of that, the datetimes were off from the expected value seen in Excel by a few nanoseconds. I saw the same thing with pandas 1.2.1 and also with 1.1.4.

For the following Excel data (the raw values show as 44098.0416666667 for the 9/24 date and 44083.6847222222 for both 9/9 dates), I'm seeing the following behavior:

>>> import pandas as pd
>>> pd.read_excel('~/testDatetimeNanos.xlsx')
             TestDate
0 2020-09-24 01:00:00
1 2020-09-09 16:26:00
2 2020-09-09 16:26:00
>>> pd.read_excel('~/testDatetimeNanos.xlsx', engine='openpyxl')
                    TestDate
0 2020-09-24 01:00:00.000003
1 2020-09-09 16:25:59.999998
2 2020-09-09 16:26:00.000000
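
The raw values quoted above can be converted by hand to see where a residue of a few microseconds might come from. The sketch below uses only the standard library and assumes the workbook uses Excel's 1900 date system, where serial 0 corresponds to 1899-12-30. `from_serial` and `round_to_second` are hypothetical helper names, and this is not how either engine is actually implemented. It only illustrates that a day fraction stored with about ten decimal places does not land exactly on a whole second, so some rounding step is needed to recover the value shown in Excel.

```python
from datetime import datetime, timedelta

# Assumption: 1900 date system, where serial 0 maps to 1899-12-30.
EXCEL_EPOCH = datetime(1899, 12, 30)


def from_serial(serial):
    """Convert an Excel serial day number to a datetime (hypothetical helper)."""
    return EXCEL_EPOCH + timedelta(days=serial)


def round_to_second(value):
    """Round a datetime to the nearest whole second (hypothetical helper)."""
    return (value + timedelta(microseconds=500_000)).replace(microsecond=0)


# Raw cell values quoted in the question.
for serial in (44098.0416666667, 44083.6847222222):
    converted = from_serial(serial)
    print(serial, converted, round_to_second(converted))
```

Rounding to the nearest second recovers 01:00:00 and 16:26:00 for the two raw values.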

Is there a way to use the new openpyxl engine so that its behavior is consistent with the old xlrd engine?
I'm also wondering whether I have stumbled onto a bug (update: I have submitted a bug report).

Update (Aug 2021): the bug has been fixed as of openpyxl ≥3.0.7.

Answer 1

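Regardless of the engine and the version of openpyxl, you can drop the fractional seconds by formatting the column as text and slicing off the last seven characters (the dot plus six digits). The .str accessor only works on strings, so the datetime column is formatted first. Note that this truncates rather than rounds, and the column ends up holding strings:

df['TestDate'] = df['TestDate'].dt.strftime('%Y-%m-%d %H:%M:%S.%f').str[:-7]

#              TestDate
#0  2020-09-24 01:00:00
#1  2020-09-09 16:25:59
#2  2020-09-09 16:26:00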
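
In the output above, row 1 ends up at 16:25:59 because the fractional part is cut off, whereas xlrd reported 16:26:00. If the column should stay a datetime and match the xlrd output, rounding to the nearest second is an alternative. The sketch below is a suggestion rather than part of the original answer. It builds the DataFrame by hand from the values in the question so that it runs without the spreadsheet; in practice the frame would come from pd.read_excel(..., engine='openpyxl').

```python
import pandas as pd

# Stand-in for the frame returned by read_excel with engine='openpyxl';
# the values are copied from the output shown in the question.
df = pd.DataFrame({
    "TestDate": pd.to_datetime([
        "2020-09-24 01:00:00.000003",
        "2020-09-09 16:25:59.999998",
        "2020-09-09 16:26:00.000000",
    ])
})

# Round to the nearest second; the column keeps its datetime64 dtype.
df["TestDate"] = df["TestDate"].dt.round("s")
print(df)
```

After rounding, all three rows fall on whole seconds (01:00:00, 16:26:00, 16:26:00), which is what the xlrd engine displayed.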
