Pandas max date by row?

Question

The solution to the question asked here unfortunately does not solve this problem. I'm using Python 3.6.2.

The DataFrame, df:

                            date1                        date2
rec0    2017-05-25 14:02:23+00:00    2017-05-25 14:34:43+00:00
rec1                          NaT    2017-05-16 19:37:43+00:00

To reproduce the problem:

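import pandas as pd
import psycopg2.tz  # FixedOffsetTimezone lives in the psycopg2.tz submodule

Timestamp = pd.Timestamp
NaT = pd.NaT

df = pd.DataFrame({'date1': [Timestamp('2017-05-25 14:02:23'), NaT],
                   'date2': [Timestamp('2017-05-25 14:34:43'), Timestamp('2017-05-16 19:37:43')]},
                  index=['rec0', 'rec1'])

# Localize both columns to a zero-offset psycopg2 timezone
tz = psycopg2.tz.FixedOffsetTimezone(offset=0, name=None)
for col in ['date1', 'date2']:
    df[col] = pd.DatetimeIndex(df[col]).tz_localize(tz)

# Row-wise max: expected a datetime per row, but returns NaN
print(df.max(axis=1))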

Both of the above columns have been converted using pd.to_datetime() to get the following column type: datetime64[ns, psycopg2.tz.FixedOffsetTimezone(offset=0, name=None)]

Running df.max(axis=1) doesn't raise an error, but it clearly returns the wrong result.

Output (incorrect):

rec0   NaN
rec1   NaN
dtype: float64

The workaround I currently use is to apply a custom function to df row by row:

def get_max(x):
    # Drop the NaT values in the row, then take the max of what remains
    test = x.dropna()
    return max(test)

df.apply(get_max, axis=1)

Output (correct):

rec0   2017-05-25 14:34:43+00:00
rec1   2017-05-16 19:37:43+00:00
dtype: datetime64[ns, psycopg2.tz.FixedOffsetTimezone(offset=0, name=None)]
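
One edge case worth knowing about: if every value in a row is NaT, x.dropna() returns an empty Series and the built-in max() raises ValueError. The sketch below is a hypothetical variant that calls Series.max() instead, which skips NaT by default and returns NaT for an all-missing row. To keep it runnable without psycopg2, it assumes tz="UTC" as a stand-in for the psycopg2 timezone and adds a hypothetical all-NaT row, rec2.

```python
import pandas as pd


def get_max(row):
    # Series.max() skips NaT by default and returns NaT for an all-NaT row,
    # whereas the built-in max() on an empty sequence raises ValueError.
    return row.max()


df = pd.DataFrame(
    {
        "date1": [pd.Timestamp("2017-05-25 14:02:23"), pd.NaT, pd.NaT],
        "date2": [pd.Timestamp("2017-05-25 14:34:43"),
                  pd.Timestamp("2017-05-16 19:37:43"), pd.NaT],
    },
    index=["rec0", "rec1", "rec2"],  # rec2 is a hypothetical all-NaT row
)
for col in df.columns:
    # Assumed stand-in for psycopg2.tz.FixedOffsetTimezone(offset=0)
    df[col] = df[col].dt.tz_localize("UTC")

result = df.apply(get_max, axis=1)
print(result)
```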

Maybe df.max() doesn't handle datetime objects and only looks at floats (docs). Any idea why df.max(axis=1) returns only NaN?

Answer 1

After some testing, it looks like something goes wrong when pandas works with psycopg2.tz.FixedOffsetTimezone.

If you try df.max(axis=0), it works as expected, but, as you point out, df.max(axis=1) returns a series of NaN. If you do not use psycopg2.tz.FixedOffsetTimezone as tz, df.max(axis=1) returns the expected result.

Other operations, such as df.transpose, also fail in this case.

Note that if you try df.values.max(axis=1), you get the expected result, so the underlying NumPy array seems to handle this correctly. You should search the pandas GitHub issues (like this one) and consider opening a new one if you can't find a fix.
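
For reference, a minimal sketch of the df.values route. It again uses tz="UTC" as an assumed stand-in for psycopg2.tz.FixedOffsetTimezone so it runs without psycopg2. With tz-aware columns, df.values is a 2-D object array of Timestamp and NaT values, so NumPy compares the objects pairwise in Python. Because NaT compares False against every Timestamp, the object-array result may depend on where NaT sits in the row (here it is in the first column), so check it against your own data before relying on it.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "date1": [pd.Timestamp("2017-05-25 14:02:23"), pd.NaT],
        "date2": [pd.Timestamp("2017-05-25 14:34:43"),
                  pd.Timestamp("2017-05-16 19:37:43")],
    },
    index=["rec0", "rec1"],
)
for col in df.columns:
    # Assumed stand-in for psycopg2.tz.FixedOffsetTimezone(offset=0)
    df[col] = df[col].dt.tz_localize("UTC")

# df.values is an object ndarray of Timestamp/NaT for tz-aware columns,
# so .max(axis=1) reduces each row with Python-level comparisons.
row_max = pd.Series(df.values.max(axis=1), index=df.index)
print(row_max)
```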

Another solution would be to drop psycopg2.tz.FixedOffsetTimezone , but you may have some reason to use this specifically.另一种解决方案是删除psycopg2.tz.FixedOffsetTimezone ，但您可能有一些理由专门使用它。
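
A minimal sketch of that alternative, assuming the offset really is zero as in the question: localize to pandas' own "UTC" instead of the psycopg2 timezone (columns that already carry another tzinfo can be switched with Series.dt.tz_convert("UTC")). numeric_only=False is passed explicitly because its default changed across pandas versions, and on older releases the datetime columns may otherwise be left out of the reduction.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "date1": [pd.Timestamp("2017-05-25 14:02:23"), pd.NaT],
        "date2": [pd.Timestamp("2017-05-25 14:34:43"),
                  pd.Timestamp("2017-05-16 19:37:43")],
    },
    index=["rec0", "rec1"],
)
for col in ["date1", "date2"]:
    # Use pandas' UTC instead of psycopg2.tz.FixedOffsetTimezone(offset=0).
    # For columns that are already tz-aware, .dt.tz_convert("UTC") would be used instead.
    df[col] = df[col].dt.tz_localize("UTC")

# skipna=True ignores the NaT in rec1; numeric_only=False keeps the
# datetime columns in the row-wise reduction.
row_max = df.max(axis=1, skipna=True, numeric_only=False)
print(row_max)
```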

Answer 2

Using pandas 1.0.5 with Python 3.8, I was still getting a series of NaNs. I solved the issue by converting both columns to datetime and then passing skipna=True and numeric_only=False to max():

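# Convert both columns to tz-aware datetimes in UTC
df['1'] = pd.to_datetime(df['1'], utc=True)
df['2'] = pd.to_datetime(df['2'], utc=True)
# skipna=True ignores NaT; numeric_only=False keeps the datetime columns in the reduction
df['3'] = df[['1', '2']].max(axis=1, skipna=True, numeric_only=False)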


Answer 1
1 2017-09-17 21:55:36

Answer 2
0 2020-07-13 15:51:20
