Pandas max date by row?
The solution to the question asked here unfortunately does not solve this problem. I'm using Python 3.6.2.
The DataFrame, df:
      date1                      date2
rec0  2017-05-25 14:02:23+00:00  2017-05-25 14:34:43+00:00
rec1  NaT                        2017-05-16 19:37:43+00:00
To reproduce the problem:
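import psycopg2
import pandas as pd

Timestamp = pd.Timestamp
NaT = pd.NaT

# Two naive datetime columns; date1 has a missing value in the second row
df = pd.DataFrame({'date1': [Timestamp('2017-05-25 14:02:23'), NaT],
                   'date2': [Timestamp('2017-05-25 14:34:43'), Timestamp('2017-05-16 19:37:43')]})

# Localize both columns to the fixed-offset timezone that psycopg2 provides
tz = psycopg2.tz.FixedOffsetTimezone(offset=0, name=None)
for col in ['date1', 'date2']:
    df[col] = pd.DatetimeIndex(df[col]).tz_localize(tz)

# Row-wise max: returns NaN for every row instead of the latest date
print(df.max(axis=1))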
Both of the above columns have been converted using pd.to_datetime() to get the following column type: datetime64[ns, psycopg2.tz.FixedOffsetTimezone(offset=0, name=None)]
Running df.max(axis=1) doesn't raise an error, but the result is clearly incorrect.
Output (incorrect):
rec0 NaN
rec1 NaN
dtype: float64
The fix that I have in place is to apply a custom function to the df, as written below:
def get_max(x):
    test = x.dropna()
    return max(test)

df.apply(get_max, axis=1)
Output (correct):
rec0 2017-05-25 14:34:43+00:00
rec1 2017-05-16 19:37:43+00:00
dtype: datetime64[ns, psycopg2.tz.FixedOffsetTimezone(offset=0, name=None)]
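One caveat with the workaround above: max() raises ValueError on an empty sequence, so a row in which every date is NaT would fail. A slightly more defensive variant might look like the sketch below. The name row_max is hypothetical, and the example uses pandas' built-in UTC handling rather than psycopg2's timezone so that it runs without a database driver installed.

```python
import pandas as pd

def row_max(row):
    """Return the latest non-missing value in a row, or NaT if all are missing."""
    present = row.dropna()
    return present.max() if len(present) else pd.NaT

# Assumed sample data: the third row is missing in both columns
df = pd.DataFrame({
    "date1": pd.to_datetime(["2017-05-25 14:02:23", None, None], utc=True),
    "date2": pd.to_datetime(["2017-05-25 14:34:43", "2017-05-16 19:37:43", None], utc=True),
})

result = df.apply(row_max, axis=1)
print(result)
```

Rows with at least one date yield that row's latest date, and the all-missing row yields NaT instead of raising.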
Maybe df.max() doesn't deal with date objects but only looks for floats (docs). Any idea why df.max(axis=1) only returns NaN?
After some testing, it looks like something goes wrong when pandas is combined with psycopg2.tz.FixedOffsetTimezone.
If you try df.max(axis=0), it will work as expected, but as you indicate, df.max(axis=1) will return a Series of NaN. If you do not use psycopg2.tz.FixedOffsetTimezone as tz, df.max(axis=1) will return the expected result.
Other operations, such as df.transpose, also fail in this case.
Note that if you try df.values.max(axis=1), you will get the expected result, so numpy.array seems to be able to deal with this. You should search the pandas GitHub issues (like this one) and maybe consider opening a new one if you can't find a fix.
Another solution would be to drop psycopg2.tz.FixedOffsetTimezone, but you may have some reason to use it specifically.
Using pandas 1.0.5 with Python 3.8, I was still getting a Series of NaN values. I solved the issue by converting both columns to datetime and then passing skipna=True and numeric_only=False to the max() function:
df['1'] = pd.to_datetime(df['1'], utc=True)
df['2'] = pd.to_datetime(df['2'], utc=True)
df['3'] = df[['1', '2']].max(axis=1, skipna=True, numeric_only=False)