Finding the mean and standard deviation of a timedelta object in pandas df
I would like to calculate the mean and standard deviation of a timedelta by bank from a dataframe with two columns shown below. When I run the code (also shown below) I get the below error:
pandas.core.base.DataError: No numeric types to aggregate
My dataframe:
bank                         diff
Bank of Japan                0 days 00:00:57.416000
Reserve Bank of Australia    0 days 00:00:21.452000
Reserve Bank of New Zealand  55 days 12:39:32.269000
U.S. Federal Reserve         8 days 13:27:11.387000
My code:
means = dropped.groupby('bank').mean()
std = dropped.groupby('bank').std()
You need to convert the timedelta to a numeric value, e.g. int64 via values. This is the most accurate option, because nanoseconds (ns) are the native numeric representation of timedelta:
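import numpy as np
import pandas as pd

# Represent each timedelta as int64 nanoseconds so it can be aggregated
dropped['new'] = dropped['diff'].values.astype(np.int64)
means = dropped.groupby('bank').mean()
means['new'] = pd.to_timedelta(means['new'])  # convert back to timedelta
std = dropped.groupby('bank').std()
std['new'] = pd.to_timedelta(std['new'])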
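As a self-contained sketch of the round trip above: the rows and bank labels here are made up for illustration (the question's table has only one row per bank, so its standard deviation would be NaN). Forcing nanosecond resolution and selecting the helper column before aggregating are my own additions, not part of the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two rows per bank so that std is defined
dropped = pd.DataFrame({
    "bank": ["Bank A", "Bank A", "Bank B", "Bank B"],
    "diff": pd.to_timedelta(["1s", "3s", "10s", "20s"]),
})

# Force nanosecond resolution, then reinterpret as int64 nanosecond counts
as_ns = dropped["diff"].values.astype("timedelta64[ns]")
dropped["new"] = as_ns.astype(np.int64)

# Aggregate only the numeric helper column, then convert back
grouped = dropped.groupby("bank")["new"]
means = pd.to_timedelta(grouped.mean())  # float nanoseconds -> timedelta
stds = pd.to_timedelta(grouped.std())    # sample std (ddof=1)

print(means)
print(stds)
```

Both results are Series indexed by bank. The standard deviation passes through float64 on the way, so it is only accurate to roughly the nanosecond.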
Another solution is to convert the values to seconds with total_seconds, but that is less accurate:
dropped['new'] = dropped['diff'].dt.total_seconds()
means = dropped.groupby('bank').mean()
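A possible way to carry the seconds-based variant through to the standard deviation and back to timedelta, again on made-up rows. Grouping the seconds Series directly and the `unit="s"` conversion at the end are assumptions of this sketch, not something the answer spells out.

```python
import pandas as pd

# Hypothetical data: two rows per bank so that std is defined
dropped = pd.DataFrame({
    "bank": ["Bank A", "Bank A", "Bank B", "Bank B"],
    "diff": pd.to_timedelta(["1s", "3s", "10s", "20s"]),
})

# Float seconds are plain numbers, so the usual aggregations apply
seconds = dropped["diff"].dt.total_seconds()
stats = seconds.groupby(dropped["bank"]).agg(["mean", "std"])

# Optionally turn the float seconds back into timedeltas
mean_td = pd.to_timedelta(stats["mean"], unit="s")
std_td = pd.to_timedelta(stats["std"], unit="s")

print(stats)
```

Working in float seconds is convenient when the numbers feed into plots or further arithmetic, at the cost of floating-point rounding.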
No need to convert timedelta back and forth. Numpy and pandas can seamlessly do it for you with a faster run time. Using your dropped DataFrame:
import numpy as np
grouped = dropped.groupby('bank')['diff']
mean = grouped.apply(lambda x: np.mean(x))
std = grouped.apply(lambda x: np.std(x))
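For comparison, a similar sketch that calls the Series methods directly instead of np.mean and np.std. This is a substitution on my part. The data is hypothetical, and note that Series.std uses the sample standard deviation (ddof=1) by default.

```python
import pandas as pd

# Hypothetical data: two rows per bank so that std is defined
dropped = pd.DataFrame({
    "bank": ["Bank A", "Bank A", "Bank B", "Bank B"],
    "diff": pd.to_timedelta(["1s", "3s", "10s", "20s"]),
})

grouped = dropped.groupby("bank")["diff"]

# Each group is a timedelta Series; its own mean/std return Timedelta values
mean = grouped.apply(lambda x: x.mean())
std = grouped.apply(lambda x: x.std())  # ddof=1 by default

print(mean)
print(std)
```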
Pandas mean() and other aggregation methods support the numeric_only=False parameter.
dropped.groupby('bank').mean(numeric_only=False)
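A minimal runnable version of that call, assuming a frame with the same two columns as the question. The rows themselves are invented for illustration.

```python
import pandas as pd

# Hypothetical data shaped like the question's frame
dropped = pd.DataFrame({
    "bank": ["Bank A", "Bank A", "Bank B", "Bank B"],
    "diff": pd.to_timedelta(["1s", "3s", "10s", "20s"]),
})

# numeric_only=False keeps the timedelta column in the aggregation
result = dropped.groupby("bank").mean(numeric_only=False)

print(result)
```

The result is a DataFrame indexed by bank whose diff column is still a timedelta.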
Found here: Aggregations for Timedelta values in the Python DataFrame
I would suggest passing the numeric_only=False argument to mean as mentioned by Alexander Usikov - this works for pandas version 0.20+.
If you have an older version, the following works:
import pandas as pd
df = pd.DataFrame({
'td': pd.Series([pd.Timedelta(days=i) for i in range(5)]),
'group': ['a', 'a', 'a', 'b', 'b']
})
(
df
.astype({'td': int}) # convert timedelta to integer (nanoseconds)
.groupby('group')
.mean()
.astype({'td': 'timedelta64[ns]'})
)