pandas: groupby and calculate time difference from first element in each group
In pandas, I would like to group data by the values in a column and then calculate the time difference between each timestamp and the first timestamp in its group.
For example, consider the following DataFrame:
# Create data.
import pandas as pd

d = {'foo': ['001', '001', '002', '002', '002'],
     'timestamp': ['2015-02-24 19:12:00', '2015-02-24 21:38:00', '2015-02-25 03:41:00',
                   '2015-02-25 03:44:00', '2015-02-25 03:49:00']}
df = pd.DataFrame(d, columns=['foo', 'timestamp'])
# Parse the strings as timezone-aware (UTC) timestamps.
df['timestamp'] = pd.to_datetime(df['timestamp'], utc=True)
>>> print(df)
foo timestamp
0 001 2015-02-24 19:12:00+00:00
1 001 2015-02-24 21:38:00+00:00
2 002 2015-02-25 03:41:00+00:00
3 002 2015-02-25 03:44:00+00:00
4 002 2015-02-25 03:49:00+00:00
The desired output would be:
foo timestamp output
0 001 2015-02-24 19:12:00+00:00 NaT
1 001 2015-02-24 21:38:00+00:00 02:26:00
2 002 2015-02-25 03:41:00+00:00 NaT
3 002 2015-02-25 03:44:00+00:00 00:03:00
4 002 2015-02-25 03:49:00+00:00 00:08:00
Using .diff() gives the following, which is not the desired result, because it measures each timestamp against the previous row in its group rather than against the first row:
>>> df.groupby('foo')['timestamp'].diff()
0 NaT
1 02:26:00
2 NaT
3 00:03:00
4 00:05:00
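To make the difference concrete, here is a minimal sketch on the three timestamps of group 002 alone. The names `ts`, `step` and `from_first` are illustrative and do not appear in the question.

```python
import pandas as pd

# The three timestamps of group '002', taken from the example data.
ts = pd.Series(pd.to_datetime(['2015-02-25 03:41:00',
                               '2015-02-25 03:44:00',
                               '2015-02-25 03:49:00'], utc=True))

# .diff() subtracts the previous row from each row: NaT, 3 min, 5 min.
step = ts.diff()

# Subtracting the first element measures every row from the start: 0, 3 min, 8 min.
from_first = ts - ts.iloc[0]

print(step)
print(from_first)
```

The last row shows the gap: 5 minutes since the previous row, but 8 minutes since the first.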
Use assign + apply. Note that the first row of each group comes out as 00:00:00 rather than NaT:
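# group_keys=False keeps the original row index, so the result aligns with df on newer pandas versions too.
df.assign(output=df.groupby('foo', group_keys=False).timestamp.apply(lambda x: x - x.iloc[0]))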
foo timestamp output
0 001 2015-02-24 19:12:00+00:00 00:00:00
1 001 2015-02-24 21:38:00+00:00 02:26:00
2 002 2015-02-25 03:41:00+00:00 00:00:00
3 002 2015-02-25 03:44:00+00:00 00:03:00
4 002 2015-02-25 03:49:00+00:00 00:08:00
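An alternative sketch, not part of the original answer, avoids the Python-level lambda: `transform('first')` broadcasts each group's first timestamp back onto every row. If the first row of each group should read NaT, as in the desired output, it can be masked afterwards. The names `first`, `is_first` and `output_nat` are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    'foo': ['001', '001', '002', '002', '002'],
    'timestamp': pd.to_datetime(['2015-02-24 19:12:00', '2015-02-24 21:38:00',
                                 '2015-02-25 03:41:00', '2015-02-25 03:44:00',
                                 '2015-02-25 03:49:00'], utc=True),
})

# First timestamp of each group, repeated for every row of that group.
first = df.groupby('foo')['timestamp'].transform('first')

# Elapsed time since the group's first timestamp (zero on the first row).
df['output'] = df['timestamp'] - first

# Optional: replace the zero on each group's first row with NaT.
is_first = df.groupby('foo').cumcount() == 0
df['output_nat'] = df['output'].mask(is_first)

print(df)
```

Because `transform` returns a result indexed like the original frame, the subtraction aligns row by row without any extra index handling.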