pandas: groupby and compute the time difference from the first element in each group

Question

In pandas, I would like to group data by the values in a column and then calculate the time difference between each timestamp and the first timestamp in that group.

For example, consider the following DataFrame:

import pandas as pd

# Create data.
d = {'foo': ['001', '001', '002', '002', '002'],
     'timestamp': ['2015-02-24 19:12:00', '2015-02-24 21:38:00', '2015-02-25 03:41:00', '2015-02-25 03:44:00', '2015-02-25 03:49:00']}
df = pd.DataFrame(d, columns=['foo', 'timestamp'])
# Parse the strings as timezone-aware (UTC) timestamps.
df['timestamp'] = pd.to_datetime(df['timestamp'], utc=True)
>>> print(df)
   foo                 timestamp
0  001 2015-02-24 19:12:00+00:00
1  001 2015-02-24 21:38:00+00:00
2  002 2015-02-25 03:41:00+00:00
3  002 2015-02-25 03:44:00+00:00
4  002 2015-02-25 03:49:00+00:00

The desired output would be:

   foo                 timestamp    output
0  001 2015-02-24 19:12:00+00:00       NaT
1  001 2015-02-24 21:38:00+00:00  02:26:00
2  002 2015-02-25 03:41:00+00:00       NaT
3  002 2015-02-25 03:44:00+00:00  00:03:00
4  002 2015-02-25 03:49:00+00:00  00:08:00

Using .diff() gives the difference between consecutive timestamps within each group, which is not the desired result (compare row 4):

>>> df.groupby('foo')['timestamp'].diff()
0        NaT
1   02:26:00
2        NaT
3   00:03:00
4   00:05:00

Answer 1

Use assign + apply to subtract each group's first timestamp from every row in that group:

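# group_keys=False keeps the original row index on the result, so it aligns with df.
df.assign(output=df.groupby('foo', group_keys=False).timestamp.apply(lambda x: x - x.iloc[0]))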

   foo                 timestamp   output
0  001 2015-02-24 19:12:00+00:00 00:00:00
1  001 2015-02-24 21:38:00+00:00 02:26:00
2  002 2015-02-25 03:41:00+00:00 00:00:00
3  002 2015-02-25 03:44:00+00:00 00:03:00
4  002 2015-02-25 03:49:00+00:00 00:08:00
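
As a possible alternative to the apply-based answer above (not part of the original answer), the same result can be sketched with transform('first'), which broadcasts each group's first timestamp back to every row and avoids a Python-level lambda. The column name output_nat and the masking step are illustrative assumptions, added only to show one way to get NaT on the first row of each group as in the question's desired output.

```python
import pandas as pd

# Same sample data as in the question, parsed as UTC timestamps.
df = pd.DataFrame({
    'foo': ['001', '001', '002', '002', '002'],
    'timestamp': pd.to_datetime([
        '2015-02-24 19:12:00', '2015-02-24 21:38:00',
        '2015-02-25 03:41:00', '2015-02-25 03:44:00',
        '2015-02-25 03:49:00'], utc=True),
})

# First timestamp of each group, repeated for every row of that group.
first = df.groupby('foo')['timestamp'].transform('first')

# Elapsed time since the group's first row (00:00:00 on the first row itself).
df['output'] = df['timestamp'] - first

# Optional (hypothetical column name): replace the first row of each group
# with NaT, matching the desired output shown in the question.
is_first = df.groupby('foo').cumcount() == 0
df['output_nat'] = df['output'].mask(is_first)

print(df)
```

Note that 'first' here means the first row in the existing order; if the rows are not already sorted by time within each group, sorting by timestamp beforehand (or using transform('min')) may be closer to what is wanted.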

Solution 1
7 2017-03-01 04:45:05
