Fill a column with the same value, except for the first row in each group

Question

I have a DataFrame:

import pandas as pd
df = pd.DataFrame({'user_id': [1,1,2,1,2,1,2,3], 'movie_id': ['35','120','898','546','989','42','546','35'], 
'time':['1.7','2.1','1.3','2.4','1.4','7.0','2.1','1.1']})

It looks like this:

user_id  movie_id  time
1          35      1.7
1         120      2.1
2         898      1.3
1         546      2.4
2         989      1.4
1         42       7.0
2         546      2.1
3         35       1.1

My goal is to group by user_id, sort by time, and fill a new column with 1 for every row except the first one in each group. The time column holds the number of seconds elapsed since the last click. Eventually I should get the output below, where the new column indicates whether the user rated another movie before the current one:

user_id  movie_id  time  last_rated
1          35      1.7      0
1         120      2.1      1
2         898      1.3      0
1         546      2.4      1
2         989      1.4      1
1         42       7.0      1
2         546      2.1      1
3         35       1.1      0

I've experimented with groupby, shift, and cumsum but still can't get the desired output. Any help would be much appreciated!

Answer 1

You can use cumcount() and np.where():

import numpy as np

df['last_rated'] = np.where(df.groupby('user_id').cumcount() == 0, 0, 1)

Or (as per @coldspeed below):

df['last_rated'] = df.groupby('user_id').cumcount().astype(bool).astype(int)

Output:

    user_id   movie_id  time    last_rated
0   1         35          1.7   0
1   1         120         2.1   1
2   2         898         1.3   0
3   1         546         2.4   1
4   2         989         1.4   1
5   1         42          7.0   1
6   2         546         2.1   1
7   3         35          1.1   0
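To make the mechanics explicit, here is a minimal sketch that rebuilds the question's DataFrame and shows the intermediate cumcount() values. The names position, via_where, and via_cast are only illustrative, and the sketch assumes the rows already appear in the desired order within each user_id.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'user_id': [1, 1, 2, 1, 2, 1, 2, 3],
    'movie_id': ['35', '120', '898', '546', '989', '42', '546', '35'],
    'time': ['1.7', '2.1', '1.3', '2.4', '1.4', '7.0', '2.1', '1.1'],
})

# Position of each row inside its user_id group, in current row order: 0, 1, 2, ...
position = df.groupby('user_id').cumcount()

# Two equivalent ways to turn "position is non-zero" into a 0/1 flag.
via_where = np.where(position == 0, 0, 1)
via_cast = position.astype(bool).astype(int)

df['last_rated'] = via_cast
print(df)
```

Both variants only test whether the position is zero, so they should give the same flags; the cast version keeps the result as a pandas Series with the original index.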

You can call sort_values up front to make sure the rows are in the right order. But if you want to keep your df as is, you can sort inside the groups:

g = df.groupby('user_id', as_index=False).apply(lambda x: x.sort_values(by='time')).groupby('user_id').cumcount().reset_index(level=0,drop=True)

df['l'] = (g/g).fillna(0)
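An alternative sketch of the same idea without groupby.apply: sort a temporary view by time, take cumcount() on it, and let index alignment write the flags back to the unsorted frame. It assumes time should be compared numerically, so the strings are converted to float first (as strings, '10.0' would sort before '2.1'). The reversed frame rev is only there to simulate data that is not already in time order.

```python
import pandas as pd

df = pd.DataFrame({
    'user_id': [1, 1, 2, 1, 2, 1, 2, 3],
    'movie_id': ['35', '120', '898', '546', '989', '42', '546', '35'],
    'time': ['1.7', '2.1', '1.3', '2.4', '1.4', '7.0', '2.1', '1.1'],
})
df['time'] = df['time'].astype(float)  # assumption: times are numeric seconds

# Reverse the rows so that they are deliberately NOT in time order.
rev = df.iloc[::-1].copy()

# Rank rows within each user by time; the result keeps the original index labels.
order = (
    rev.sort_values('time', kind='mergesort')  # stable sort keeps tie order
       .groupby('user_id')
       .cumcount()
)

# Assignment aligns on the index, so rev keeps its own row order.
rev['last_rated'] = (order > 0).astype(int)
print(rev)
```

Because the assignment aligns on index labels, no reset_index or re-sorting is needed afterwards.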

Answer 2

You can use GroupBy + transform with min to compute a series holding the minimum time for each user_id. Then compare it against df['time'] and convert the result from bool to int.

g = df.groupby('user_id')['time'].transform('min')
df['last_rated'] = (df['time'] != g).astype(int)

Assuming your DataFrame is already sorted by time within each user_id, you can more efficiently use GroupBy with 'first':

g = df.groupby('user_id')['time'].transform('first')
df['last_rated'] = (df['time'] != g).astype(int)

Result:

print(df)

   user_id movie_id time  last_rated
0        1       35  1.7           0
1        1      120  2.1           1
2        2      898  1.3           0
3        1      546  2.4           1
4        2      989  1.4           1
5        1       42  7.0           1
6        2      546  2.1           1
7        3       35  1.1           0
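A self-contained sketch of the transform approach, with one assumption added: the time strings are converted with pd.to_numeric so that min compares numbers rather than text. The names time and first_time are illustrative. Note that this variant flags rows by value, so if a user had two rows sharing the minimum time, both would get 0.

```python
import pandas as pd

df = pd.DataFrame({
    'user_id': [1, 1, 2, 1, 2, 1, 2, 3],
    'movie_id': ['35', '120', '898', '546', '989', '42', '546', '35'],
    'time': ['1.7', '2.1', '1.3', '2.4', '1.4', '7.0', '2.1', '1.1'],
})

# Assumption: compare times numerically instead of as strings.
time = pd.to_numeric(df['time'])

# Broadcast each user's earliest time back onto every row of that user.
first_time = time.groupby(df['user_id']).transform('min')

# 0 where the row holds the user's earliest time, 1 everywhere else.
df['last_rated'] = (time != first_time).astype(int)
print(df)
```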
