Fill column with same value except first row inside groups
I have a df:
import pandas as pd
df = pd.DataFrame({'user_id': [1,1,2,1,2,1,2,3], 'movie_id': ['35','120','898','546','989','42','546','35'],
'time':['1.7','2.1','1.3','2.4','1.4','7.0','2.1','1.1']})
that looks like this:
user_id movie_id time
1 35 1.7
1 120 2.1
2 898 1.3
1 546 2.4
2 989 1.4
1 42 7.0
2 546 2.1
3 35 1.1
My goal is to group by user_id, sort by time, and fill a new column with 1, except for the first row inside each group. The 'time' column holds the number of seconds elapsed since the last click. Eventually I should get output like this, with an indicator for the last movie the user rated before the active one:
user_id movie_id time last_rated
1 35 1.7 0
1 120 2.1 1
2 898 1.3 0
1 546 2.4 1
2 989 1.4 1
1 42 7.0 1
2 546 2.1 1
3 35 1.1 0
I've experimented with groupby, shift, and cumsum but still can't get the desired output. Any help would be much appreciated!
You can use cumcount() and np.where():
import numpy as np
df['last_rated'] = np.where(df.groupby('user_id').cumcount() == 0, 0, 1)
or (as per @coldspeed below):
df['last_rated'] = df.groupby('user_id').cumcount().astype(bool).astype(int)
Output:
user_id movie_id time last_rated
0 1 35 1.7 0
1 1 120 2.1 1
2 2 898 1.3 0
3 1 546 2.4 1
4 2 989 1.4 1
5 1 42 7.0 1
6 2 546 2.1 1
7 3 35 1.1 0
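For reference, here is a minimal self-contained sketch of both variants run against the question's data. It assumes the rows are already in time order within each user, as they are in the sample. The names `position` and `alt` are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'user_id': [1, 1, 2, 1, 2, 1, 2, 3],
    'movie_id': ['35', '120', '898', '546', '989', '42', '546', '35'],
    'time': ['1.7', '2.1', '1.3', '2.4', '1.4', '7.0', '2.1', '1.1'],
})

# cumcount() numbers the rows of each group 0, 1, 2, ... in their current order
position = df.groupby('user_id').cumcount()

# 0 for the first row of each user, 1 for every later row
df['last_rated'] = np.where(position == 0, 0, 1)

# Same result without NumPy: any non-zero position becomes True, then 1
alt = position.astype(bool).astype(int)
```

Both forms depend only on row order inside each group, not on the values in 'time'.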
You can use sort_values upfront to make sure the rows are in the right order. But if you want to keep your df as is, you can sort inside the groups:
g = df.groupby('user_id', as_index=False).apply(lambda x: x.sort_values(by='time')).groupby('user_id').cumcount().reset_index(level=0,drop=True)
df['l'] = (g/g).fillna(0)
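A possible alternative that gets the same effect without groupby.apply is sketched below: sort a copy, count rows per user, and let index alignment put the result back in the original row order. This is a swapped-in variant, not the code above. It assumes 'time' should be compared as a number, and `time_num` and `rank_in_user` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'user_id': [1, 1, 2, 1, 2, 1, 2, 3],
    'movie_id': ['35', '120', '898', '546', '989', '42', '546', '35'],
    'time': ['1.7', '2.1', '1.3', '2.4', '1.4', '7.0', '2.1', '1.1'],
})

# Assumption: 'time' is meant to be numeric, so convert the strings first
time_num = pd.to_numeric(df['time'])

# Sort a temporary copy by time, then number the rows within each user
rank_in_user = (
    df.assign(time_num=time_num)
      .sort_values('time_num', kind='mergesort')  # stable sort keeps ties in row order
      .groupby('user_id')
      .cumcount()
)

# The Series keeps the original index labels, so assignment realigns it to df
df['last_rated'] = (rank_in_user > 0).astype(int)
```

Because the sort happens on a temporary copy, df itself keeps its original row order and gains an integer column.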
You can use GroupBy + transform with min to calculate a series of minimum values by user_id. Then check which rows of df['time'] differ from that minimum and convert from bool to int.
g = df.groupby('user_id')['time'].transform('min')
df['last_rated'] = (df['time'] != g).astype(int)
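One thing to watch: in the question's df the 'time' values are strings, so min compares them as text, and as text '10.2' is lower than '9.5'. The sample values happen to compare the same either way, but a sketch that converts to float first (an assumption about the intended meaning of the column; `time_num` and `first_time` are illustrative names) could look like this:

```python
import pandas as pd

df = pd.DataFrame({
    'user_id': [1, 1, 2, 1, 2, 1, 2, 3],
    'movie_id': ['35', '120', '898', '546', '989', '42', '546', '35'],
    'time': ['1.7', '2.1', '1.3', '2.4', '1.4', '7.0', '2.1', '1.1'],
})

# Assumption: times should be compared as numbers, not as text
time_num = df['time'].astype(float)

# Per-user minimum, broadcast back to one value per row
first_time = time_num.groupby(df['user_id']).transform('min')

# 0 where the row holds its user's earliest time, 1 elsewhere
df['last_rated'] = (time_num != first_time).astype(int)
```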
Assuming your dataframe is already sorted by time for each user_id, you can more efficiently use GroupBy with 'first':
g = df.groupby('user_id')['time'].transform('first')
df['last_rated'] = (df['time'] != g).astype(int)
Result:
print(df)
user_id movie_id time last_rated
0 1 35 1.7 0
1 1 120 2.1 1
2 2 898 1.3 0
3 1 546 2.4 1
4 2 989 1.4 1
5 1 42 7.0 1
6 2 546 2.1 1
7 3 35 1.1 0
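A caveat worth checking on real data: comparing against the group minimum (or first value) marks every row that shares that value, so ties behave differently from a position-based approach. The small frame below is hypothetical and only illustrates the difference; `by_min` and `by_position` are illustrative names.

```python
import pandas as pd

# Hypothetical data: user 1 has two rows sharing the earliest time
ties = pd.DataFrame({'user_id': [1, 1, 1], 'time': [1.7, 1.7, 2.4]})

# Value-based: both rows equal to the minimum get 0
group_min = ties.groupby('user_id')['time'].transform('min')
by_min = (ties['time'] != group_min).astype(int)

# Position-based: only the first row of the group gets 0
by_position = ties.groupby('user_id').cumcount().gt(0).astype(int)

print(by_min.tolist())       # [0, 0, 1]
print(by_position.tolist())  # [0, 1, 1]
```

If repeated times are possible for a user, the position-based form is probably closer to "every row except the first".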