Fill column with same value except first row inside groups

I have a df:

import pandas as pd
df = pd.DataFrame({'user_id': [1,1,2,1,2,1,2,3], 'movie_id': ['35','120','898','546','989','42','546','35'], 
'time':['1.7','2.1','1.3','2.4','1.4','7.0','2.1','1.1']})

that looks like this:

user_id  movie_id  time
1          35      1.7
1         120      2.1
2         898      1.3
1         546      2.4
2         989      1.4
1         42       7.0
2         546      2.1
3         35       1.1

My goal is to group by user_id, sort by time, and fill a new column with 1 for every row except the first one in each group. The time column holds the number of seconds elapsed since the last click. In the end I should get output like the following, where the new column indicates whether the user rated another movie before the current one:

user_id  movie_id  time  last_rated
1          35      1.7      0
1         120      2.1      1
2         898      1.3      0
1         546      2.4      1
2         989      1.4      1
1         42       7.0      1
2         546      2.1      1
3         35       1.1      0

I've experimented with groupby, shift, and cumsum but still can't get the desired output. Any help would be much appreciated!

You can use cumcount() and np.where():

import numpy as np

df['last_rated'] = np.where(df.groupby('user_id').cumcount() == 0, 0, 1)

or (as per @coldspeed below):

df['last_rated'] = df.groupby('user_id').cumcount().astype(bool).astype(int)

Output:

    user_id   movie_id  time    last_rated
0   1         35          1.7   0
1   1         120         2.1   1
2   2         898         1.3   0
3   1         546         2.4   1
4   2         989         1.4   1
5   1         42          7.0   1
6   2         546         2.1   1
7   3         35          1.1   0
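For anyone who wants to run this end to end, here is a minimal self-contained sketch of the cumcount() approach using the question's data. It flags rows by their current position in df, so it assumes the rows are already in time order within each user_id. The names position, flag_where and flag_cast are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'user_id': [1, 1, 2, 1, 2, 1, 2, 3],
    'movie_id': ['35', '120', '898', '546', '989', '42', '546', '35'],
    'time': ['1.7', '2.1', '1.3', '2.4', '1.4', '7.0', '2.1', '1.1'],
})

# cumcount() numbers the rows of each group 0, 1, 2, ... in their current order.
position = df.groupby('user_id').cumcount()

# Variant 1: np.where maps position 0 to 0 and everything else to 1.
flag_where = np.where(position == 0, 0, 1)

# Variant 2: 0 -> False -> 0, any positive count -> True -> 1.
flag_cast = position.astype(bool).astype(int)

df['last_rated'] = flag_cast
print(df)
```

Both variants give the same flags; the cast version just avoids the extra NumPy import.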

You can call sort_values up front to make sure the rows are in the right order. But if you want to keep your df as is, you can sort inside the groups:

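# Sort within each group, then number the rows of each group from 0.
g = (df.groupby('user_id', as_index=False)
       .apply(lambda x: x.sort_values(by='time'))
       .groupby('user_id').cumcount()
       .reset_index(level=0, drop=True))

# g/g is NaN at position 0 and 1 elsewhere.
df['last_rated'] = (g / g).fillna(0).astype(int)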
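A shorter alternative to the apply-based sort, offered as a sketch rather than the answerer's method: sort a temporary copy of df and let index alignment put the result back in the original row order. It also converts time with pd.to_numeric first, on the assumption that the strings should be compared as numbers (as text, '10.0' would sort before '2.1'). The data here is hypothetical, shuffled so that row order and time order differ, and the helper names seconds and order are made up for the example.

```python
import pandas as pd

# Hypothetical data: rows are deliberately not in time order.
df = pd.DataFrame({
    'user_id': [1, 1, 2, 1, 2],
    'movie_id': ['120', '35', '989', '42', '898'],
    'time': ['2.1', '1.7', '10.0', '7.0', '1.3'],
})

# Compare times as numbers, not as text.
seconds = pd.to_numeric(df['time'])

# Sort only a temporary copy; df itself keeps its row order.
order = (
    df.assign(seconds=seconds)
      .sort_values('seconds')
      .groupby('user_id')
      .cumcount()
)

# Assignment aligns on the index, so each flag lands on its original row.
df['last_rated'] = (order > 0).astype(int)
print(df)
```

Because order keeps the original index labels, no reset_index step is needed before assigning it back.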

You can use GroupBy + transform with min to calculate a series of minimum values by user_id. Then compare it against df['time'] for inequality and convert from bool to int.

g = df.groupby('user_id')['time'].transform('min')
df['last_rated'] = (df['time'] != g).astype(int)

Assuming your dataframe is already sorted by time for each user_id, you can more efficiently use GroupBy with 'first':

g = df.groupby('user_id')['time'].transform('first')
df['last_rated'] = (df['time'] != g).astype(int)

Result:

print(df)

   user_id movie_id time  last_rated
0        1       35  1.7           0
1        1      120  2.1           1
2        2      898  1.3           0
3        1      546  2.4           1
4        2      989  1.4           1
5        1       42  7.0           1
6        2      546  2.1           1
7        3       35  1.1           0
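
One thing to be aware of with the transform('min') comparison above: if a user has two rows sharing the smallest time, both compare equal to the group minimum and both get 0. A small sketch with hypothetical data (time stored as floats here, which is an assumption; the question stores strings) shows how that differs from a position-based flag:

```python
import pandas as pd

# Hypothetical data: user 1 has two rows tied at the minimum time.
df = pd.DataFrame({
    'user_id': [1, 1, 1, 2],
    'time': [1.7, 1.7, 2.4, 1.3],
})

# Value-based: every row equal to the group minimum is flagged 0.
g = df.groupby('user_id')['time'].transform('min')
by_value = (df['time'] != g).astype(int)

# Position-based: only the first row of each group is flagged 0.
by_position = (df.groupby('user_id').cumcount() > 0).astype(int)

print(by_value.tolist())
print(by_position.tolist())
```

If exactly one 0 per user is required, the position-based version is probably the safer choice when ties are possible.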


