Efficiently applying an operation to one column that depends on another column in pandas
I have a DataFrame called df with around 20M rows that looks like this:
   userId  movieId  rating
0       1      296     5.0
1       1      306     3.5
2       1      307     5.0
3       2      665     5.0
4       2      899     3.5
...
I also have a Series, user_bias:
userId
1    0.280431
2    0.096580
3    0.163554
4   -0.155755
5    0.218621
...
I would like to subtract from df['rating'] the value in user_bias that matches each row's userId. For example, the rating of the first row should be replaced with 5.0 - 0.280431 = 4.719569.

I tried two solutions, but both seem to be very slow. Is there a better way to achieve this?
# Solution 1: row-by-row loop with iterrows
for i, row in df.iterrows():
    df.at[i, 'rating'] -= user_bias[row.userId]
To get rid of the for loop, I used the apply method. I'm not sure whether it gives the correct result, but it is again much slower than I expected.
# Solution 2: row-wise apply (axis=1 calls the lambda once per row)
df['rating'] = df.apply(lambda row: row.rating - user_bias[row.userId], axis=1)
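To check whether the apply version is correct result-wise, here is a small sketch that rebuilds only the five sample rows shown in the question and runs both attempts side by side. The `make_sample` helper is hypothetical, introduced just for this check. Because the DataFrame mixes int and float columns, each row should reach the loop body and the lambda as a float64 Series, so `row.userId` arrives as `1.0`, `2.0`, ..., which pandas still matches against the integer index of user_bias.

```python
import pandas as pd


def make_sample():
    """Hypothetical helper: the sample rows from the question (real df has ~20M rows)."""
    df = pd.DataFrame({
        "userId": [1, 1, 1, 2, 2],
        "movieId": [296, 306, 307, 665, 899],
        "rating": [5.0, 3.5, 5.0, 5.0, 3.5],
    })
    user_bias = pd.Series(
        [0.280431, 0.096580, 0.163554, -0.155755, 0.218621],
        index=pd.Index([1, 2, 3, 4, 5], name="userId"),
    )
    return df, user_bias


# Solution 1: in-place update, one row at a time
df_loop, user_bias = make_sample()
for i, row in df_loop.iterrows():
    df_loop.at[i, "rating"] -= user_bias[row.userId]

# Solution 2: row-wise apply on a fresh copy of the same data
df_apply, _ = make_sample()
df_apply["rating"] = df_apply.apply(
    lambda row: row.rating - user_bias[row.userId], axis=1
)

print(df_loop["rating"].round(6).tolist())
print(df_loop["rating"].equals(df_apply["rating"]))
```

On this sample both attempts should produce the same column, with 4.719569 in the first row. They are slow on 20M rows because both run Python code once per row instead of operating on whole columns at once.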
Try using reindex instead, which looks up all the biases in one vectorized call rather than once per row:
# Look up one bias per row by userId, then subtract as a whole column
df['rating'] = df['rating'] - user_bias.reindex(df['userId']).values
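A self-contained sketch of the reindex approach on the five sample rows from the question, mainly to show why `.values` is there: the reindexed Series is labelled by userId (1, 1, 1, 2, 2), not by df's row labels, so stripping that index makes the subtraction positional. It also compares against `Series.map`, which is an alternative I am adding here, not part of the answer above; when given a Series it looks values up by index label but keeps df's own index, so no `.values` is needed.

```python
import pandas as pd

# Sample rows from the question (the real df has ~20M rows)
df = pd.DataFrame({
    "userId": [1, 1, 1, 2, 2],
    "movieId": [296, 306, 307, 665, 899],
    "rating": [5.0, 3.5, 5.0, 5.0, 3.5],
})
user_bias = pd.Series(
    [0.280431, 0.096580, 0.163554, -0.155755, 0.218621],
    index=pd.Index([1, 2, 3, 4, 5], name="userId"),
)

# reindex returns one bias per row, labelled by userId; .values drops
# those labels so the subtraction lines up with df by position
bias_per_row = user_bias.reindex(df["userId"]).values
by_reindex = df["rating"] - bias_per_row

# Alternative: map looks each userId up in user_bias and keeps df's index
by_map = df["rating"] - df["userId"].map(user_bias)

print(by_reindex.round(6).tolist())
print(by_reindex.equals(by_map))
```

With either version, a userId that has no entry in user_bias yields NaN for that row rather than raising an error.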