
Applying an operation efficiently on one column that depends on another column with pandas

I have a DataFrame called df with around 20 million rows that looks like this:

   userId  movieId  rating
0       1      296     5.0
1       1      306     3.5
2       1      307     5.0
3       2      665     5.0
4       2      899     3.5
...

and I have a Series, user_bias:

userId
1         0.280431
2         0.096580
3         0.163554
4        -0.155755
5         0.218621
...

I would like to subtract from df['rating'] the value in user_bias that matches each row's userId. For example, the rating in the first row should be replaced with 5.0 - 0.280431 = 4.719569. I tried two solutions, but they seem to be very slow. Is there a better way to achieve this?

Solution 1

for i, row in df.iterrows():
    df.at[i, 'rating'] -= user_bias[row.userId]

Solution 2

To get rid of the for loop, I used the apply method. I'm not sure whether the result is correct, but it is again much slower than I expected.

df['rating'] = df.apply(lambda row: row.rating - user_bias[row.userId], axis=1)

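Try with reindex. It looks up the bias for every row's userId in a single vectorized step, and .values drops the resulting index so the subtraction is done by position: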

df['rating'] = df['rating'] - user_bias.reindex(df['userId']).values
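
To check the one-liner on a small scale, here is a minimal sketch that rebuilds the five sample rows and the five sample biases from the question. The names via_reindex and via_map are only illustrative, and Series.map is shown as an alternative lookup, not as part of the original answer.

```python
import numpy as np
import pandas as pd

# Small stand-in for the 20-million-row frame in the question.
df = pd.DataFrame({
    "userId": [1, 1, 1, 2, 2],
    "movieId": [296, 306, 307, 665, 899],
    "rating": [5.0, 3.5, 5.0, 5.0, 3.5],
})
user_bias = pd.Series(
    [0.280431, 0.096580, 0.163554, -0.155755, 0.218621],
    index=pd.Index([1, 2, 3, 4, 5], name="userId"),
)

# Approach from the answer: pull one bias per row, then subtract positionally.
via_reindex = df["rating"] - user_bias.reindex(df["userId"]).values

# Alternative (assumption, not from the answer): map each userId to its bias.
via_map = df["rating"] - df["userId"].map(user_bias)

# Both vectorized forms should agree row for row.
assert np.allclose(via_reindex, via_map)
```

In both forms, a userId that has no entry in user_bias produces NaN for that row, so it may be worth checking for missing users before overwriting df['rating'].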
