Efficiently applying an operation to one column that depends on another column with pandas

Question

I have a DataFrame named df with about 20 million rows that looks like this:

   userId  movieId  rating
0       1      296     5.0
1       1      306     3.5
2       1      307     5.0
3       2      665     5.0
4       2      899     3.5
...

I also have a Series, user_bias, indexed by userId:

userId
1         0.280431
2         0.096580
3         0.163554
4        -0.155755
5         0.218621
...

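For each row of df, I want to subtract from df['rating'] the value in user_bias that matches that row's userId. For example, the rating in the first row should be replaced with 5.0 - 0.280431 = 4.719569. I tried two solutions, but both seem slow. Is there a better way to achieve this?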

Solution 1

for i, row in df.iterrows():
    df.at[i, 'rating'] -= user_bias[row.userId]

To get rid of the for loop, I used the apply method. I'm not sure the result is correct, and it is still slower than I expected.

df['rating'] = df.apply(lambda row: row.rating - user_bias[row.userId], axis=1)

Answer 1

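Try reindex. It looks up one user_bias value for every row of df in a single vectorized step, and .values drops the userId index so the subtraction is done by position: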

df['rating'] = df['rating'] - user_bias.reindex(df['userId']).values
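
To check the approach on a small scale, here is a minimal sketch that rebuilds the first five rows from the question and compares the reindex result with a Series.map lookup. The map variant is an alternative I am adding for comparison, not part of the original answer, and the names via_reindex and via_map are illustrative only.

```python
import numpy as np
import pandas as pd

# Small stand-in for the 20M-row frame from the question.
df = pd.DataFrame({
    "userId": [1, 1, 1, 2, 2],
    "movieId": [296, 306, 307, 665, 899],
    "rating": [5.0, 3.5, 5.0, 5.0, 3.5],
})

# Per-user bias, indexed by userId (values copied from the question).
user_bias = pd.Series(
    [0.280431, 0.096580, 0.163554, -0.155755, 0.218621],
    index=pd.Index([1, 2, 3, 4, 5], name="userId"),
)

# reindex returns one bias per row of df, in row order.
# to_numpy() (equivalent to .values here) drops the userId index,
# so the subtraction is positional rather than label-aligned.
via_reindex = df["rating"] - user_bias.reindex(df["userId"]).to_numpy()

# Alternative (an assumption, not from the answer): map each userId to its
# bias. The result keeps df's own index, so no conversion is needed.
via_map = df["rating"] - df["userId"].map(user_bias)

# Both lookups should agree before the column is overwritten.
assert np.allclose(via_reindex, via_map)

df["rating"] = via_reindex
print(df)
```

With either variant, a userId that does not appear in user_bias produces NaN for that row, so it may be worth checking df['rating'].isna().any() afterwards.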