
Applying an operation efficiently on one column that depends on another column with pandas

I have a DataFrame called df with around 20 million rows that looks like this:

userId  movieId rating
0   1   296     5.0
1   1   306     3.5
2   1   307     5.0
3   2   665     5.0
4   2   899     3.5
...

I also have a Series, user_bias, indexed by userId:

userId
1         0.280431
2         0.096580
3         0.163554
4        -0.155755
5         0.218621
...

For each row, I would like to subtract from df['rating'] the value in user_bias that matches the row's userId. For example, the rating in the first row should be replaced with 5.0 - 0.280431 = 4.719569. I tried two solutions, but both seem to be very slow. Is there a better way to achieve this?

Solution 1

for i, row in df.iterrows():
    df.at[i, 'rating'] -= user_bias[row.userId]

Solution 2

To get rid of the for loop, I used the apply method. I am not sure the result is correct, and it is again much slower than I expected.

df['rating'] = df.apply(lambda row: row.rating - user_bias[row.userId], axis=1)

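Try reindex. It looks up the bias for every row's userId in one vectorized step, and .values drops the resulting index so the subtraction is done by position rather than by label: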

df['rating'] = df['rating'] - user_bias.reindex(df['userId']).values
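As a small check of the idea, here is a sketch on toy data copied from the first rows of the question (the five-row frame and two-entry Series are illustrative, not the full 20 million rows). It also shows Series.map as an alternative I would expect to give the same result; the names via_reindex and via_map are made up for this example.

```python
import pandas as pd

# Toy data mirroring the first rows shown in the question.
df = pd.DataFrame({
    "userId": [1, 1, 1, 2, 2],
    "movieId": [296, 306, 307, 665, 899],
    "rating": [5.0, 3.5, 5.0, 5.0, 3.5],
})
user_bias = pd.Series(
    [0.280431, 0.096580],
    index=pd.Index([1, 2], name="userId"),
)

# reindex returns one bias per row of df (labels may repeat in the target);
# .values strips the userId index so the subtraction is positional.
via_reindex = df["rating"] - user_bias.reindex(df["userId"]).values

# Alternative (an assumption, not from the answer): Series.map does the same
# lookup but keeps df's own index, so no .values is needed.
via_map = df["rating"] - df["userId"].map(user_bias)

# First row is approximately 4.719569, the value expected in the question.
print(via_reindex.head())
```

Both variants produce NaN for any userId that is missing from user_bias, so it may be worth checking for missing values before overwriting df['rating'].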
