Removing a few rows from a DataFrame based on a condition

Some pairs of products cannot be assigned together (e.g. products “5649565” and “5649646” cannot both be given to the same customer). The list of such pairs is in the Exclusion table:

product1 product2
5649646 5649565
5649585 5649910
5649585 5649921
5649607 5649931
5649607 5649929

df_customers:

customers product relevancy_score
A10001 5649646 0.646916
A10001 5649565 0.608653
A10001 5649585 0.587336
A10001 5649910 0.581182
A10001 5650462 0.575269
A10787 5650544 0.008170
A10787 5649815 0.003877
A10787 5649925 0.002392

That is, a customer should get only one of the two products listed in any row of the Exclusion table, not both. For example, customer A10001 should get either 5649646 or 5649565, but not both. However, the df_customers table shows that this customer has been given both products, so I have to remove the row with product 5649565 for customer A10001. How can I solve this with Python?

It is not clear which product should be preferred: the one with the highest relevancy_score, or product1 over product2?

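Here is a solution that prefers the higher relevancy_score, for example. Note that it keeps only the single highest-scoring row for each customer and does not consult the Exclusion table.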


import pandas as pd

df = pd.DataFrame({
    'customers': [1, 1, 1, 2, 2, 3, 3, 3],
    'product': ['a', 'b', 'c', 'c', 'a', 'b', 'a', 'c'],
    'relevancy_score': [0.5, 0.9, 0.1, 0.95, 0.3, 0.5, 0.8, 0.4],
})

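# For each customer, keep only the row(s) whose relevancy_score equals
# that customer's maximum, then stitch the per-customer pieces back together.
new_df = pd.concat(
    [
        group[group['relevancy_score'] == group['relevancy_score'].max()]
        for _, group in df.groupby('customers')
    ]
)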
print(new_df)

Output

   customers product  relevancy_score
1          1       b             0.90
3          2       c             0.95
6          3       a             0.80
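
The snippet above keeps one row per customer regardless of the Exclusion table. As a minimal sketch of how the exclusion pairs themselves could be applied, the following walks each customer's products from highest to lowest relevancy_score and drops a product only when it conflicts with one already kept. The helper name drop_excluded and the "higher score wins" rule are assumptions made for illustration; the question does not say which product of a pair should be preferred. The sketch also assumes the DataFrame index is unique.

```python
import pandas as pd

# Tables copied from the question.
exclusion = pd.DataFrame({
    "product1": [5649646, 5649585, 5649585, 5649607, 5649607],
    "product2": [5649565, 5649910, 5649921, 5649931, 5649929],
})
df_customers = pd.DataFrame({
    "customers": ["A10001"] * 5 + ["A10787"] * 3,
    "product": [5649646, 5649565, 5649585, 5649910,
                5650462, 5650544, 5649815, 5649925],
    "relevancy_score": [0.646916, 0.608653, 0.587336, 0.581182,
                        0.575269, 0.008170, 0.003877, 0.002392],
})


def drop_excluded(df, exclusion):
    """Hypothetical helper: per customer, keep the higher-scored product
    of every excluded pair (assumed preference rule)."""
    # Symmetric lookup: product -> set of products it must not appear with.
    conflicts = {}
    for p1, p2 in zip(exclusion["product1"], exclusion["product2"]):
        conflicts.setdefault(p1, set()).add(p2)
        conflicts.setdefault(p2, set()).add(p1)

    keep = []  # index labels of the rows that survive
    for _, group in df.groupby("customers", sort=False):
        kept_products = set()
        # Visit this customer's products from highest to lowest score.
        ranked = group.sort_values("relevancy_score", ascending=False)
        for idx, product in ranked["product"].items():
            # Keep the product only if it conflicts with nothing kept so far.
            if conflicts.get(product, set()).isdisjoint(kept_products):
                kept_products.add(product)
                keep.append(idx)

    # Boolean mask preserves the original row order (assumes a unique index).
    return df[df.index.isin(keep)]


result = drop_excluded(df_customers, exclusion)
print(result)
```

On the question's data this removes the rows for products 5649565 and 5649910 from customer A10001 (each loses to a higher-scored partner, 5649646 and 5649585 respectively) and leaves customer A10787 unchanged. If product1 should always win over product2 instead, the ranking step would need to be replaced with that rule.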
