Removing a few rows from a DataFrame based on a condition

Question

Some pairs of products cannot be assigned together (e.g. products “5649565” and “5649646” cannot both be given to the same customer). The list of such pairs is in the Exclusion table:

product1	product2
5649646	5649565
5649585	5649910
5649585	5649921
5649607	5649931
5649607	5649929

df_customers:

customers	product	relevancy_score
A10001	5649646	0.646916
A10001	5649565	0.608653
A10001	5649585	0.587336
A10001	5649910	0.581182
A10001	5650462	0.575269
A10787	5650544	0.008170
A10787	5649815	0.003877
A10787	5649925	0.002392

That is, a customer should get only one of the two products in any row of the Exclusion table, not both. For example, customer A10001 should get either 5649646 or 5649565, but not both. The df_customers table shows that this customer has been given both products, so I have to remove the row with product 5649565 for customer A10001. How can I solve this with Python?

Answer 1

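It is not clear which product should be given preference: the one with the higher relevancy_score, or product1 over product2?

Here is a solution that prefers the higher relevancy_score, for example. Note that it keeps only the top-scoring product for each customer and does not consult the Exclusion table.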


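import pandas as pd

# Toy data: each customer has several candidate products with a score.
df = pd.DataFrame({
    'customers': [1, 1, 1, 2, 2, 3, 3, 3],
    'product': ['a', 'b', 'c', 'c', 'a', 'b', 'a', 'c'],
    'relevancy_score': [0.5, 0.9, 0.1, 0.95, 0.3, 0.5, 0.8, 0.4],
})

# For each customer, keep only the row(s) with the highest relevancy_score.
# Tied rows are all kept, so a customer can end up with more than one row.
new_df = pd.concat(
    [
        group[group['relevancy_score'] == group['relevancy_score'].max()]
        for _, group in df.groupby('customers')
    ]
)
print(new_df)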

Output

   customers product  relevancy_score
1          1       b             0.90
3          2       c             0.95
6          3       a             0.80
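
The snippet above keeps a single product per customer. If the goal is instead to keep every product except those that clash with a higher-scored one, the Exclusion table has to be used. The following is only a sketch under some assumptions: product IDs are stored as strings, the Exclusion table is a DataFrame named exclusion with columns product1 and product2, and the higher relevancy_score wins a conflict. The helper name drop_excluded is made up for illustration.

```python
import pandas as pd

# Data copied from the question; product IDs are assumed to be strings.
exclusion = pd.DataFrame({
    "product1": ["5649646", "5649585", "5649585", "5649607", "5649607"],
    "product2": ["5649565", "5649910", "5649921", "5649931", "5649929"],
})

df_customers = pd.DataFrame({
    "customers": ["A10001"] * 5 + ["A10787"] * 3,
    "product": ["5649646", "5649565", "5649585", "5649910", "5650462",
                "5650544", "5649815", "5649925"],
    "relevancy_score": [0.646916, 0.608653, 0.587336, 0.581182, 0.575269,
                        0.008170, 0.003877, 0.002392],
})


def drop_excluded(df, exclusion):
    """Hypothetical helper: for each customer, walk the products from the
    highest to the lowest score and skip any product that conflicts with
    one that has already been kept."""
    # Symmetric lookup: product -> set of products it cannot be paired with.
    conflicts = {}
    for p1, p2 in zip(exclusion["product1"], exclusion["product2"]):
        conflicts.setdefault(p1, set()).add(p2)
        conflicts.setdefault(p2, set()).add(p1)

    keep_index = []
    for _, group in df.groupby("customers", sort=False):
        kept = set()
        ranked = group.sort_values("relevancy_score", ascending=False)
        for idx, product in ranked["product"].items():
            if conflicts.get(product, set()) & kept:
                continue  # clashes with a higher-scored product already kept
            kept.add(product)
            keep_index.append(idx)

    # Restore the original row order.
    return df.loc[sorted(keep_index)]


result = drop_excluded(df_customers, exclusion)
print(result)
```

With the data from the question, this drops the rows for products 5649565 and 5649910 of customer A10001 and leaves customer A10787 untouched. To prefer product1 over product2 instead, the loop could check the Exclusion rows directly rather than ranking by score.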
