There is a set of product pairs that cannot be assigned together (e.g. products “5649565” and “5649646” cannot both be given to the same customer). These pairs are listed in the Exclusion table:
product1 | product2 |
---|---|
5649646 | 5649565 |
5649585 | 5649910 |
5649585 | 5649921 |
5649607 | 5649931 |
5649607 | 5649929 |
df_customers:
customers | product | relevancy_score |
---|---|---|
A10001 | 5649646 | 0.646916 |
A10001 | 5649565 | 0.608653 |
A10001 | 5649585 | 0.587336 |
A10001 | 5649910 | 0.581182 |
A10001 | 5650462 | 0.575269 |
A10787 | 5650544 | 0.008170 |
A10787 | 5649815 | 0.003877 |
A10787 | 5649925 | 0.002392 |
That is, a customer should get only one of the two products in each row of the Exclusion table, not both. For example, customer A10001 should get either 5649646 or 5649565, but not both. The df_customers table shows that this customer has been given both products, so I need to remove the row with product 5649565 for customer A10001. How can I solve this with Python?
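A minimal sketch of one way to do this, assuming that product1 in each Exclusion row always takes precedence over product2 (which matches removing 5649565 for A10001). The data is copied from the tables above; the names `df_exclusion`, `blockers`, `owned` and `keep` are hypothetical. A row is dropped when the same customer also holds a product that lists it as product2.

```python
import pandas as pd

# Exclusion pairs and customer scores copied from the question's tables.
df_exclusion = pd.DataFrame({
    'product1': [5649646, 5649585, 5649585, 5649607, 5649607],
    'product2': [5649565, 5649910, 5649921, 5649931, 5649929],
})
df_customers = pd.DataFrame({
    'customers': ['A10001'] * 5 + ['A10787'] * 3,
    'product': [5649646, 5649565, 5649585, 5649910, 5650462,
                5650544, 5649815, 5649925],
    'relevancy_score': [0.646916, 0.608653, 0.587336, 0.581182, 0.575269,
                        0.008170, 0.003877, 0.002392],
})

# Map each product2 to the set of product1 values that take precedence over it.
blockers = {}
for p1, p2 in zip(df_exclusion['product1'], df_exclusion['product2']):
    blockers.setdefault(p2, set()).add(p1)

# Every (customer, product) pair present in df_customers.
owned = set(zip(df_customers['customers'], df_customers['product']))

# Keep a row unless the same customer also holds one of its blocking product1 values.
keep = [
    not any((cust, p1) in owned for p1 in blockers.get(prod, ()))
    for cust, prod in zip(df_customers['customers'], df_customers['product'])
]
result = df_customers[keep]
print(result)
```

For A10001 this removes 5649565 (blocked by 5649646) and 5649910 (blocked by 5649585). It is a single pass: a product2 is dropped whenever the customer holds its product1, even if that product1 is itself removed by another pair.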
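It is not clear which product should be given preference: the one with the higher relevancy_score, or product1 over product2?
Here is a solution that prefers the higher relevancy_score, for example. In this simplified example every product a customer has is treated as conflicting with every other, so only each customer's top-scoring row is kept.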
import pandas as pd
# Toy data: three customers with scores for products 'a', 'b' and 'c'
df = pd.DataFrame({
    'customers': [1, 1, 1, 2, 2, 3, 3, 3],
    'product': ['a', 'b', 'c', 'c', 'a', 'b', 'a', 'c'],
    'relevancy_score': [0.5, 0.9, 0.1, 0.95, 0.3, 0.5, 0.8, 0.4],
})
# For each customer, keep the row(s) whose relevancy_score equals that customer's maximum
new_df = pd.concat(
    [
        group[group['relevancy_score'] == group['relevancy_score'].max()]
        for _, group in df.groupby('customers')
    ]
)
print(new_df)
Output
customers product relevancy_score
1 1 b 0.90
3 2 c 0.95
6 3 a 0.80
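The example above does not read the Exclusion table. A sketch that does use it, still preferring the higher relevancy_score, could visit each customer's products from highest to lowest score and accept a product only if it does not clash with one already accepted (a greedy choice). The toy data is the same as above; the exclusion pairs ('a', 'b') and ('b', 'c') and the names `conflicts`, `accepted` and `kept_index` are hypothetical.

```python
import pandas as pd

# Toy data from the answer above.
df = pd.DataFrame({
    'customers': [1, 1, 1, 2, 2, 3, 3, 3],
    'product': ['a', 'b', 'c', 'c', 'a', 'b', 'a', 'c'],
    'relevancy_score': [0.5, 0.9, 0.1, 0.95, 0.3, 0.5, 0.8, 0.4],
})
# Hypothetical exclusion table: 'a' clashes with 'b', and 'b' clashes with 'c'.
df_exclusion = pd.DataFrame({'product1': ['a', 'b'], 'product2': ['b', 'c']})

# Record each pair in both directions so the lookup does not depend on column order.
conflicts = {}
for p1, p2 in zip(df_exclusion['product1'], df_exclusion['product2']):
    conflicts.setdefault(p1, set()).add(p2)
    conflicts.setdefault(p2, set()).add(p1)

# Sort by score (highest first); groupby keeps this row order within each customer.
ordered = df.sort_values('relevancy_score', ascending=False, kind='stable')
kept_index = []
for _, group in ordered.groupby('customers', sort=False):
    accepted = set()
    for idx, prod in zip(group.index, group['product']):
        # Accept the product only if it clashes with nothing already accepted.
        if accepted.isdisjoint(conflicts.get(prod, ())):
            accepted.add(prod)
            kept_index.append(idx)

new_df = df.loc[sorted(kept_index)]
print(new_df)
```

For customer 3, 'a' (0.8) is kept, 'b' (0.5) is dropped because it clashes with 'a', and 'c' (0.4) is kept because its only conflict, 'b', was dropped. Simply deleting the lower-scored product of every clashing pair would also remove 'c'. The greedy order is not guaranteed to maximize the total score per customer.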