I have a dataframe like:
Client_ID Product_nb Item_id
1 1 i1
1 1 i2
1 1 i3
1 2 i2
1 2 i5
1 2 i7
1 3 i1
1 3 i2
1 3 i4
1 3 i6
2 1 i1
2 1 i2
2 1 i3
2 1 i4
2 2 i1
2 2 i2
... ... ...
So each client (Client_ID) has several products (Product_nb). For each product, I want to keep only one item (Item_id). And for the same client, the item kept for a product must not be one that was already kept for a previous product.
I want to add a flag next to each item that says whether I need to keep it or not:
Client_ID Product_nb Item_id Keep
1 1 i1 1
1 1 i2 0
1 1 i3 0
1 2 i2 1
1 2 i5 0
1 2 i7 0
1 3 i1 0
1 3 i2 0
1 3 i4 1
1 3 i6 0
2 1 i1 1
2 1 i2 0
2 1 i3 0
2 1 i4 0
2 2 i1 0
2 2 i2 1
... ... ... ...
My idea was to iterate over all clients and products and, for each client, save the items that have already been kept in a list:
    df = df.set_index(['client_id','product_nb','item_id','keep'])
    client_ids = df.index.get_level_values('client_id').unique()
    for client in client_ids:
        list_already = []
        prod_nbs = df.loc[client].index.get_level_values('product_nb').unique()
        for prod_nb in prod_nbs:
            item_ids = df.loc[client,prod_nb].index.get_level_values('item_id').unique()
            for item_id in item_ids:
                if (item_id in list_already):
                    df.loc[client,prod_nb,item_id,'keep'] = 1
                    continue
                else:
                    list_already.append(item_id)
                    df.loc[client,prod_nb,item_id,'keep'] = 1
                    break
But this just gives me back the input dataframe.
I'd be grateful for any sort of help. Thank you.
In pandas you usually don't want to loop over your DataFrame. It is slow, and there are much better optimized routines for almost anything. In your case
df.groupby(['Client_ID', 'Product_nb'])['Item_id'].first()
picks the first item of each product. Replace df with the name of your DataFrame.
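To make that concrete, here is a minimal sketch on the sample rows from the question. The Keep_first column name and the use of cumcount() to turn the result into a flag column are assumptions for illustration, not something the question asked for. It marks the first row of every product regardless of what was picked for earlier products.

```python
import pandas as pd

# Sample rows copied from the question (the "..." rows are left out).
df = pd.DataFrame({
    "Client_ID":  [1] * 10 + [2] * 6,
    "Product_nb": [1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 1, 1, 1, 1, 2, 2],
    "Item_id":    ["i1", "i2", "i3", "i2", "i5", "i7", "i1", "i2", "i4", "i6",
                   "i1", "i2", "i3", "i4", "i1", "i2"],
})

# One row per (client, product): the first item listed for that product.
first_items = df.groupby(["Client_ID", "Product_nb"])["Item_id"].first()

# Hypothetical flag column: cumcount() numbers the rows inside each group
# from 0, so "== 0" is true exactly for the first row of each product.
df["Keep_first"] = (
    df.groupby(["Client_ID", "Product_nb"]).cumcount() == 0
).astype(int)

print(first_items)
print(df)
```

On this sample, client 1 gets i1 for both product 1 and product 3, so the same item can be picked twice.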
Edit: I overlooked the constraint that the chosen item must not repeat within a client. It would probably be best to filter the values beforehand and groupby afterwards.
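Because each choice depends on what was kept for the earlier products of the same client, I am not sure the filtering can be done in one vectorized step. The sketch below is therefore not the filter-then-groupby idea but a plain greedy loop over the groups, shown only as one possible way to get the Keep column. The helper name flag_first_unused is made up for this example, and it assumes that a product whose items were all kept earlier simply gets no flagged row.

```python
import pandas as pd

# Sample rows copied from the question (the "..." rows are left out).
df = pd.DataFrame({
    "Client_ID":  [1] * 10 + [2] * 6,
    "Product_nb": [1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 1, 1, 1, 1, 2, 2],
    "Item_id":    ["i1", "i2", "i3", "i2", "i5", "i7", "i1", "i2", "i4", "i6",
                   "i1", "i2", "i3", "i4", "i1", "i2"],
})


def flag_first_unused(client_rows: pd.DataFrame) -> pd.Series:
    """For one client, flag the first item of each product not kept before."""
    already_kept = set()
    keep = pd.Series(0, index=client_rows.index)
    # sort=False keeps the products in their order of appearance.
    for _, product_rows in client_rows.groupby("Product_nb", sort=False):
        for row_label, item in product_rows["Item_id"].items():
            if item not in already_kept:
                already_kept.add(item)
                keep.loc[row_label] = 1
                break  # only one item is kept per product
    return keep


df["Keep"] = 0
for _, client_rows in df.groupby("Client_ID", sort=False):
    # The returned Series shares the row labels of client_rows,
    # so the assignment lines up row by row.
    df.loc[client_rows.index, "Keep"] = flag_first_unused(client_rows)

print(df)
```

The loop still runs in Python, but only once per group, and it writes to a normal column instead of indexing into a MultiIndex built from every column.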