Pandas - Add a flag column in dataframe

Question

I have a dataframe like:

Client_ID    Product_nb   Item_id
1            1            i1  
1            1            i2
1            1            i3

1            2            i2
1            2            i5  
1            2            i7

1            3            i1
1            3            i2
1            3            i4
1            3            i6

2            1            i1
2            1            i2
2            1            i3
2            1            i4

2            2            i1
2            2            i2
...          ...          ...

So each client (Client_ID) has several products (Product_nb). For each product, I want to keep only one item (Item_id). And for the same client, the item kept for a product must not be one that was already kept for a previous product.

I want to add a flag next to each item that says whether I need to keep the item or not:

Client_ID    Product_nb   Item_id   Keep
1            1            i1        1
1            1            i2        0
1            1            i3        0

1            2            i2        1
1            2            i5        0
1            2            i7        0

1            3            i1        0
1            3            i2        0
1            3            i4        1
1            3            i6        0

2            1            i1        1
2            1            i2        0
2            1            i3        0
2            1            i4        0

2            2            i1        0
2            2            i2        1
...          ...          ...       ...

My idea for this was to iterate over all clients and products and, for each client, save the items that have already been kept in a list:

df = df.set_index(['client_id','product_nb','item_id','keep'])
client_ids = df.index.get_level_values('client_id').unique()
for client in client_ids:
    list_already = []
    prod_nbs = df.loc[client].index.get_level_values('product_nb').unique()
    for prod_nb in prod_nbs:
        item_ids = df.loc[client,prod_nb].index.get_level_values('item_id').unique()
        for item_id in item_ids:
            if (item_id in list_already):
                df.loc[client,prod_nb,item_id,'keep'] = 1
                continue
            else:
                list_already.append(item_id)
                df.loc[client,prod_nb,item_id,'keep'] = 1
                break

But this returns the input dataframe unchanged.

I'll be grateful for any sort of help. Thank you.

Answer 1

In pandas you usually don't want to loop over your DataFrame. It is slow, and there are much more optimized routines for almost anything. In your case

df.groupby(['Client_ID', 'Product_nb'])['Item_id'].first()

does the job: it returns the first Item_id of every (Client_ID, Product_nb) group. Replace df with the name of your DataFrame.
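The groupby call above returns an aggregated Series rather than a flag column. As a minimal sketch of how the same "first item per group" choice could be written as a 0/1 Keep column, assuming the sample data below (made up from the first rows of the question) and the column names from the question's table:

```python
import pandas as pd

# Small sample taken from the first rows of the question's table.
df = pd.DataFrame({
    "Client_ID":  [1, 1, 1, 1, 1, 1],
    "Product_nb": [1, 1, 1, 2, 2, 2],
    "Item_id":    ["i1", "i2", "i3", "i2", "i5", "i7"],
})

# cumcount() numbers the rows inside each (client, product) group from 0,
# so the first row of every group is the one where the counter equals 0.
first_in_group = df.groupby(["Client_ID", "Product_nb"]).cumcount() == 0
df["Keep"] = first_in_group.astype(int)

print(df)
```

This flags the first item of every product and nothing else; it does not check whether that item was already kept for an earlier product of the same client.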

Edit: I overlooked the constraint that your chosen value should be unique. It would probably be best to filter the values beforehand and group by afterwards.
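One way to honor that constraint, sketched here as an assumption rather than as the approach this answer had in mind, is a plain greedy pass: for each client, walk through the products in order and keep the first item that has not been kept yet for that client. It still loops in Python, but only over groups, and it assumes the DataFrame has a unique index and is already sorted in the order the products should be processed. The sample data is copied from the question's table.

```python
import pandas as pd

# Rows copied from the question's table.
df = pd.DataFrame({
    "Client_ID":  [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    "Product_nb": [1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 1, 1, 1, 1, 2, 2],
    "Item_id":    ["i1", "i2", "i3", "i2", "i5", "i7",
                   "i1", "i2", "i4", "i6",
                   "i1", "i2", "i3", "i4", "i1", "i2"],
})

keep_idx = []  # index labels of the rows that get Keep = 1

for _, client_df in df.groupby("Client_ID", sort=False):
    used = set()  # items already kept for this client
    for _, product_df in client_df.groupby("Product_nb", sort=False):
        for idx, item in product_df["Item_id"].items():
            if item not in used:
                used.add(item)
                keep_idx.append(idx)
                break  # one item per product is enough

# Relies on the index being unique (true for the default RangeIndex).
df["Keep"] = df.index.isin(keep_idx).astype(int)

print(df)
```

If every item of a product has already been kept for that client, this sketch leaves all rows of that product at 0; whether that is the desired behavior is not stated in the question.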

1 answers

solution1
-1 2017-08-11 09:30:20