简体   繁体   中英

How to count elements in a one to many relationship dataframe in python

i have a dataframe which contains a client code, the number of the contract and the products from the contract.

Something like this :

client_code contract_number product
AAAA 1000 Water
AAAA 1000 Soda
AAAA 1000 Food
BACD 1001 Water
BACD 1001 Soda
DAMR 1002 Food

And I want to add a column which contains a count to see how many products are on a contract from 1 to n . Something like this:

client_code contract_number product count
AAAA 1000 Water 1
AAAA 1000 Soda 2
AAAA 1000 Food 3
BACD 1001 Water 1
BACD 1001 Soda 2
DAMR 1002 Food 1

I've tried with a for loop but it's too slow ( like an hour ).

df['count']=0
k=1
for i in range(0,len(df)-1):
    
    if(df.at[i,'contract_number']==df.at[i+1,'contract_number']):
        df.at[i,'count']=k
        k=k+1    
    else : 
        df.at[i,'count']=k
        k=1
        
df.at[i+1,'count']=k    

Any ideea how to speed up my code ?

PS : My data frame contains 500.000 lines .

Thank you !

IIUC, you want a cumulative count by each client_code (or probably contract_number ) - you can do that with the cumcount function:

df.loc[:, 'count'] = df.groupby('client_code').cumcount() + 1 

Output

  client_code  contract_number product  count
0        AAAA             1000   Water      1
1        AAAA             1000    Soda      2
2        AAAA             1000    Food      3
3        BACD             1001   Water      1
4        BACD             1001    Soda      2
5        DAMR             1002    Food      1

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM