简体   繁体   中英

Python pandas loop efficient through two dataframes with different lengths

I have two dataframes with different lengths(df,df1). They share one similar label "collo_number" . I want to search the second dataframe for every collo_number in the first data frame. Problem is that the second date frame contains multiple rows for different dates for every collo_nummer . So i want to sum these dates and add this in a new column in the first database.

I now use a loop but it is rather slow and has to perform this operation for al 7 days in a week. Is there a way to get a better performance? I tried multiple solutions but keep getting the error that i cannot use the equal sign for two databases with different lenghts. Help would really be appreciated! Here is an example of what is working but with a rather bad performance.

df5=[df1.loc[(df1.index == nasa) & (df1.afleverdag == x1) & (df1.ind_init_actie=="N"), "aantal_colli"].sum() for nasa in df.collonr]

Your description is a bit vague (hence my comment). First what you good do is to select the rows of the dataframe that you want to search:

dftmp = df1[(df1.afleverdag==x1) & (df1.ind_init_actie=='N')]

so that you don't do this for every item in the loop. Second, use .groupby .

newseries = dftmp['aantal_colli'].groupby(dftmp.index).sum()  
newseries = newseries.ix[df.collonr.unique()]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM