
2 pandas DataFrames with different columns

I have two DataFrames, fit and mass. They share only one column, 'CATAID'. The fit DataFrame contains information about the whole experiment, whereas the mass DataFrame contains only a small subset of it.

For my work, I need the information in the fit DataFrame, but only for the 'CATAID' values that appear in the mass DataFrame. In other words, I need to loop over fit and pick the rows whose CATAID matches a value in mass.

I'm using the following loop:

import pandas as pd

file = pd.DataFrame()
for i in mass.index:
    cataid_m = mass.loc[i, 'CATAID']
    for j in fit.index:
        cataid_f = fit.loc[j, 'CATAID']
        if cataid_m == cataid_f:
            # store the matching fit row (as a column of `file`)
            file[j] = fit.loc[j]

My only concern is how long this loop takes. Does anyone have suggestions on how to speed it up?

You can do this by first getting the IDs present in the mass DataFrame:

mass_id = mass['CATAID'].unique().tolist()

Then you can select the rows of your main DataFrame whose CATAID is in mass_id:

relevant_df = fit.loc[fit['CATAID'].isin(mass_id)]
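
To see the two steps together, here is a small self-contained sketch; the toy data, and every column other than 'CATAID', are made up purely for illustration:

import pandas as pd

# Hypothetical toy frames; only 'CATAID' comes from the question,
# the other columns are invented for this example.
fit = pd.DataFrame({'CATAID': [101, 102, 103, 104],
                    'chi2':   [0.8, 1.2, 0.5, 2.1]})
mass = pd.DataFrame({'CATAID': [102, 104],
                     'logM':   [10.2, 11.0]})

mass_id = mass['CATAID'].unique().tolist()

# keep only the fit rows whose CATAID also appears in mass
relevant_df = fit.loc[fit['CATAID'].isin(mass_id)]
print(relevant_df)   # rows with CATAID 102 and 104

This replaces the nested loop with a single vectorised membership test, which avoids comparing every pair of rows.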

I don't think a merge is what's needed here, as Prune suggests in the comments, because we aren't trying to join the two DataFrames; we only want to take the IDs from one DataFrame and select the matching rows from the other.
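
For comparison, an inner merge (reusing the toy frames from the sketch above) would also restrict fit to the shared CATAID values, but it attaches mass's columns to the result as well, which is more than the question asks for:

# merge keeps the matching rows but also joins mass's extra columns
merged = fit.merge(mass, on='CATAID', how='inner')
print(merged.columns.tolist())   # fit's columns plus 'logM' from mass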
