
2 pandas DataFrames with different columns

I have two DataFrames, fit and mass. They share only one column, 'CATAID'. The fit DataFrame contains information about the whole experiment, whereas the mass DataFrame contains only a small subset of it.

For my work, I need the information in the fit DataFrame, but only for the 'CATAID' values that appear in the mass DataFrame. In other words, I need to loop over fit and pick the rows whose CATAID matches a value in mass.

I'm using the following loop:

import pandas as pd

file = pd.DataFrame()
for i in mass.index:
    cataid_m = mass.loc[i, 'CATAID']
    for j in fit.index:
        cataid_f = fit.loc[j, 'CATAID']
        if cataid_m == cataid_f:
            # store the matching fit row (as a column of `file`)
            file[j] = fit.loc[j]

My only concern is how long this loop takes. Does anyone have suggestions on how to speed it up?

You can do this by first getting the IDs present in the mass DataFrame:

mass_id = mass['CATAID'].unique().tolist()

Then you can select the rows of your main DataFrame whose CATAID is in mass_id:

relevant_df = fit.loc[fit['CATAID'].isin(mass_id)]
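
To see the two steps together, here is a small self-contained sketch; the toy data, and every column other than 'CATAID', are made up purely for illustration:

import pandas as pd

# Hypothetical toy frames; only 'CATAID' comes from the question,
# the other columns are invented for this example.
fit = pd.DataFrame({'CATAID': [101, 102, 103, 104],
                    'chi2':   [0.8, 1.2, 0.5, 2.1]})
mass = pd.DataFrame({'CATAID': [102, 104],
                     'logM':   [10.2, 11.0]})

mass_id = mass['CATAID'].unique().tolist()

# keep only the fit rows whose CATAID also appears in mass
relevant_df = fit.loc[fit['CATAID'].isin(mass_id)]
print(relevant_df)   # rows with CATAID 102 and 104

This replaces the nested loop with a single vectorised membership test, which avoids comparing every pair of rows.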

I don't think a merge is what's needed here, as Prune suggests in the comments, because we aren't trying to join the two DataFrames; we only want to take the IDs from one DataFrame and select the matching rows from the other.
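
For comparison, an inner merge (reusing the toy frames from the sketch above) would also restrict fit to the shared CATAID values, but it attaches mass's columns to the result as well, which is more than the question asks for:

# merge keeps the matching rows but also joins mass's extra columns
merged = fit.merge(mass, on='CATAID', how='inner')
print(merged.columns.tolist())   # fit's columns plus 'logM' from mass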
