I have one dataframe that looks like this, with additional columns:
ID Paired_ID ...
123_1 123_2
123_2 123_1
456_1 456_2
456_2 456_1
789_1 789_2
789_2 789_1
789_3 789_4
789_4 789_3
What I would like to do is, for a particular ID, find the row whose Paired_ID is that ID, and combine the two rows into one. I've been trying to use pandas merge:
pd.merge(df, df, left_on="ID", right_on="Paired_ID")
but I'm getting duplicates and can't figure out how to get rid of them.
I would like:
ID_x Paired_ID_x ID_y Paired_ID_y ...
123_1 123_2 123_2 123_1
456_1 456_2 456_2 456_1
789_1 789_2 789_2 789_1
789_3 789_4 789_4 789_3
The assumption is that every value in ID also appears in Paired_ID.
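If you want to verify that assumption before reshaping, a minimal sketch along these lines may help. The dataframe is rebuilt here from the sample in the question (only the two columns shown), and the names `missing`, `partner_of` and `symmetric` are illustrative. The symmetry check also assumes that ID values are unique.

```python
import pandas as pd

# Sample data copied from the question (only the two columns shown there)
df = pd.DataFrame({
    "ID": ["123_1", "123_2", "456_1", "456_2", "789_1", "789_2", "789_3", "789_4"],
    "Paired_ID": ["123_2", "123_1", "456_2", "456_1", "789_2", "789_1", "789_4", "789_3"],
})

# IDs whose partner never shows up in the ID column
missing = df.loc[~df["Paired_ID"].isin(df["ID"]), "ID"]

# Pairing is symmetric when the partner's partner is the ID itself
# (assumes ID values are unique, so they can serve as a lookup index)
partner_of = df.set_index("ID")["Paired_ID"]
symmetric = df["Paired_ID"].map(partner_of).eq(df["ID"]).all()

print(missing.empty, symmetric)
```

If `missing` is non-empty or `symmetric` is false, some rows have no matching partner and would end up misaligned when the two halves are put side by side.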
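Compare the numeric suffixes after the '_' delimiter and split the dataframe in two: rows where the ID's suffix is less than or equal to the Paired_ID's suffix, and the remaining rows.
Then concatenate the two dataframes along the columns axis to get your output. Note that this matches rows by position, so it relies on the partners appearing in the same relative order in both halves, as they do in your sample.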
import pandas as pd
# extract the numeric suffix (the part after the last '_') of each value in ID and Paired_ID
A = df.ID.str.split('_').str[-1].astype(int)
B = df.Paired_ID.str.split('_').str[-1].astype(int)
# compare, split df based on the outcome, and add suffixes to the column names
filter_1 = df.loc[A.le(B)].reset_index(drop=True).add_suffix('_x')
filter_2 = df.loc[~A.le(B)].reset_index(drop=True).add_suffix('_y')
# concatenate along the columns axis; rows are matched by position after reset_index
pd.concat([filter_1, filter_2], axis=1)
ID_x Paired_ID_x ID_y Paired_ID_y
0 123_1 123_2 123_2 123_1
1 456_1 456_2 456_2 456_1
2 789_1 789_2 789_2 789_1
3 789_3 789_4 789_4 789_3
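If the row order cannot be relied on, one alternative is to keep the self-merge from the question and drop the mirrored rows afterwards. The following is only a sketch of that idea, not part of the answer above: it rebuilds the sample dataframe, and the names `merged` and `result` are illustrative. Each pair shows up twice in the merge (once in each direction), so keeping only the rows where `ID_x` sorts before `ID_y` leaves one row per pair.

```python
import pandas as pd

# Sample data copied from the question (only the two columns shown there)
df = pd.DataFrame({
    "ID": ["123_1", "123_2", "456_1", "456_2", "789_1", "789_2", "789_3", "789_4"],
    "Paired_ID": ["123_2", "123_1", "456_2", "456_1", "789_2", "789_1", "789_4", "789_3"],
})

# Self-merge: each left row is joined to the right row that names it as its partner
merged = pd.merge(df, df, left_on="ID", right_on="Paired_ID", suffixes=("_x", "_y"))

# Every pair appears in both directions; keep only one direction per pair
result = merged[merged["ID_x"] < merged["ID_y"]].reset_index(drop=True)

print(result)
```

The comparison here is a plain string comparison. Any consistent ordering removes the duplicates, but it only decides which member of a pair lands on the `_x` side, so with suffixes such as `_9` and `_10` the left-hand member may not be the numerically smaller one.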