I have two DataFrames:
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['charlotte', 'bob', 'dave', 'dave', 'alice'],
                   'B': ['charlie', 'bridget', 'andy', 'diana', 'andy'],
                   'outcome': ['yes', 'no', 'yes', 'no', 'yes']})
and
pairs = pd.DataFrame({'A': ['alice', 'bridget', 'charlie', 'diana', 'dave'],
                      'B': ['andy', 'bob', 'charlotte', 'dave', 'andy'],
                      'outcome_2': ['no', 'yes', 'no', 'no', 'yes']})
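Notice that the two frames are misaligned: the rows are in a different order, and the names within a pair may be swapped between columns A and B.
I would like to get a DataFrame that lines up outcome (from df) with outcome_2 (from pairs) for each pair of names.
My solution has been to combine the two columns into a single column of sets and then compare: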
df['combined']=[set(i) for i in df[['A','B']].values.tolist()]
           A        B outcome              combined
0  charlotte  charlie     yes  {charlie, charlotte}
1        bob  bridget      no        {bridget, bob}
2       dave     andy     yes          {dave, andy}
3       dave    diana      no         {diana, dave}
4      alice     andy     yes         {andy, alice}
pairs['combined']=[set(i) for i in pairs[['A','B']].values.tolist()]
         A          B outcome_2              combined
0    alice       andy        no         {andy, alice}
1  bridget        bob       yes        {bridget, bob}
2  charlie  charlotte        no  {charlie, charlotte}
3    diana       dave        no         {diana, dave}
4     dave       andy       yes          {dave, andy}
idxs=[np.where(pairs['combined']==i)[0][0] for i in df['combined']]
final=pd.DataFrame({'outcome_1':df['outcome'].tolist(),'outcome_2':[pairs['outcome_2'][i] for i in idxs]})
final:
outcome_1 outcome_2
0 yes no
1 no yes
2 yes yes
3 no no
4 yes no
How can this be done efficiently? Ideally, the code would first sort the values in columns ['A','B'] within each row so that both frames use the same order for each pair.
Let's try np.sort to sort the values of the two columns horizontally (within each row) and then merge on them:
def align(df, columns=['A','B']):
    # Sort the values within each row so the order of a pair no longer matters
    ret = df.copy()
    ret[columns] = np.sort(df[columns], axis=1)
    return ret
pd.merge(align(df), align(pairs), on=['A','B'])[['outcome','outcome_2']]
Output:
outcome outcome_2
0 yes no
1 no yes
2 yes yes
3 no no
4 yes no
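The merge above is an inner join, so a pair that appears in only one frame would silently disappear from the result. As a hedged variant (not part of the original answer), the sketch below uses a left merge with pandas' `indicator` and `validate` options to surface unmatched or duplicated pairs. The names `merged` and `unmatched` are illustrative, and it assumes each pair occurs at most once per frame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['charlotte', 'bob', 'dave', 'dave', 'alice'],
                   'B': ['charlie', 'bridget', 'andy', 'diana', 'andy'],
                   'outcome': ['yes', 'no', 'yes', 'no', 'yes']})
pairs = pd.DataFrame({'A': ['alice', 'bridget', 'charlie', 'diana', 'dave'],
                      'B': ['andy', 'bob', 'charlotte', 'dave', 'andy'],
                      'outcome_2': ['no', 'yes', 'no', 'no', 'yes']})

def align(frame, columns=('A', 'B')):
    # Sort the names within each row so (A, B) and (B, A) become the same key
    cols = list(columns)
    ret = frame.copy()
    ret[cols] = np.sort(frame[cols].to_numpy(), axis=1)
    return ret

# Left merge keeps every row of df in its original order.
# validate raises if a pair is duplicated in either frame (an assumption here);
# indicator adds a '_merge' column showing where each row was found.
merged = pd.merge(align(df), align(pairs), on=['A', 'B'],
                  how='left', validate='one_to_one', indicator=True)

# Rows of df whose pair has no counterpart in pairs (empty for this data)
unmatched = merged[merged['_merge'] != 'both']

final = merged[['outcome', 'outcome_2']]
print(final)
```

If `unmatched` is non-empty, the corresponding `outcome_2` values are NaN, which makes missing pairs easy to spot instead of being dropped.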