
matching multiple columns ignoring order in pandas

I have two dataframes

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['charlotte', 'bob', 'dave', 'dave', 'alice'],
                   'B': ['charlie', 'bridget', 'andy', 'diana', 'andy'],
                   'outcome': ['yes', 'no', 'yes', 'no', 'yes']})

and

pairs = pd.DataFrame({'A': ['alice', 'bridget', 'charlie', 'diana', 'dave'],
                      'B': ['andy', 'bob', 'charlotte', 'dave', 'andy'],
                      'outcome_2': ['no', 'yes', 'no', 'no', 'yes']})

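Notice that the rows are in a different order, and the two names of a pair may be swapped between columns A and B (for example, charlotte/charlie in df is charlie/charlotte in pairs).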
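To make the misalignment concrete, here is a small sketch (the variable name naive is only illustrative) showing that a plain merge on ['A','B'] finds only the pairs that happen to be stored in the same column order in both frames:

```python
import pandas as pd

df = pd.DataFrame({'A': ['charlotte', 'bob', 'dave', 'dave', 'alice'],
                   'B': ['charlie', 'bridget', 'andy', 'diana', 'andy'],
                   'outcome': ['yes', 'no', 'yes', 'no', 'yes']})

pairs = pd.DataFrame({'A': ['alice', 'bridget', 'charlie', 'diana', 'dave'],
                      'B': ['andy', 'bob', 'charlotte', 'dave', 'andy'],
                      'outcome_2': ['no', 'yes', 'no', 'no', 'yes']})

# An order-sensitive merge: ('bob', 'bridget') does not match ('bridget', 'bob').
naive = pd.merge(df, pairs, on=['A', 'B'])
print(naive)
```

Only the dave/andy and alice/andy rows survive, because those are the only pairs written in the same order in both frames.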

I would like to get a data frame that lines up, for each pair, the outcome column of df (as outcome_1) with the outcome_2 column of pairs.

My solution has been to convert the columns into one as a set and then compare:

df['combined']=[set(i) for i in df[['A','B']].values.tolist()]


           A        B  outcome              combined
0  charlotte  charlie      yes  {charlie, charlotte}
1        bob  bridget       no        {bridget, bob}
2       dave     andy      yes          {dave, andy}
3       dave    diana       no         {diana, dave}
4      alice     andy      yes         {andy, alice}

pairs['combined']=[set(i) for i in pairs[['A','B']].values.tolist()]

         A          B  outcome_2              combined
0    alice       andy         no         {andy, alice}
1  bridget        bob        yes        {bridget, bob}
2  charlie  charlotte         no  {charlie, charlotte}
3    diana       dave         no         {diana, dave}
4     dave       andy        yes          {dave, andy}


idxs=[np.where(pairs['combined']==i)[0][0] for i in df['combined']]

# idxs[k] is the row of pairs that matches row k of df, so df['outcome'] is used in its own order
final=pd.DataFrame({'outcome_1':df['outcome'].tolist(),'outcome_2':[pairs['outcome_2'][i] for i in idxs]})

final:

  outcome_1 outcome_2
0       yes        no
1        no       yes
2       yes       yes
3        no        no
4       yes        no

How can this be done efficiently? Ideally, the code would first sort the values of columns ['A','B'] within each row so that the two data frames are aligned.

Let's use np.sort to sort the values of A and B within each row, so that every pair is stored in the same order, and then merge on both columns:

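def align(df, columns=['A','B']):
    # work on a copy so the caller's frame is left untouched
    ret = df.copy()
    # sort the values within each row (axis=1), so each pair has one canonical order
    ret[columns] = np.sort(df[columns], axis=1)
    return ret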

pd.merge(align(df), align(pairs), on=['A','B'])[['outcome','outcome_2']]

Output:

  outcome outcome_2
0     yes        no
1      no       yes
2     yes       yes
3      no        no
4     yes        no
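For convenience, the approach above can be put together as a self-contained sketch. The how='left' argument and the rename to outcome_1 are my own additions, not part of the answer: a left merge keeps every row of df in its original order, and a pair with no partner in pairs would show up as NaN in outcome_2 rather than being dropped.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['charlotte', 'bob', 'dave', 'dave', 'alice'],
                   'B': ['charlie', 'bridget', 'andy', 'diana', 'andy'],
                   'outcome': ['yes', 'no', 'yes', 'no', 'yes']})

pairs = pd.DataFrame({'A': ['alice', 'bridget', 'charlie', 'diana', 'dave'],
                      'B': ['andy', 'bob', 'charlotte', 'dave', 'andy'],
                      'outcome_2': ['no', 'yes', 'no', 'no', 'yes']})

def align(frame, columns=('A', 'B')):
    """Return a copy in which the key columns are sorted within each row."""
    cols = list(columns)
    out = frame.copy()
    # Row-wise sort: ('bridget', 'bob') and ('bob', 'bridget') both become ('bob', 'bridget').
    out[cols] = np.sort(frame[cols].to_numpy(), axis=1)
    return out

# Assumption: a left merge, so the result follows the row order of df.
merged = pd.merge(align(df), align(pairs), on=['A', 'B'], how='left')
result = merged[['outcome', 'outcome_2']].rename(columns={'outcome': 'outcome_1'})
print(result)
```

Because the merge is hash-based, this avoids the per-row np.where scan over the combined column used in the question.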
