pandas - merge on a column with tuples

I have a df like this:

>>> df1

        col_1   col_2    labels
0        aaa     abc     (71020,)
1        ddd     ghi     (99213, 99287,)
2        bbb     cde     (77085,)
3        eee     ijk     (99233, 71020, 36415,)

and another df like this:

>>> df2

   71020  77085  36415  99213  99287  99233  labels_mg
0    1      0      1      0      0      1     (99233, 71020, 36415,)
1    1      0      0      0      0      0     (71020,)
2    0      0      0      1      1      0     (99213, 99287)
3    0      1      0      0      0      0     (77085,)

I would like to generate a DataFrame by right-joining the two DataFrames above, like this:

        col_1   col_2    labels                     71020  77085  36415  99213  99287  99233
0        aaa     abc     (71020,)                    1      0      0      0      0      0
1        ddd     ghi     (99213, 99287,)             0      0      0      1      1      0
2        bbb     cde     (77085,)                    0      1      0      0      0      0
3        eee     ijk     (99233, 71020, 36415,)      1      0      1      0      0      1 

Here's what I have tried, but it produces an empty DataFrame: 0 rows, though all the column names are present.

pd.merge(left=df1, right=df2, left_on=['labels'], right_on=['labels_mg'])

The tuples are parsed as tuples in both DataFrames: I ran literal_eval on those columns in both of them after reading the files into pandas. The two DataFrames also don't share a common index.
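
That claim can be checked directly. A merge on an object column only matches cells that are equal Python objects, so the string "(71020,)" never matches the tuple (71020,), and (71020,) never matches ('71020',). Below is a minimal sketch of such a check, using small made-up frames in place of the real ones; the helper name cell_types is hypothetical.

```python
import pandas as pd

# Made-up stand-ins: one key column still holds strings, the other real tuples.
left = pd.DataFrame({"labels": ["(71020,)", "(99213, 99287,)"]})
right = pd.DataFrame({"labels_mg": [(71020,), (99213, 99287)]})

def cell_types(series):
    """Return the set of Python type names found in a column (hypothetical helper)."""
    return {type(value).__name__ for value in series}

print(cell_types(left["labels"]))       # 'str' here means literal_eval was not applied
print(cell_types(right["labels_mg"]))

# Mismatched key types reproduce the symptom: no rows, but all columns present.
empty = pd.merge(left, right, left_on="labels", right_on="labels_mg")
print(len(empty), list(empty.columns))
```

If both columns report only tuple, the next things to compare are the element types inside the tuples and the order of the elements.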

My DataFrame sizes are (528840, 207) and (528840, 5). How do I do this efficiently?

For me it works correctly with the data from the question:

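import ast

import pandas as pd

# Convert the string representations into real tuples.
# Skip this step if the columns already hold tuples.
df1['labels'] = df1['labels'].apply(ast.literal_eval)
df2['labels_mg'] = df2['labels_mg'].apply(ast.literal_eval)

# Inner join on the tuple-valued key columns.
df = pd.merge(left=df1, right=df2, left_on=['labels'], right_on=['labels_mg'])
print(df)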
  col_1 col_2                 labels  71020  77085  36415  99213  99287  \
0   aaa   abc               (71020,)      1      0      0      0      0   
1   ddd   ghi         (99213, 99287)      0      0      0      1      1   
2   bbb   cde               (77085,)      0      1      0      0      0   
3   eee   ijk  (99233, 71020, 36415)      1      0      1      0      0   

   99233              labels_mg  
0      0               (71020,)  
1      0         (99213, 99287)  
2      0               (77085,)  
3      1  (99233, 71020, 36415)  
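
If the merge is still empty on the real data, one possible cause is that the tuples differ in element order or element type between the two frames. The sketch below is one way to guard against that, not the method from the answer above. It builds a normalized helper key (a sorted tuple of ints), merges on it, and drops it afterwards. The helper name normalize, the temporary column _key, and the string column names in df2 are assumptions made for illustration. It also assumes rows of df2 with the same labels are identical, so duplicates are dropped before merging to avoid multiplying rows. It uses how="left" from df1 rather than a right join, because that keeps every df1 row in its original order, which matches the desired output in the question.

```python
import pandas as pd

# Small stand-ins for the frames in the question.
df1 = pd.DataFrame({
    "col_1": ["aaa", "ddd", "bbb", "eee"],
    "col_2": ["abc", "ghi", "cde", "ijk"],
    "labels": [(71020,), (99213, 99287), (77085,), (99233, 71020, 36415)],
})
codes = ["71020", "77085", "36415", "99213", "99287", "99233"]  # assumed to be strings
df2 = pd.DataFrame(
    [[1, 0, 1, 0, 0, 1], [1, 0, 0, 0, 0, 0], [0, 0, 0, 1, 1, 0], [0, 1, 0, 0, 0, 0]],
    columns=codes,
)
# Deliberately different element order and element type from df1.
df2["labels_mg"] = [("36415", "99233", "71020"), ("71020",), ("99287", "99213"), ("77085",)]

def normalize(labels):
    """Hypothetical helper: order-independent key made of ints."""
    return tuple(sorted(int(code) for code in labels))

left = df1.assign(_key=df1["labels"].map(normalize))
right = (
    df2.assign(_key=df2["labels_mg"].map(normalize))
       .drop(columns="labels_mg")        # not wanted in the final result
       .drop_duplicates(subset="_key")   # assumption: duplicate keys carry identical rows
)

# Keep every df1 row in its original order, then discard the helper key.
result = left.merge(right, on="_key", how="left").drop(columns="_key")
print(result)
```

Because the original labels column is left untouched, the result shows the tuples exactly as they appear in df1. Passing validate="many_to_one" to merge is an optional extra check that the right-hand keys really are unique.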
