
Efficient way to replace column of lists by matches with another data frame in Pandas

I have a pandas data frame that looks like:

col11     col12
  X      ['A']
  Y      ['A', 'B', 'C']
  Z      ['C', 'A']

And another one that looks like:

 col21   col22
  'A'   'alpha'
  'B'   'beta'
  'C'   'gamma'

I would like to replace the values in col12 with the matching values from col22 in an efficient way, to get the following result:

col31     col32
  X      ['alpha']
  Y      ['alpha', 'beta', 'gamma']
  Z      ['gamma', 'alpha']

One solution is to use an indexed series as a mapper with a list comprehension:

import pandas as pd

df1 = pd.DataFrame({'col1': ['X', 'Y', 'Z'],
                    'col2': [['A'], ['A', 'B', 'C'], ['C', 'A']]})

df2 = pd.DataFrame({'col21': ['A', 'B', 'C'],
                    'col22': ['alpha', 'beta', 'gamma']})

# Series indexed by the lookup key: 'A' -> 'alpha', 'B' -> 'beta', ...
s = df2.set_index('col21')['col22']

# Map every element of every list through the series
df1['col2'] = [list(map(s.get, i)) for i in df1['col2']]

Result:

  col1                  col2
0    X               [alpha]
1    Y  [alpha, beta, gamma]
2    Z        [gamma, alpha]
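
One detail worth knowing about the mapper above: Series.get returns None for a key that is not in the index, so unmatched entries silently become None instead of raising an error. Below is a minimal sketch of that behaviour, assuming a hypothetical key 'D' that has no match in df2 (it is not part of the question's data), together with one possible fallback that keeps the original value:

```python
import pandas as pd

df2 = pd.DataFrame({'col21': ['A', 'B', 'C'],
                    'col22': ['alpha', 'beta', 'gamma']})
s = df2.set_index('col21')['col22']

# 'D' is a hypothetical key with no match in df2 (an assumption for illustration)
keys = ['A', 'D']

# s.get returns None when the key is missing from the index
with_none = list(map(s.get, keys))

# Passing the key itself as the default leaves unmatched values unchanged
with_fallback = [s.get(k, k) for k in keys]

print(with_none)
print(with_fallback)
```

If every key is guaranteed to exist in df2, the original one-liner is enough as it stands.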

I'm not sure it's the most efficient way, but you can turn the second DataFrame into a dict and then use apply to map the keys to the values.

Assuming your first DataFrame is df1 and the second is df2:

df_dict = dict(zip(df2['col21'], df2['col22']))
df3 = pd.DataFrame({"31":df1['col11'], "32": df1['col12'].apply(lambda x: [df_dict[y] for y in x])})

Or, as @jezrael suggested, with a nested list comprehension:

df3 = pd.DataFrame({"31":df1['col11'], "32": [[df_dict[y] for y in x] for x in df1['col12']]})

Note: df3 has a default integer index:

  31                    32
0  X               [alpha]
1  Y  [alpha, beta, gamma]
2  Z        [gamma, alpha]
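
For comparison, the same replacement can also be written without an explicit Python loop by flattening the lists first. This is only a sketch of an alternative, not something proposed in the answers above. It assumes pandas 0.25 or later (for Series.explode) and uses the column names from the question; the names mapper, exploded, mapped and col32 are illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({'col11': ['X', 'Y', 'Z'],
                    'col12': [['A'], ['A', 'B', 'C'], ['C', 'A']]})
df2 = pd.DataFrame({'col21': ['A', 'B', 'C'],
                    'col22': ['alpha', 'beta', 'gamma']})

mapper = df2.set_index('col21')['col22']

# One row per list element; the original row label is repeated in the index
exploded = df1['col12'].explode()

# Vectorized lookup; keys missing from the mapper would become NaN
mapped = exploded.map(mapper)

# Collect the elements back into one list per original row
col32 = mapped.groupby(level=0).agg(list)

df3 = pd.DataFrame({'col31': df1['col11'], 'col32': col32})
print(df3)
```

Whether this is faster than the list comprehension depends on the data. With short lists the comprehension may well win, so it is worth timing both on a realistic sample.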
