Efficient way to replace column of lists by matches with another data frame in Pandas
I have a pandas data frame that looks like:
col11 col12
X ['A']
Y ['A', 'B', 'C']
Z ['C', 'A']
And another one that looks like:
col21 col22
'A' 'alpha'
'B' 'beta'
'C' 'gamma'
I would like to replace the values in col12 based on col22 in an efficient way and get, as a result:
col31 col32
X ['alpha']
Y ['alpha', 'beta', 'gamma']
Z ['gamma', 'alpha']
One solution is to use an indexed series as a mapper with a list comprehension:
import pandas as pd
df1 = pd.DataFrame({'col1': ['X', 'Y', 'Z'],
                    'col2': [['A'], ['A', 'B', 'C'], ['C', 'A']]})
df2 = pd.DataFrame({'col21': ['A', 'B', 'C'],
                    'col22': ['alpha', 'beta', 'gamma']})
# Series indexed by the keys (col21), holding the replacement values (col22)
s = df2.set_index('col21')['col22']
# Look up every key of every list in the series
df1['col2'] = [list(map(s.get, i)) for i in df1['col2']]
Result:
col1 col2
0 X [alpha]
1 Y [alpha, beta, gamma]
2 Z [gamma, alpha]
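One thing worth checking with this approach is what happens when a list contains a key that has no match in df2. The sketch below assumes a hypothetical unmatched key 'D' that is not part of the original data. `Series.get` returns `None` for a missing key unless a default is passed as its second argument, so the fallback variant keeps the original key instead:

```python
import pandas as pd

# 'D' is a hypothetical key with no match in df2, added only for illustration
df1 = pd.DataFrame({'col1': ['X', 'Y'],
                    'col2': [['A'], ['A', 'D']]})
df2 = pd.DataFrame({'col21': ['A', 'B', 'C'],
                    'col22': ['alpha', 'beta', 'gamma']})
s = df2.set_index('col21')['col22']

# Same lookup as above: unmatched keys become None
with_none = [list(map(s.get, keys)) for keys in df1['col2']]

# Variant: pass the key itself as the default, so unmatched keys are kept
with_fallback = [[s.get(k, k) for k in keys] for keys in df1['col2']]

print(with_none)
print(with_fallback)
```

Whether `None`, the original key, or an error is the right behaviour depends on your data, so treat this as a starting point rather than a recommendation.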
I'm not sure it's the most efficient way, but you can turn your DataFrame into a dict and then use apply to map the keys to the values:
Assuming your first DataFrame is df1 and the second is df2:
df_dict = dict(zip(df2['col21'], df2['col22']))
df3 = pd.DataFrame({"31":df1['col11'], "32": df1['col12'].apply(lambda x: [df_dict[y] for y in x])})
or, as @jezrael suggested, with a nested list comprehension:
df3 = pd.DataFrame({"31":df1['col11'], "32": [[df_dict[y] for y in x] for x in df1['col12']]})
Note: df3 has a default index
31 32
0 X [alpha]
1 Y [alpha, beta, gamma]
2 Z [gamma, alpha]
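Since the question is about efficiency and neither answer reports measurements, a rough way to compare the three variants on your own data is `timeit`. This is only a sketch: the repetition factor `n` and the helper function names are assumptions made for illustration, and the input is simply the example rows repeated.

```python
import timeit
import pandas as pd

# Hypothetical larger input: the example rows repeated n times
n = 2_000
df1 = pd.DataFrame({'col11': ['X', 'Y', 'Z'] * n,
                    'col12': [['A'], ['A', 'B', 'C'], ['C', 'A']] * n})
df2 = pd.DataFrame({'col21': ['A', 'B', 'C'],
                    'col22': ['alpha', 'beta', 'gamma']})

df_dict = dict(zip(df2['col21'], df2['col22']))   # plain dict mapper
s = df2.set_index('col21')['col22']               # indexed Series mapper

def via_series_get():
    # Indexed Series used as the mapper
    return [list(map(s.get, x)) for x in df1['col12']]

def via_apply():
    # dict lookup inside Series.apply
    return df1['col12'].apply(lambda x: [df_dict[y] for y in x]).tolist()

def via_comprehension():
    # dict lookup inside a nested list comprehension
    return [[df_dict[y] for y in x] for x in df1['col12']]

# Total seconds for 3 runs of each variant
timings = {f.__name__: timeit.timeit(f, number=3)
           for f in (via_series_get, via_apply, via_comprehension)}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s")
```

The numbers depend on the pandas version, the machine, and the shape of the lists, so it is worth running this against data that resembles your real input before choosing one variant.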