
Efficient way to replace column of lists by matches with another data frame in Pandas

I have a pandas data frame that looks like:

col11     col12
  X      ['A']
  Y      ['A', 'B', 'C']
  Z      ['C', 'A']

And another one that looks like:

 col21   col22
  'A'   'alpha'
  'B'   'beta'
  'C'   'gamma'

I would like to replace the values in col12 based on col22 in an efficient way and get, as a result:

col31     col32
  X      ['alpha']
  Y      ['alpha', 'beta', 'gamma']
  Z      ['gamma', 'alpha']

One solution is to use an indexed series as a mapper with a list comprehension:

import pandas as pd

df1 = pd.DataFrame({'col1': ['X', 'Y', 'Z'],
                    'col2': [['A'], ['A', 'B', 'C'], ['C', 'A']]})

df2 = pd.DataFrame({'col21': ['A', 'B', 'C'],
                    'col22': ['alpha', 'beta', 'gamma']})

s = df2.set_index('col21')['col22']

df1['col2'] = [list(map(s.get, i)) for i in df1['col2']]

Result:

  col1                  col2
0    X               [alpha]
1    Y  [alpha, beta, gamma]
2    Z        [gamma, alpha]
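One thing to keep in mind with this approach: `Series.get` returns `None` for a key that is not in the index, so an unmatched list element silently becomes `None`. The following is a minimal sketch, not part of the original answer, of one way to keep unmatched elements unchanged instead. The `'D'` entry and the names `mapper` and `mapped` are made up for illustration.

```python
import pandas as pd

# Same shape of data as above, but 'D' (hypothetical) has no match in df2.
df1 = pd.DataFrame({'col1': ['X', 'Y', 'Z'],
                    'col2': [['A'], ['A', 'B', 'C'], ['C', 'D']]})

df2 = pd.DataFrame({'col21': ['A', 'B', 'C'],
                    'col22': ['alpha', 'beta', 'gamma']})

# Convert the lookup Series to a plain dict for per-element lookups.
mapper = df2.set_index('col21')['col22'].to_dict()

# dict.get(key, key) falls back to the original key when there is no match,
# rather than producing None.
mapped = [[mapper.get(key, key) for key in keys] for keys in df1['col2']]

df1['col2'] = mapped
print(df1)
```

Whether to keep, drop, or flag unmatched elements depends on the data; falling back to the original key is only one possible choice.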

I'm not sure it's the most efficient way, but you can turn your second DataFrame into a dict and then use apply to map the keys to the values:

Assuming your first DataFrame is df1 and the second is df2:

df_dict = dict(zip(df2['col21'], df2['col22']))
df3 = pd.DataFrame({"31":df1['col11'], "32": df1['col12'].apply(lambda x: [df_dict[y] for y in x])})

Or, as @jezrael suggested, with a nested list comprehension:

df3 = pd.DataFrame({"31":df1['col11'], "32": [[df_dict[y] for y in x] for x in df1['col12']]})

Note: df3 has a default index.

  31                    32
0  X               [alpha]
1  Y  [alpha, beta, gamma]
2  Z        [gamma, alpha]
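For comparison, here is a sketch of a different technique that neither answer above uses: flatten the lists with `Series.explode`, map the flat values in one call, then regroup them into lists. It assumes pandas 0.25 or later (where `explode` was introduced) and uses the column names from the question. It is not guaranteed to be faster than the list comprehensions, so it is worth timing on real data.

```python
import pandas as pd

df1 = pd.DataFrame({'col11': ['X', 'Y', 'Z'],
                    'col12': [['A'], ['A', 'B', 'C'], ['C', 'A']]})

df2 = pd.DataFrame({'col21': ['A', 'B', 'C'],
                    'col22': ['alpha', 'beta', 'gamma']})

# Lookup Series: index holds the keys, values hold the replacements.
s = df2.set_index('col21')['col22']

# One row per list element; the original row index is repeated.
exploded = df1['col12'].explode()

# Series.map with a Series looks each value up in that Series' index.
mapped = exploded.map(s)

# Collect the mapped values back into one list per original row.
result = mapped.groupby(level=0).agg(list)

# assign aligns on the index, so each list lands on its original row.
df3 = df1.assign(col12=result)
print(df3)
```

Be aware that an empty list explodes to a single NaN row, so rows with empty lists would come back as `[nan]` and would need separate handling.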


