Creating dummy variables from a string column in pandas

I have a pandas DataFrame as follows, and my goal is to turn the MATCHUP column into several dummy columns.

INDICATOR MATCHUP 
1         [   "APPLE",   "GRAPE" ]
1         [   "APPLE",   "GRAPE" ]
0         [   "GRAPE",   "BANANA" ]
0         [   "PEAR",   "ORANGE" ]
1         [   "ORANGE",   "APPLE" ]

Here is the same data as a dict:

{'INDICATOR': [1, 1, 0, 0, 1],
 'MATCHUP': ['[   "APPLE",   "GRAPE" ]',
  '[   "APPLE",   "GRAPE" ]',
  '[   "GRAPE",   "BANANA" ]',
  '[   "PEAR",   "ORANGE" ]',
  '[   "ORANGE",   "APPLE" ]']}

Given this DataFrame, I would like to create dummy variables that indicate whether each value appears in MATCHUP.

Desired outcome:

INDICATOR MATCHUP                    APPLE GRAPE BANANA PEAR ORANGE
1         [   "APPLE",   "GRAPE" ]   1     1     0      0    0 
1         [   "APPLE",   "GRAPE" ]   1     1     0      0    0
0         [   "GRAPE",   "BANANA" ]  0     1     1      0    0
0         [   "PEAR",   "ORANGE" ]   0     0     0      1    1
1         [   "ORANGE",   "APPLE" ]  1     0     0      0    1

Is there a way to accomplish this using pandas? I attempted another approach, but I think the spacing in the MATCHUP column makes that method unviable.

Try explode with str.get_dummies. Parsing each string into a list with ast.literal_eval first means the spacing no longer matters:

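import ast

# Parse each string into a list, explode to one value per row, one-hot encode, then sum back per original row
df = df.join(df['MATCHUP'].map(ast.literal_eval).explode().str.get_dummies().groupby(level=0).sum())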
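
To see what each part of the one-liner does, here is a step-by-step sketch of the same chain, run on the dict from the question. The intermediate names (parsed, exploded, dummies, result) are only illustrative and are not part of the original answer.

```python
import ast

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'INDICATOR': [1, 1, 0, 0, 1],
    'MATCHUP': ['[   "APPLE",   "GRAPE" ]',
                '[   "APPLE",   "GRAPE" ]',
                '[   "GRAPE",   "BANANA" ]',
                '[   "PEAR",   "ORANGE" ]',
                '[   "ORANGE",   "APPLE" ]'],
})

# 1. Parse each string into a real Python list; the extra spaces are ignored.
parsed = df['MATCHUP'].map(ast.literal_eval)

# 2. Put each list element on its own row, repeating the original row index.
exploded = parsed.explode()

# 3. Build one 0/1 column per distinct value.
dummies = exploded.str.get_dummies()

# 4. Collapse back to one row per original index.
dummies = dummies.groupby(level=0).sum()

# 5. Attach the dummy columns to the original frame (aligned on the index).
result = df.join(dummies)
print(result)
```

Note that the dummy columns come out in alphabetical order (APPLE, BANANA, GRAPE, ORANGE, PEAR) rather than in order of first appearance. Also, if the same value could appear twice within one MATCHUP list, sum() would give 2 for that row; using max() in step 4 instead would keep the columns strictly 0/1.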
