Creating dummy variables from a string column in pandas
I have a pandas df as follows, and my goal is to turn the MATCHUP column into several dummy columns.
INDICATOR MATCHUP
1 [ "APPLE", "GRAPE" ]
1 [ "APPLE", "GRAPE" ]
0 [ "GRAPE", "BANANA" ]
0 [ "PEAR", "ORANGE" ]
1 [ "ORANGE", "APPLE" ]
Here's the same data as a dict:
{'INDICATOR': [1, 1, 0, 0, 1],
'MATCHUP': ['[ "APPLE", "GRAPE" ]',
'[ "APPLE", "GRAPE" ]',
'[ "GRAPE", "BANANA" ]',
'[ "PEAR", "ORANGE" ]',
'[ "ORANGE", "APPLE" ]']}
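To reproduce the problem, here is a minimal sketch that builds the frame from the dict above. The variable names `data`, `df`, `first_cell`, and `is_string` are my own choice. It also checks that each MATCHUP cell is a plain string that only looks like a list, which is why it has to be parsed before it can be split into dummies.

```python
import pandas as pd

# The dict from the question, unchanged
data = {'INDICATOR': [1, 1, 0, 0, 1],
        'MATCHUP': ['[ "APPLE", "GRAPE" ]',
                    '[ "APPLE", "GRAPE" ]',
                    '[ "GRAPE", "BANANA" ]',
                    '[ "PEAR", "ORANGE" ]',
                    '[ "ORANGE", "APPLE" ]']}

df = pd.DataFrame(data)

# Each MATCHUP value is a str, not a list
first_cell = df.loc[0, 'MATCHUP']
is_string = isinstance(first_cell, str)
```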
Given this df, I would like to create dummy variables that indicate whether each value appears in MATCHUP.
Desired outcome:
INDICATOR MATCHUP APPLE GRAPE BANANA PEAR ORANGE
1 [ "APPLE", "GRAPE" ] 1 1 0 0 0
1 [ "APPLE", "GRAPE" ] 1 1 0 0 0
0 [ "GRAPE", "BANANA" ] 0 1 1 0 0
0 [ "PEAR", "ORANGE" ] 0 0 0 1 1
1 [ "ORANGE", "APPLE" ] 1 0 0 0 1
Is there a way to accomplish this using pandas? I attempted this with another approach, but I think the spacing in the MATCHUP column makes that method unviable.
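Use explode with str.get_dummies: parse each string into a list with ast.literal_eval, explode it to one value per row, build the dummies, then sum them back per original row.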
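import ast
# Parse strings into lists, explode to one value per row, one-hot encode, then sum per original row
dummies = df['MATCHUP'].map(ast.literal_eval).explode().str.get_dummies().groupby(level=0).sum()
df = df.join(dummies)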
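Here is the same chain broken into steps as a self-contained sketch on the question's data. The intermediate names (`parsed`, `exploded`, `dummies`, `result`) are only for illustration. The column reordering at the end is my own addition to match the layout the question asks for, since `str.get_dummies` returns its columns in sorted order.

```python
import ast
import pandas as pd

df = pd.DataFrame({'INDICATOR': [1, 1, 0, 0, 1],
                   'MATCHUP': ['[ "APPLE", "GRAPE" ]',
                               '[ "APPLE", "GRAPE" ]',
                               '[ "GRAPE", "BANANA" ]',
                               '[ "PEAR", "ORANGE" ]',
                               '[ "ORANGE", "APPLE" ]']})

# Step 1: turn each string into a real Python list (extra spaces are harmless)
parsed = df['MATCHUP'].map(ast.literal_eval)

# Step 2: one row per list element; the original row index is repeated
exploded = parsed.explode()

# Step 3: one-hot encode each element
dummies = exploded.str.get_dummies()

# Step 4: collapse back to one row per original index
dummies = dummies.groupby(level=0).sum()

# Optional: reorder the dummy columns to match the desired layout
result = df.join(dummies[['APPLE', 'GRAPE', 'BANANA', 'PEAR', 'ORANGE']])
```

If a value could appear more than once within a single MATCHUP list, `.sum()` would produce counts greater than 1. Using `.max()` in step 4 instead would keep the columns strictly 0/1.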