Pandas get_dummies for nested tables

Question

I am looking to use pandas' get_dummies() to encode a (quite extensive) set of categorical variables. However, the data is currently in a nested-table format, meaning that each row represents another variable instance, for example:

Instance, Cat_Col
1, John
1, Smith
2, Jane
3, Joe

I can already generate the full list of unique values, which gives get_dummies every possible category. However, transforming the nested table into a single row per instance in this new format is giving me some trouble.

Any help is much appreciated. Thanks.

Edit: each instance should have a dummy-coded result for all values of Cat_Col.

The idea is for the result to be a single feature vector per instance, like so:

Instance,Col_John,Col_Smith,Col_Jane,Col_Joe
1,1,1,0,0
2,0,0,1,0
3,0,0,0,1

I believe that is the correct coding, assuming we are doing one-hot encoding.

Answer 1

You may want to consider using a pivot (pivot or pivot_table) to achieve your goal here.

import pandas as pd

df = pd.DataFrame({'Instance': [1, 1, 2, 3],
                   'Cat_Col': ['John', 'Smith', 'Jane', 'Joe']})
df

Out[10]: 
   Instance Cat_Col
0         1    John
1         1   Smith
2         2    Jane
3         3     Joe

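df['count'] = 1  # marker value that becomes the 1s in the wide table
# keyword arguments are required by recent pandas; astype(int) undoes the float cast caused by NaN
df.pivot(index='Instance', columns='Cat_Col', values='count').fillna(0).astype(int)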

Out[11]: 
Cat_Col    Jane   Joe   John   Smith
Instance                            
1             0     0      1       1
2             1     0      0       0
3             0     1      0       0
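
One caveat with the pivot approach: pivot raises an error when the same (Instance, Cat_Col) pair appears more than once. A minimal sketch of the pivot_table variant, which aggregates such repeats instead, is shown here. The repeated (3, Joe) row is an assumption added for illustration and is not part of the original data.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical repeated row (3, Joe)
df = pd.DataFrame({'Instance': [1, 1, 2, 3, 3],
                   'Cat_Col': ['John', 'Smith', 'Jane', 'Joe', 'Joe']})
df['count'] = 1

# aggfunc='max' collapses repeated pairs to a single 1;
# fill_value=0 fills the combinations that never occur
wide = df.pivot_table(index='Instance', columns='Cat_Col',
                      values='count', aggfunc='max', fill_value=0)
wide = wide.astype(int)  # keep the indicators as integers
print(wide)
```

Using aggfunc='sum' instead would give occurrence counts rather than 0/1 indicators.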

If you prefer to use get_dummies:

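result = pd.get_dummies(df['Cat_Col'], dtype=int)  # one 0/1 column per name
result['Instance'] = df['Instance']
result = result.set_index('Instance')
# collapse the rows of each Instance: a column is 1 if any of its rows had that value
result.groupby(level=0).max()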

Out[26]: 
           Jane   Joe   John   Smith
Instance                            
1             0     0      1       1
2             1     0      0       0
3             0     1      0       0
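
To get exactly the layout asked for in the question (an Instance column followed by Col_-prefixed columns in order of first appearance), the get_dummies approach could be extended roughly as follows. This is a sketch under the assumption that the "Col" prefix and the column order from the question's example are what is wanted; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Instance': [1, 1, 2, 3],
                   'Cat_Col': ['John', 'Smith', 'Jane', 'Joe']})

# prefix='Col' yields columns such as Col_John; dtype=int gives 0/1 instead of booleans
dummies = pd.get_dummies(df['Cat_Col'], prefix='Col', dtype=int)

# Group the indicator rows by Instance and take the max, so each
# instance ends up with a single row holding all of its values
encoded = dummies.groupby(df['Instance']).max()

# get_dummies sorts columns alphabetically; restore first-appearance order
order = ['Col_' + name for name in df['Cat_Col'].unique()]
encoded = encoded[order].reset_index()
print(encoded)
```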

Solution 1
2 Accepted 2015-07-06 22:09:06
