How do I create dummy variables in pandas with a chosen reference value?

Question

import pandas as pd

test = {'ngrp': ['Manhattan', 'Brooklyn', 'Queens', 'Staten Island', 'Bronx']}
test = pd.DataFrame(test)
dummy = pd.get_dummies(test['ngrp'], drop_first=True)

This gives me:

   Brooklyn  Manhattan  Queens  Staten Island
0         0          1       0              0
1         1          0       0              0
2         0          0       1              0
3         0          0       0              1
4         0          0       0              0

I get Bronx as my reference level (because that is the column that gets dropped). How do I specify that Manhattan should be the reference level instead? My expected output is:

   Brooklyn  Queens  Staten Island  Bronx
0         0       0              0      0
1         1       0              0      0
2         0       1              0      0
3         0       0              1      0
4         0       0              0      1

Answer 1

get_dummies sorts your values (lexicographically) and then creates the dummies. That's why you don't see "Bronx" in your initial result: it was the first value in sorted order, so it was the one that drop_first removed.
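As a quick check of the sorting behavior, here is a minimal sketch that reuses the question's `test` frame. The `dtype=int` argument is an addition, not part of the original code; it is there only so the result is 0/1 on recent pandas versions, which otherwise return booleans.

```python
import pandas as pd

test = pd.DataFrame(
    {'ngrp': ['Manhattan', 'Brooklyn', 'Queens', 'Staten Island', 'Bronx']}
)

# Without drop_first, every level gets a column, in sorted order.
all_dummies = pd.get_dummies(test['ngrp'], dtype=int)
print(list(all_dummies.columns))
# ['Bronx', 'Brooklyn', 'Manhattan', 'Queens', 'Staten Island']

# drop_first=True removes the first of those sorted columns ('Bronx').
dropped = pd.get_dummies(test['ngrp'], drop_first=True, dtype=int)
print(list(dropped.columns))
# ['Brooklyn', 'Manhattan', 'Queens', 'Staten Island']
```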

To avoid the behavior you see, enforce the ordering on a "first-seen" basis (i.e., convert the column to an ordered categorical whose categories are listed in order of appearance).

pd.get_dummies(
    pd.Categorical(test['ngrp'], categories=test['ngrp'].unique(), ordered=True), 
    drop_first=True)                                       

   Brooklyn  Queens  Staten Island  Bronx
0         0       0              0      0
1         1       0              0      0
2         0       1              0      0
3         0       0              1      0
4         0       0              0      1
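Note that the first-seen ordering only drops Manhattan because Manhattan happens to be the first row. If the reference level can appear anywhere in the data, one option is to put it first in the category list explicitly. The sketch below assumes a hypothetical helper named `dummies_with_reference` (not a pandas function), and the `dtype=int` argument is again an addition to get 0/1 output on recent pandas versions.

```python
import pandas as pd


def dummies_with_reference(series, reference):
    """Hypothetical helper: dummy-encode `series`, dropping `reference`."""
    levels = list(series.dropna().unique())  # levels in first-seen order
    if reference not in levels:
        raise ValueError(f"{reference!r} does not occur in the data")
    # Put the reference level first so that drop_first removes it.
    # Only the category order matters here, so ordered=True is not needed.
    ordered_levels = [reference] + [lvl for lvl in levels if lvl != reference]
    as_cat = series.astype(pd.CategoricalDtype(categories=ordered_levels))
    return pd.get_dummies(as_cat, drop_first=True, dtype=int)


# Manhattan is deliberately NOT the first row in this example.
test = pd.DataFrame(
    {'ngrp': ['Brooklyn', 'Manhattan', 'Queens', 'Staten Island', 'Bronx']}
)
dummy = dummies_with_reference(test['ngrp'], 'Manhattan')
print(dummy)
```

The Manhattan row (index 1) comes out as all zeros, and the remaining columns keep their first-seen order.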

Of course, this has the side effect that the resulting dummies have categorical column names (the columns form a CategoricalIndex rather than a plain Index), but that's almost never an issue.
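If later code does expect plain string labels, the column index can be converted back. This is a minimal sketch under the same setup as the answer; `dtype=int` is added only to get 0/1 output on recent pandas versions.

```python
import pandas as pd

test = pd.DataFrame(
    {'ngrp': ['Manhattan', 'Brooklyn', 'Queens', 'Staten Island', 'Bronx']}
)

dummy = pd.get_dummies(
    pd.Categorical(test['ngrp'], categories=test['ngrp'].unique(), ordered=True),
    drop_first=True,
    dtype=int,
)

# Replace the categorical column index with an ordinary list of labels.
dummy.columns = dummy.columns.tolist()
print(dummy.columns)
```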

Solution 1
2 2019-11-15 01:49:38
