How to specify which column to remove in get_dummies in pandas
I have a DataFrame column with 3 values - Bart, Peg, Human. I need to one-hot encode them such that Bart and Peg stay as columns and Human is represented as 0 0.
Xi | Architecture
0 | Bart
1 | Bart
2 | Peg
3 | Human
4 | Human
5 | Peg
...
I want to one-hot encode them so that Human is represented as 0 0:
Xi |Bart| Peg
0 | 1 | 0
1 | 1 | 0
2 | 0 | 1
3 | 0 | 0
4 | 0 | 0
5 | 0 | 1
But when I do:
pd.get_dummies(df['Architecture'], drop_first=True)
it removes "Bart" and keeps the other 2. Is there a way to specify which column to remove?
You could mask it:
df = df[['Xi']].join(pd.get_dummies(df['Architecture'].mask(df['Architecture']=='Human')))
Output:
Xi Bart Peg
0 0 1 0
1 1 1 0
2 2 0 1
3 3 0 0
4 4 0 0
5 5 0 1
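For reference, here is a self-contained sketch of the mask approach, assuming the sample data from the question. mask() replaces every 'Human' with NaN, and get_dummies ignores NaN by default (dummy_na=False), so those rows end up all zeros. The .astype(int) cast assumes a recent pandas (2.x), where get_dummies returns bool columns; on older versions it is harmless.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Xi": range(6),
    "Architecture": ["Bart", "Bart", "Peg", "Human", "Human", "Peg"],
})

# 'Human' -> NaN; get_dummies skips NaN, so no 'Human' column is created
dummies = pd.get_dummies(df["Architecture"].mask(df["Architecture"] == "Human"))

# Cast bool dummies to 0/1 and attach them to the Xi column
result = df[["Xi"]].join(dummies.astype(int))
print(result)
```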
IIUC, try using get_dummies, then drop the 'Human' column:
df['Architecture'].str.get_dummies().drop('Human', axis=1)
Output:
Bart Peg
0 1 0
1 1 0
2 0 1
3 0 0
4 0 0
5 0 1
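A runnable version of the "encode everything, then drop" idea, again assuming the question's sample data. Series.str.get_dummies returns one 0/1 column per label (sorted: Bart, Human, Peg), and drop(columns=...) is equivalent to drop('Human', axis=1). The advantage is that you name the reference category explicitly instead of relying on which label happens to be first.

```python
import pandas as pd

df = pd.DataFrame({"Architecture": ["Bart", "Bart", "Peg", "Human", "Human", "Peg"]})

# One 0/1 column per distinct label
all_dummies = df["Architecture"].str.get_dummies()

# Explicitly remove the reference category
encoded = all_dummies.drop(columns="Human")
print(encoded)
```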
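It's dropping "Bart" because that's the "first" label: for a plain string column, get_dummies sorts the labels alphabetically (Bart, Human, Peg), and drop_first removes the first one. get_dummies doesn't have a built-in way to say "drop this column instead". It is annoying. So you can do a few things:
Make "Human" the first label before using get_dummies, so it is the one removed when you use drop_first. Sorting the rows alone won't do it, because the labels are sorted alphabetically regardless of row order; instead, give the column a categorical dtype that lists "Human" as its first category.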
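A minimal sketch of the "put Human first" idea, assuming the question's sample data. With a categorical dtype, get_dummies uses the categories in the order you give them, so drop_first removes "Human". The last lines show that merely reordering the rows of a plain string column still drops "Bart". The .astype(int) cast assumes pandas 2.x bool output.

```python
import pandas as pd

df = pd.DataFrame({"Architecture": ["Bart", "Bart", "Peg", "Human", "Human", "Peg"]})

# Category order is respected: Human comes first, so drop_first removes it
dtype = pd.CategoricalDtype(categories=["Human", "Bart", "Peg"])
arch = df["Architecture"].astype(dtype)
encoded = pd.get_dummies(arch, drop_first=True).astype(int)
print(encoded)

# Reordering rows of a plain string column does not help:
# labels are sorted alphabetically, so Bart is still dropped
rows_reordered = df["Architecture"].iloc[[3, 4, 0, 1, 2, 5]]
print(pd.get_dummies(rows_reordered, drop_first=True).columns.tolist())  # ['Human', 'Peg']
```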