How to specify which column to drop in pandas get_dummies

Question

I have a DataFrame column with 3 values: Bart, Peg, Human. I need to one-hot encode them such that Bart and Peg stay as columns and Human is represented as 0 0.

Xi | Architecture
0  | Bart
1  | Bart
2  | Peg
3  | Human
4  | Human
5  | Peg
...

I want to one-hot encode them so that Human is represented as 0 0:

Xi |Bart| Peg
0  | 1  | 0
1  | 1  | 0
2  | 0  | 1
3  | 0  | 0
4  | 0  | 0
5  | 0  | 1

But when I do:

pd.get_dummies(df['Architecture'], drop_first = True)

it removes "Bart" and keeps the other two. Is there a way to specify which column to remove?

Answer 1

You could mask it, so that "Human" becomes NaN and get_dummies leaves those rows as all zeros:

df = df[['Xi']].join(pd.get_dummies(df['Architecture'].mask(df['Architecture']=='Human')))

Output:

   Xi  Bart  Peg
0   0     1    0
1   1     1    0
2   2     0    1
3   3     0    0
4   4     0    0
5   5     0    1
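
For reference, here is a self-contained sketch of the masking approach that rebuilds the sample frame from the question. It assumes Xi is an ordinary column rather than the index, and it passes dtype=int (not in the original answer) so the dummies print as 0/1 on pandas versions that return booleans by default.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed layout).
df = pd.DataFrame({
    "Xi": range(6),
    "Architecture": ["Bart", "Bart", "Peg", "Human", "Human", "Peg"],
})

# Replace 'Human' with NaN; get_dummies skips NaN unless dummy_na=True,
# so those rows end up as 0 in every dummy column.
masked = df["Architecture"].mask(df["Architecture"] == "Human")
encoded = df[["Xi"]].join(pd.get_dummies(masked, dtype=int))
print(encoded)
```

Because the baseline level never becomes a column, nothing has to be dropped afterwards.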

Answer 2

IIUC, try using str.get_dummies and then dropping the 'Human' column:

df['Architecture'].str.get_dummies().drop('Human', axis=1)

Output:

   Bart  Peg
0     1    0
1     1    0
2     0    1
3     0    0
4     0    0
5     0    1
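
The same encode-then-drop idea can be wrapped in a small helper, sketched here with pd.get_dummies instead of Series.str.get_dummies (the latter also splits each string on "|" by default, which makes no difference for these labels). The name dummies_without is hypothetical, and dtype=int is an added assumption to get 0/1 output on newer pandas.

```python
import pandas as pd

def dummies_without(series: pd.Series, baseline: str) -> pd.DataFrame:
    """One-hot encode `series`, then drop the column for `baseline`."""
    return pd.get_dummies(series, dtype=int).drop(columns=baseline)

# Sample data reconstructed from the question.
arch = pd.Series(
    ["Bart", "Bart", "Peg", "Human", "Human", "Peg"], name="Architecture"
)
result = dummies_without(arch, "Human")
print(result)
```

drop raises a KeyError if the baseline label never occurs in the data, which may or may not be what you want for a fixed encoding scheme.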

Answer 3

It's dropping "Bart" because that's the "first" label: get_dummies orders the labels (alphabetically for plain strings), and drop_first removes the first one. get_dummies doesn't have a built-in way to say "drop this particular column", which is annoying. So you can do a few things:

make "Human" the first label before using get_dummies (sorting the rows is not enough; the column has to be a categorical whose first category is "Human") so that drop_first removes it
subset the dataset so that only the rows where Architecture is "Bart" or "Peg" are one-hot encoded, then fill the remaining rows with 0
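
Both options can be sketched as follows. This is an illustration under assumptions, not the answerer's code: option 1 sets the category order explicitly with pd.CategoricalDtype, since drop_first follows the category order rather than the row order, and option 2 uses reindex to put the "Human" rows back as zeros. The variable names and dtype=int are choices made for this example.

```python
import pandas as pd

# Sample data reconstructed from the question.
arch = pd.Series(
    ["Bart", "Bart", "Peg", "Human", "Human", "Peg"], name="Architecture"
)

# Option 1: make 'Human' the first category so drop_first removes it.
human_first = arch.astype(pd.CategoricalDtype(["Human", "Bart", "Peg"]))
by_category = pd.get_dummies(human_first, drop_first=True, dtype=int)

# Option 2: encode only the 'Bart'/'Peg' rows, then restore the
# 'Human' rows as all-zero rows by reindexing to the full index.
keep = arch.isin(["Bart", "Peg"])
by_subset = pd.get_dummies(arch[keep], dtype=int).reindex(
    arch.index, fill_value=0
)

print(by_category)
print(by_subset)
```

Option 1 also fixes the column set in advance: every listed category gets a column even if it is absent from a particular batch of data.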

3 answers

Answer 1: 2
Answer 2: 0, accepted, 2022-03-07 17:31:26
Answer 3: 0, 2022-03-07 17:40:42
