How to use pandas.get_dummies() to create dummies for only some columns

Question

import pandas as pd

df = pd.DataFrame({'A': ['x', 'y', 'x'], 'B': ['z', 'u', 'z'],
                   'C': ['1', '2', '3'],
                   'D': ['j', 'l', 'j']})

I only want columns A and D converted to dummies, not column B. If I call pd.get_dummies(df), every column is turned into dummies.

I want the final result to contain all of the columns, meaning columns B and C are kept as they are, e.g. 'A_x', 'A_y', 'B', 'C', 'D_j', 'D_l'.

Answer 1

It can be done without concatenation by passing the columns to encode to get_dummies() through its columns parameter:

In [294]: pd.get_dummies(df, prefix=['A', 'D'], columns=['A', 'D'])
Out[294]: 
   B  C  A_x  A_y  D_j  D_l
0  z  1  1.0  0.0  1.0  0.0
1  u  2  0.0  1.0  0.0  1.0
2  z  3  1.0  0.0  1.0  0.0
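As a minimal sketch of the same call on the question's data, the snippet below also reorders the result into the layout the question asks for. It assumes a pandas version where get_dummies() accepts dtype; passing dtype=int is my addition so the dummies print as 0/1, since pandas 2.0 and later default to bool. The names result and ordered are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': ['x', 'y', 'x'], 'B': ['z', 'u', 'z'],
                   'C': ['1', '2', '3'],
                   'D': ['j', 'l', 'j']})

# Encode only A and D; B and C pass through unchanged and come first.
result = pd.get_dummies(df, columns=['A', 'D'], dtype=int)

# Reorder explicitly to match the layout requested in the question.
ordered = result[['A_x', 'A_y', 'B', 'C', 'D_j', 'D_l']]
print(ordered)
```

The untouched columns are placed before the dummy columns, so an explicit column selection is needed only if the exact order matters.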

Answer 2

Adding to the answers above: if you have a big dataset with lots of attributes and don't want to list by hand every column you want dummies for, you can use a set difference:

# Suppose df has 50 columns, i.e. len(df.columns) == 50
non_dummy_cols = ['A', 'B', 'C']
# Takes all 47 other columns (a set is unordered, so their order is arbitrary)
dummy_cols = list(set(df.columns) - set(non_dummy_cols))
df = pd.get_dummies(df, columns=dummy_cols)
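Because a Python set has no defined order, the dummy columns produced by the snippet above can come out in a different order from run to run. One possible variation, sketched here on a small made-up frame, is Index.difference with sort=False, which keeps the original column order. The five-column DataFrame and dtype=int are assumptions for illustration.

```python
import pandas as pd

# Small stand-in for the "many columns" case (illustrative data).
df = pd.DataFrame({'A': [1, 2, 3],
                   'B': ['p', 'q', 'p'],
                   'C': [0.1, 0.2, 0.3],
                   'D': ['j', 'l', 'j'],
                   'E': ['m', 'm', 'n']})

non_dummy_cols = ['A', 'B', 'C']

# Same idea as the set difference, but the original column order is kept.
dummy_cols = df.columns.difference(non_dummy_cols, sort=False).tolist()

encoded = pd.get_dummies(df, columns=dummy_cols, dtype=int)
print(encoded.columns.tolist())
```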

Answer 3

Just select the two columns you want to pass to get_dummies() (the resulting column names combine the source column with the value that each binary variable represents), then pd.concat() the original columns you want to keep unchanged:

pd.concat([pd.get_dummies(df[['A', 'D']]), df[['B', 'C']]], axis=1)

   A_x  A_y  D_j  D_l  B  C
0  1.0  0.0  1.0  0.0  z  1
1  0.0  1.0  0.0  1.0  u  2
2  1.0  0.0  1.0  0.0  z  3
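A quick way to see that this concat approach and the columns= approach yield the same data, differing only in column order, is to compare them after aligning the columns. This is a sketch on the question's data; dtype=int and the variable names are my assumptions.

```python
import pandas as pd

df = pd.DataFrame({'A': ['x', 'y', 'x'], 'B': ['z', 'u', 'z'],
                   'C': ['1', '2', '3'],
                   'D': ['j', 'l', 'j']})

# Approach from this answer: encode a subset, then glue the rest back on.
via_concat = pd.concat(
    [pd.get_dummies(df[['A', 'D']], dtype=int), df[['B', 'C']]], axis=1)

# Approach using the columns parameter.
via_columns = pd.get_dummies(df, columns=['A', 'D'], dtype=int)

# Align the column order before comparing the two frames.
same = via_concat[via_columns.columns].equals(via_columns)
print(same)
```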

Answer 4

The other answers are great for the specific example in the OP.
This answer is for cases where there may be many columns and it is too cumbersome to type out all the column names.
This is a non-exhaustive list of ways to pass many different columns to get_dummies while excluding some columns.
Using the built-in filter() function on df.columns is also an option.
When columns=None, pd.get_dummies only converts columns with an object, string, or category dtype.
- Another potential option is therefore to make sure that only the columns to be transformed have one of those dtypes, and that the columns that shouldn't be transformed do not.
Using set(), as shown in Answer 2, is yet another option.
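The filter() and dtype remarks above can be sketched as follows. The three-column frame, the column names, and dtype=int are made up for illustration and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'num': [1, 2, 1],
                   'txt': ['a', 'b', 'a'],
                   'keep': [10, 20, 30]})

# columns=None: only the text column is encoded, integers are left alone.
default = pd.get_dummies(df, dtype=int)

# Built-in filter() on df.columns to drop the columns that must not change.
not_cols = ['keep']
to_encode = list(filter(lambda col: col not in not_cols, df.columns))
explicit = pd.get_dummies(df, columns=to_encode, dtype=int)

# Casting a numeric column to object makes columns=None pick it up too.
forced = pd.get_dummies(df.astype({'num': object}), dtype=int)

print(default.columns.tolist())
print(explicit.columns.tolist())
print(forced.columns.tolist())
```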

import pandas as pd
import string  # for data
import numpy as np

# create test data
np.random.seed(15)
df = pd.DataFrame(np.random.randint(1, 4, size=(5, 10)), columns=list(string.ascii_uppercase[:10]))

# display(df)
   A  B  C  D  E  F  G  H  I  J
0  1  2  1  2  1  1  2  3  2  2
1  2  1  3  3  1  2  2  1  2  1
2  2  3  1  3  2  2  1  2  3  3
3  3  2  1  2  3  2  3  1  3  1
4  1  1  1  3  3  1  2  1  2  1

Option 1

If the excluded columns are fewer than the included columns, specify the columns to exclude, and then use a list comprehension to leave them out of the list passed to the columns= parameter.

# columns not to transform
not_cols = ['C', 'G']

# get dummies
df_dummies = pd.get_dummies(data=df, columns=[col for col in df.columns if col not in not_cols])

   C  G  A_1  A_2  A_3  B_1  B_2  B_3  D_2  D_3  E_1  E_2  E_3  F_1  F_2  H_1  H_2  H_3  I_2  I_3  J_1  J_2  J_3
0  1  2    1    0    0    0    1    0    1    0    1    0    0    1    0    0    0    1    1    0    0    1    0
1  3  2    0    1    0    1    0    0    0    1    1    0    0    0    1    1    0    0    1    0    1    0    0
2  1  1    0    1    0    0    0    1    0    1    0    1    0    0    1    0    1    0    0    1    0    0    1
3  1  3    0    0    1    0    1    0    1    0    0    0    1    0    1    1    0    0    0    1    1    0    0
4  1  2    1    0    0    1    0    0    0    1    0    0    1    1    0    1    0    0    1    0    1    0    0
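If this pattern is needed in several places, it could be wrapped in a small function. The helper below, get_dummies_except, is hypothetical (it is not part of pandas), and the check for misspelled column names is my addition.

```python
import pandas as pd

def get_dummies_except(frame, exclude, **kwargs):
    """Hypothetical helper: encode every column of `frame` not in `exclude`."""
    missing = [col for col in exclude if col not in frame.columns]
    if missing:
        # Fail loudly instead of silently encoding a misspelled column.
        raise KeyError(f"columns not found: {missing}")
    to_encode = [col for col in frame.columns if col not in exclude]
    return pd.get_dummies(frame, columns=to_encode, **kwargs)

# Illustrative data, smaller than the 10-column test frame above.
small = pd.DataFrame({'A': [1, 2, 1], 'C': [5, 6, 7], 'D': [2, 3, 3]})
encoded = get_dummies_except(small, exclude=['C'], dtype=int)
print(encoded.columns.tolist())
```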

Option 2

If the columns to exclude are at the beginning or the end, slice df.columns.

df_dummies = pd.get_dummies(data=df, columns=df.columns[2:])

   A  B  C_1  C_3  D_2  D_3  E_1  E_2  E_3  F_1  F_2  G_1  G_2  G_3  H_1  H_2  H_3  I_2  I_3  J_1  J_2  J_3
0  1  2    1    0    1    0    1    0    0    1    0    0    1    0    0    0    1    1    0    0    1    0
1  2  1    0    1    0    1    1    0    0    0    1    0    1    0    1    0    0    1    0    1    0    0
2  2  3    1    0    0    1    0    1    0    0    1    1    0    0    0    1    0    0    1    0    0    1
3  3  2    1    0    1    0    0    0    1    0    1    0    0    1    1    0    0    0    1    1    0    0
4  1  1    1    0    0    1    0    0    1    1    0    0    1    0    1    0    0    1    0    1    0    0

Option 3

Specify slices and then concat the excluded columns to the dummies.
- Uses pd.concat, similar to Answer 3, but with more columns.
np.r_ translates slice objects into a single concatenated array of indices.

slices = np.r_[slice(0, 2), slice(3, 6), slice(7, 10)]
excluded = [2, 6]

df_dummies = pd.concat([df.iloc[:, excluded], pd.get_dummies(data=df.iloc[:, slices].astype(object))], axis=1)

   C  G  A_1  A_2  A_3  B_1  B_2  B_3  D_2  D_3  E_1  E_2  E_3  F_1  F_2  H_1  H_2  H_3  I_2  I_3  J_1  J_2  J_3
0  1  2    1    0    0    0    1    0    1    0    1    0    0    1    0    0    0    1    1    0    0    1    0
1  3  2    0    1    0    1    0    0    0    1    1    0    0    0    1    1    0    0    1    0    1    0    0
2  1  1    0    1    0    0    0    1    0    1    0    1    0    0    1    0    1    0    0    1    0    0    1
3  1  3    0    0    1    0    1    0    1    0    0    0    1    0    1    1    0    0    0    1    1    0    0
4  1  2    1    0    0    1    0    0    0    1    0    0    1    1    0    1    0    0    1    0    1    0    0
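To make the role of np.r_ in Option 3 concrete, the sketch below prints the positions it produces and derives the excluded positions from them instead of typing them by hand. Using np.setdiff1d here is my own variation, not part of the original answer.

```python
import numpy as np
import pandas as pd

cols = pd.Index(list("ABCDEFGHIJ"))

# np.r_ expands the slices into one flat array of positions.
slices = np.r_[0:2, 3:6, 7:10]
print(slices.tolist())  # [0, 1, 3, 4, 5, 7, 8, 9]

# Every position not covered by the slices is an excluded column.
excluded = np.setdiff1d(np.arange(len(cols)), slices)
print(cols[excluded].tolist())  # ['C', 'G']
```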

4 answers

Answer 1: score 61, accepted, 2016-05-17 07:25:35
Answer 2: score 14, 2018-03-01 14:45:02
Answer 3: score 4, 2016-05-17 00:45:41
Answer 4: score 0, 2021-04-23 00:03:36
