
How to create dummies for certain columns with pandas.get_dummies()

import pandas as pd

# Sample data: every column holds strings
df = pd.DataFrame({'A': ['x', 'y', 'x'], 'B': ['z', 'u', 'z'],
                   'C': ['1', '2', '3'],
                   'D': ['j', 'l', 'j']})

I only want dummies for columns A and D, not for column B. If I use pd.get_dummies(df), all columns are turned into dummies.

I want the final result to contain all of the columns, which means columns B and C still exist, like 'A_x', 'A_y', 'B', 'C', 'D_j', 'D_l'.

It can be done without concatenation, by passing the required parameters to get_dummies():

In [294]: pd.get_dummies(df, prefix=['A', 'D'], columns=['A', 'D'])
Out[294]: 
   B  C  A_x  A_y  D_j  D_l
0  z  1  1.0  0.0  1.0  0.0
1  u  2  0.0  1.0  0.0  1.0
2  z  3  1.0  0.0  1.0  0.0
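
A minimal sketch of the same call that also restores the column order asked for in the question. It assumes a pandas version in which get_dummies() accepts the dtype parameter; the dtype=int argument and the names encoded, wanted, and result are illustrative additions, not part of the original answer. Without dtype, the dummy dtype depends on the pandas version.

```python
import pandas as pd

df = pd.DataFrame({'A': ['x', 'y', 'x'], 'B': ['z', 'u', 'z'],
                   'C': ['1', '2', '3'],
                   'D': ['j', 'l', 'j']})

# Encode only A and D; dtype=int yields 0/1 integers
encoded = pd.get_dummies(df, columns=['A', 'D'], dtype=int)

# Untouched columns come first in the result, so reorder
# to the layout requested in the question
wanted = ['A_x', 'A_y', 'B', 'C', 'D_j', 'D_l']
result = encoded[wanted]
print(result)
```

Selecting with a list of labels returns a new frame in exactly that order, so B and C end up between the A and D dummies.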

Adding to the answers above: if you have a big dataset with many attributes and you don't want to specify by hand every column to encode, you can use a set difference:

# Suppose len(df.columns) == 50
non_dummy_cols = ['A', 'B', 'C']
# Takes all 47 other columns (note: a set does not preserve column order)
dummy_cols = list(set(df.columns) - set(non_dummy_cols))
df = pd.get_dummies(df, columns=dummy_cols)
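
Because a plain set loses the original column order, an order-preserving variant may be preferable. The sketch below uses Index.difference with sort=False as a swapped-in alternative to set(); the small frame is a hypothetical stand-in for the 50-column dataset.

```python
import pandas as pd

# Hypothetical small frame standing in for the 50-column one
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6],
                   'Z': ['p', 'q'], 'Y': ['r', 's']})
non_dummy_cols = ['A', 'B', 'C']

# sort=False keeps the remaining labels in their original order
dummy_cols = df.columns.difference(non_dummy_cols, sort=False)
out = pd.get_dummies(df, columns=list(dummy_cols))
print(out.columns.tolist())
```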

Select just the two columns you want dummies for and pass them to pd.get_dummies(); the resulting column names combine the source column with the value each binary variable represents. Then use pd.concat() to attach the original columns you want to keep unchanged:

pd.concat([pd.get_dummies(df[['A', 'D']]), df[['B', 'C']]], axis=1)

   A_x  A_y  D_j  D_l  B  C
0  1.0  0.0  1.0  0.0  z  1
1  0.0  1.0  0.0  1.0  u  2
2  1.0  0.0  1.0  0.0  z  3
  • The other answers are great for the specific example in the OP.
  • This answer is for cases where there may be many columns, and it's too cumbersome to type out all the column names.
  • This is a non-exhaustive set of ways to pass many different columns to get_dummies while excluding some columns.
  • Using the built-in filter() function on df.columns is also an option.
  • When columns=None, pd.get_dummies only converts columns with an object or category dtype (recent pandas versions also include the string dtype).
    • Another potential option is to give the object dtype only to the columns that should be transformed, and make sure the columns that shouldn't be transformed are not object dtype.
  • Using set(), as shown in another answer above, is yet another option.
import pandas as pd
import string  # for data
import numpy as np

# create test data
np.random.seed(15)
df = pd.DataFrame(np.random.randint(1, 4, size=(5, 10)), columns=list(string.ascii_uppercase[:10]))

# display(df)
   A  B  C  D  E  F  G  H  I  J
0  1  2  1  2  1  1  2  3  2  2
1  2  1  3  3  1  2  2  1  2  1
2  2  3  1  3  2  2  1  2  3  3
3  3  2  1  2  3  2  3  1  3  1
4  1  1  1  3  3  1  2  1  2  1

Option 1

  • If the excluded columns are fewer than the included columns, specify the columns to exclude, and then use a list comprehension to leave them out of the list passed to the columns= parameter.
# columns not to transform
not_cols = ['C', 'G']

# get dummies
df_dummies = pd.get_dummies(data=df, columns=[col for col in df.columns if col not in not_cols])

   C  G  A_1  A_2  A_3  B_1  B_2  B_3  D_2  D_3  E_1  E_2  E_3  F_1  F_2  H_1  H_2  H_3  I_2  I_3  J_1  J_2  J_3
0  1  2    1    0    0    0    1    0    1    0    1    0    0    1    0    0    0    1    1    0    0    1    0
1  3  2    0    1    0    1    0    0    0    1    1    0    0    0    1    1    0    0    1    0    1    0    0
2  1  1    0    1    0    0    0    1    0    1    0    1    0    0    1    0    1    0    0    1    0    0    1
3  1  3    0    0    1    0    1    0    1    0    0    0    1    0    1    1    0    0    0    1    1    0    0
4  1  2    1    0    0    1    0    0    0    1    0    0    1    1    0    1    0    0    1    0    1    0    0
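
The built-in filter() alternative mentioned in the notes above could look like the sketch below. It rebuilds the same test data so that it runs on its own; the name cols is an illustrative addition.

```python
import string

import numpy as np
import pandas as pd

# Same test data as above
np.random.seed(15)
df = pd.DataFrame(np.random.randint(1, 4, size=(5, 10)),
                  columns=list(string.ascii_uppercase[:10]))

# columns not to transform
not_cols = ['C', 'G']

# filter() keeps the labels for which the predicate returns True
cols = list(filter(lambda col: col not in not_cols, df.columns))
df_dummies = pd.get_dummies(data=df, columns=cols)
```

This selects the same columns as the list comprehension; which form to use is a matter of taste.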

Option 2

  • If the columns to exclude are at the beginning or end, slice df.columns. Here the first two columns, A and B, are left untouched.
df_dummies = pd.get_dummies(data=df, columns=df.columns[2:])

   A  B  C_1  C_3  D_2  D_3  E_1  E_2  E_3  F_1  F_2  G_1  G_2  G_3  H_1  H_2  H_3  I_2  I_3  J_1  J_2  J_3
0  1  2    1    0    1    0    1    0    0    1    0    0    1    0    0    0    1    1    0    0    1    0
1  2  1    0    1    0    1    1    0    0    0    1    0    1    0    1    0    0    1    0    1    0    0
2  2  3    1    0    0    1    0    1    0    0    1    1    0    0    0    1    0    0    1    0    0    1
3  3  2    1    0    1    0    0    0    1    0    1    0    0    1    1    0    0    0    1    1    0    0
4  1  1    1    0    0    1    0    0    1    1    0    0    1    0    1    0    0    1    0    1    0    0
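
The example above excludes columns at the beginning. As a sketch of the other case, assuming the last two columns (I and J) are the ones to keep unchanged, a negative slice does the same job:

```python
import string

import numpy as np
import pandas as pd

# Same test data as above
np.random.seed(15)
df = pd.DataFrame(np.random.randint(1, 4, size=(5, 10)),
                  columns=list(string.ascii_uppercase[:10]))

# Encode everything except the last two columns
df_dummies = pd.get_dummies(data=df, columns=df.columns[:-2])
```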

Option 3

  • Specify slices and then concat the excluded columns to the dummies.
    • Uses pd.concat, similar to the pd.concat answer above, but with more columns.
  • np.r_ translates the slice objects into a single concatenated array of positions.
  • The sliced columns are cast with .astype(object) because, without columns=, get_dummies does not encode integer columns.
slices = np.r_[slice(0, 2), slice(3, 6), slice(7, 10)]
excluded = [2, 6]

df_dummies = pd.concat([df.iloc[:, excluded], pd.get_dummies(data=df.iloc[:, slices].astype(object))], axis=1)

   C  G  A_1  A_2  A_3  B_1  B_2  B_3  D_2  D_3  E_1  E_2  E_3  F_1  F_2  H_1  H_2  H_3  I_2  I_3  J_1  J_2  J_3
0  1  2    1    0    0    0    1    0    1    0    1    0    0    1    0    0    0    1    1    0    0    1    0
1  3  2    0    1    0    1    0    0    0    1    1    0    0    0    1    1    0    0    1    0    1    0    0
2  1  1    0    1    0    0    0    1    0    1    0    1    0    0    1    0    1    0    0    1    0    0    1
3  1  3    0    0    1    0    1    0    1    0    0    0    1    0    1    1    0    0    0    1    1    0    0
4  1  2    1    0    0    1    0    0    0    1    0    0    1    1    0    1    0    0    1    0    1    0    0
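
The dtype-based option from the notes above can be sketched as follows. The frame and the name to_encode are hypothetical; the idea is to give the object dtype only to the columns that should be encoded and then call get_dummies without columns=.

```python
import pandas as pd

# Hypothetical frame: all columns start as integers
df = pd.DataFrame({'A': [1, 2, 1], 'B': [3, 3, 4], 'C': [5, 6, 7]})

# Cast only the columns to encode to object; C stays numeric
to_encode = ['A', 'B']
df = df.astype({col: object for col in to_encode})

# With columns=None, only the object columns are converted
out = pd.get_dummies(df)
print(out.columns.tolist())
```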
