Using pandas.get_dummies

I have a DataFrame with a number of columns. Some of them I want to keep as they are (their names are stored in to_keep), and for the others I want to create dummy (indicator) variables using pandas.get_dummies (their names are stored in to_change).

However, I can't seem to get the syntax right, and none of the examples I have seen (e.g. here: http://blog.yhat.com/posts/logistic-regression-and-python.html ) seem to help.

Here's what I have at present:

new_df = df.copy()
dummies = pd.get_dummies(new_df[to_change])
new_df = new_df[to_keep].join(dummies)
return new_df

Any help on where I am going wrong would be appreciated. The problem I keep running into is that this only adds dummy variables for the first column in to_change.

I must say I didn't completely understand the problem.

However, say your DataFrame is df, and you have a list of column names, to_make_categorical.

The DataFrame with the non-categorical columns is

wo_categoricals = df[[c for c in df.columns if c not in to_make_categorical]]

The DataFrames of the categorical expansions are

categoricals = [pd.get_dummies(df[c], prefix=c) for c in to_make_categorical]
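To illustrate what each element of that list looks like, here is a minimal sketch on a single column. The sample data and the column names "color" and "size" are made up for illustration and do not come from the question.

```python
import pandas as pd

# Hypothetical sample data; the column names are assumptions for illustration.
df = pd.DataFrame({"color": ["red", "blue", "red"], "size": ["S", "M", "S"]})

# Expanding one column yields one indicator column per distinct value,
# each named "<prefix>_<value>".
color_dummies = pd.get_dummies(df["color"], prefix="color")
print(list(color_dummies.columns))  # ['color_blue', 'color_red']
```

Using the column name as the prefix keeps the indicator columns from different source columns distinguishable after they are combined.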

Now you can just concatenate them horizontally:

pd.concat([wo_categoricals] + categoricals, axis=1)
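Putting the steps together, here is a minimal end-to-end sketch on made-up data (the columns "score", "color" and "rank" are assumptions, not taken from the question). It also shows one possible reason why only some columns get expanded, which the question does not confirm: when pd.get_dummies is given a whole DataFrame without the columns argument, it only encodes object, string or category columns and passes numeric ones through unchanged.

```python
import pandas as pd

# Hypothetical data: "color" holds strings, "rank" holds integers.
df = pd.DataFrame({
    "score": [0.5, 0.7, 0.9],
    "color": ["red", "blue", "red"],
    "rank": [1, 2, 3],
})
to_make_categorical = ["color", "rank"]

# Without `columns=`, only the string column is encoded;
# the integer column "rank" is left as it is.
partial = pd.get_dummies(df[to_make_categorical])

# The approach above: encode each column separately, then concatenate.
wo_categoricals = df[[c for c in df.columns if c not in to_make_categorical]]
categoricals = [pd.get_dummies(df[c], prefix=c) for c in to_make_categorical]
result = pd.concat([wo_categoricals] + categoricals, axis=1)

# Alternative: name the columns to encode explicitly in a single call.
shortcut = pd.get_dummies(df, columns=to_make_categorical)

print(list(result.columns))
# ['score', 'color_blue', 'color_red', 'rank_1', 'rank_2', 'rank_3']
```

If the columns in to_change are numeric in your data, the columns argument (or the per-column approach above) should force them to be expanded as well.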
