Using pandas.get_dummies
So essentially I have a data frame with a bunch of columns, some of which I want to keep (stored in to_keep) and some others that I want to turn into categorical (dummy) variables using pandas.get_dummies (stored in to_change).
However, I can't seem to get the syntax right, and the examples I have seen (e.g. here: http://blog.yhat.com/posts/logistic-regression-and-python.html ) don't seem to help.
Here's what I have at present:
new_df = df.copy()
dummies = pd.get_dummies(new_df[to_change])
new_df = new_df[to_keep].join(dummies)
return new_df
Any help on where I am going wrong would be appreciated. The problem I keep running into is that this only adds categorical variables for the first column in to_change.
I must say I didn't completely understand the problem.
However, say your DataFrame is df, and you have a list of columns to_make_categorical.
The DataFrame with the non-categorical columns is:
wo_categoricals = df[[c for c in list(df.columns) if c not in to_make_categorical]]
The DataFrames of the categorical expansions, one per column, are:
categoricals = [pd.get_dummies(df[c], prefix=c) for c in to_make_categorical]
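To see what each of those per-column calls produces, here is a minimal sketch on a single made-up column. The Series name and its values are assumptions for illustration only; the point is that prefix=c gives every dummy column a name of the form c_value, so expansions from different columns don't collide.

```python
import pandas as pd

# Hypothetical single column, just to show the naming scheme.
colors = pd.Series(["red", "blue", "red"], name="color")

# One indicator column per distinct value, each named "<prefix>_<value>".
dummies = pd.get_dummies(colors, prefix="color")
print(list(dummies.columns))
```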
Now you can just concatenate them horizontally:
pd.concat([wo_categoricals] + categoricals, axis=1)
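Putting the steps together, here is a small end-to-end sketch. The data and column names (age, city, rank) are made up for illustration. It also shows two things that may be relevant, assuming a reasonably recent pandas: when get_dummies is given a DataFrame without columns=, it only encodes object, string, or category columns and leaves numeric ones as they are, which could be one reason some columns in to_change appear to be skipped; and passing columns= should give the same result as the step-by-step version in a single call.

```python
import pandas as pd

# Made-up example data; the column names are assumptions for illustration.
df = pd.DataFrame({
    "age": [25, 32, 47],
    "city": ["NY", "LA", "NY"],
    "rank": [1, 2, 1],  # numeric, but meant to be treated as categorical
})
to_make_categorical = ["city", "rank"]

# Step-by-step version: keep the other columns, expand each categorical
# column separately, then concatenate everything side by side.
wo_categoricals = df[[c for c in df.columns if c not in to_make_categorical]]
categoricals = [pd.get_dummies(df[c], prefix=c) for c in to_make_categorical]
stepwise = pd.concat([wo_categoricals] + categoricals, axis=1)

# Without columns=, only the string column "city" is expanded;
# the numeric "rank" column is passed through unchanged.
default = pd.get_dummies(df[to_make_categorical])

# Single-call version: columns= forces encoding of every listed column,
# including numeric ones such as "rank".
one_call = pd.get_dummies(df, columns=to_make_categorical)

print(list(stepwise.columns))
print(list(default.columns))
print(list(one_call.columns))
```

If the single call works on your pandas version, it is probably the simpler option; the step-by-step version makes it easier to inspect each expansion on its own.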