How to iterate over pandas DataFrame columns and factorize based on a condition?
I am trying to iterate over a DataFrame and conditionally factorize its data. The DataFrame holds information about house prices, and instead of the data being represented by strings I would like the values to be categories represented by numbers (e.g. mansion = 0, house = 1). However, some columns are already integers or floats, so I only want to categorize the columns that contain strings.
I want to factorize the data so I can use it with a Keras Sequential neural net without manually going through each column and factorizing it myself.
columns = list(dataframe)
for i in columns:
    if type(i)==str:
        xtrain.i = pd.Categorical(pd.factorize(dataframe.i)[0])
I thought this would factorize the data, but I get the error
AttributeError: 'DataFrame' object has no attribute 'i'
and pandas does not recognize that I am trying to refer to the column named by the loop variable. For reference, the following line works in my code (MSZoning is one of the listed columns):
xtrain.MSZoning = pd.Categorical(pd.factorize(xtrain.MSZoning)[0])
Any help or advice would be much appreciated!
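It should be more like this: select the column with brackets, and check the column's dtype rather than the type of its name.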
for i in columns:
    if dataframe[i].dtypes=='object':
        xtrain[i] = pd.Categorical(pd.factorize(dataframe[i])[0])
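To see why the original loop fails: `type(i) == str` is true for every column label, because the labels themselves are strings, and `dataframe.i` looks for a column literally named `i`. Here is a minimal self-contained sketch of the corrected approach. The DataFrame contents and the `mappings` dict are made up for illustration (only the column name MSZoning comes from the question), and `select_dtypes(exclude="number")` is swapped in for the dtype check on the assumption that every non-numeric column holds strings.

```python
import pandas as pd

# Hypothetical sample data; only the column name "MSZoning" is from the question.
xtrain = pd.DataFrame({
    "MSZoning": ["RL", "RM", "RL", "FV"],
    "HouseStyle": ["mansion", "house", "house", "mansion"],
    "LotArea": [8450, 9600, 11250, 9550],
})

# Pick every non-numeric column (assumed here to be the string columns).
text_cols = xtrain.select_dtypes(exclude="number").columns

# Keep the code -> label lookup for each column so the codes stay interpretable.
mappings = {}
for col in text_cols:
    # factorize numbers the values in order of first appearance.
    codes, uniques = pd.factorize(xtrain[col])
    xtrain[col] = codes
    mappings[col] = list(uniques)

print(xtrain)
print(mappings)
```

With this data, "mansion" becomes 0 and "house" becomes 1, matching the numbering the question asks for, while LotArea is left untouched.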
And since you are building an MLP, you could use LabelEncoder instead:
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
for i in columns:
    if dataframe[i].dtypes=='object':
        dataframe[i] = le.fit_transform(dataframe[i])
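One caveat with sharing a single encoder across columns: each call to `fit_transform` refits it, so afterwards only the last column's classes are available for decoding. The sketch below, on made-up data, keeps one encoder per column in a hypothetical `encoders` dict so any column can be mapped back to its labels. Note that LabelEncoder numbers classes in sorted order, and that scikit-learn documents it as intended for target labels; `OrdinalEncoder` is the counterpart meant for feature columns.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample data; only the column name "MSZoning" is from the question.
dataframe = pd.DataFrame({
    "MSZoning": ["RL", "RM", "RL", "FV"],
    "HouseStyle": ["mansion", "house", "house", "mansion"],
    "LotArea": [8450, 9600, 11250, 9550],
})

# One fitted encoder per column, so each mapping can be inverted later.
encoders = {}
for col in dataframe.select_dtypes(exclude="number").columns:
    le = LabelEncoder()
    dataframe[col] = le.fit_transform(dataframe[col])
    encoders[col] = le

# Recover the original labels of one column from its integer codes.
restored = encoders["MSZoning"].inverse_transform(dataframe["MSZoning"])
print(dataframe)
print(list(restored))
```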