Logistic regression - can't use categorical variables to train my model

Question

I want to train my model using these categorical variables, with lifequality as my target variable:

SelectedColumns=['workOrganiz' , 'education', 'maritalSt','jobType','ageGroup','workHoursPeriod','sex','lifequality']

I tried to run a logistic regression like this:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

dfML=df[SelectedColumns]
list_of_results=[]
#stratified train and test split
X=dfML.iloc[:,:-1]    #all features (every column except the last)
y=dfML.iloc[:,-1]  #target (last column)

X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.3,random_state=15,stratify=y)
clf=LogisticRegression()
lrm=clf.fit(X_train,y_train)
y_pred=lrm.predict(X_test)

but I get the following error:

ValueError: could not convert string to float: 'Private'

What am I doing wrong? Using dummies gives my model a precision and accuracy of 100%:

dfML=df[SelectedColumns]
dfML=pd.get_dummies(dfML)

If I remove the dfML=df[SelectedColumns] line, the 100% doesn't happen.

Answer 1

Regression algorithms such as logistic regression can only work with numeric inputs, so string values like 'Private' have to be converted before fitting. You can still use categorical variables as predictors with a workaround. There are several ways to do this; a simple one is called 'dummy coding'. You can use the pandas function get_dummies() to turn each categorical column into multiple 0/1 columns. See https://www.geeksforgeeks.org/how-to-create-dummy-variables-in-python-with-pandas/amp/
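
A minimal sketch of dummy coding applied to the predictors only. One possible reason for the 100% score in the question, assuming lifequality is itself a string column: calling pd.get_dummies() on the whole frame also splits the target into lifequality_* indicator columns, so taking "every column except the last" as X leaves target information among the features. The toy DataFrame, its values, and the names feature_cols and target_col are made up for illustration; only two of the question's columns are used.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the asker's df; the values are invented.
df = pd.DataFrame({
    "workOrganiz": ["Private", "Public", "Private", "Self", "Public", "Private"] * 10,
    "sex": ["M", "F", "F", "M", "M", "F"] * 10,
    "lifequality": ["good", "bad", "good", "bad", "bad", "bad"] * 10,
})

feature_cols = ["workOrganiz", "sex"]  # assumed predictor columns
target_col = "lifequality"             # target column named in the question

# Encode only the predictors; the target stays a single label column,
# so no lifequality_* indicator can end up in X.
X = pd.get_dummies(df[feature_cols], drop_first=True)
y = df[target_col]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=15, stratify=y
)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
```

LogisticRegression accepts string class labels for y, so only the features need encoding. drop_first=True drops one indicator per variable to avoid redundant columns; leaving it out should also work here.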

Answer 1
0 2022-04-21 20:12:30
