
Sklearn predict TypeError in Python 3.5

I am training a Logistic Regression model using sklearn LogisticRegression. I am getting a TypeError when trying to predict for the test set.

CODE:

test_features=test[["Sex","Age","Pclass","Fare","Embarked"]].values
myprediction=myfit2.predict(test_features)

ERROR:

TypeError: float() argument must be a string or a number

I've checked the syntax a few times. Could this be because I'm using Python 3.5? The same code seems to work fine on Python 2.7. I would greatly appreciate help resolving this error.

The problem is that the data contain NaN (missing) values, which have to be filled in before fitting and predicting.
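To see which feature columns actually contain NaN, you can count the missing values per column with pandas' `isnull()` before fitting. This is a minimal sketch; the small DataFrame is a hypothetical stand-in for `test.csv` with made-up values:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for test.csv, with a few values missing
test = pd.DataFrame({
    "Sex": ["male", "female", "male"],
    "Age": [34.5, np.nan, 62.0],
    "Pclass": [3, 3, 2],
    "Fare": [7.8292, np.nan, 9.6875],
    "Embarked": ["Q", "S", None],
})

features = ["Sex", "Age", "Pclass", "Fare", "Embarked"]

# Number of missing values in each feature column
print(test[features].isnull().sum())

# Rows that have at least one missing feature
print(test[test[features].isnull().any(axis=1)])
```

Any column with a non-zero count must be filled (or its rows dropped) before the array is passed to `fit` or `predict`.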

Code:

import pandas as pd
from sklearn.linear_model import LogisticRegression

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

features = ["Sex", "Age", "Pclass", "Fare", "Embarked"]

for df in (train, test):
    # Encode the categorical columns as numbers. Series.map() avoids chained
    # assignment such as df["Sex"][df["Sex"]=="male"]=0, which pandas warns
    # about and which has no effect under copy-on-write.
    df["Sex"] = df["Sex"].map({"male": 0, "female": 1})
    df["Embarked"] = df["Embarked"].map({"S": 0, "C": 1, "Q": 2})

    # Fill every NaN in the feature columns with the mean age of the same
    # set (Series.mean() skips NaN, so numpy's nanmean is not needed)
    mean_age = df["Age"].mean()
    df[features] = df[features].fillna(mean_age)

# All feature columns are numeric now, so .values is a float array
train_features = train[features].values
test_features = test[features].values

lg = LogisticRegression()
# Define the target variable y (column 1 of train.csv, "Survived"), then fit
y_train = train["Survived"]

lg.fit(train_features, y_train)
lg.predict(test_features)

result:

array([0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1,
       1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0,
       0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1,
       1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0,
       1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1,
       0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
       1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1,
       0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0,
       1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1,
       0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0,
       0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1,
       0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1,
       0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0,
       0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
       1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1,
       1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1,
       1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0,
       1, 0, 0, 0], dtype=int64)

Something like this should work fine.
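A different approach, swapping in scikit-learn's `SimpleImputer` for the manual `fillna`, is to put the imputation and the model in one `Pipeline`. This is a sketch on hypothetical toy data that is already numerically encoded; here every column is filled with its own mean, learned from the training set only and then reused for the test set:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

features = ["Sex", "Age", "Pclass", "Fare", "Embarked"]

# Hypothetical toy data, already encoded as numbers, standing in for train.csv/test.csv
train = pd.DataFrame({
    "Sex": [0, 1, 1, 0, 0, 1],
    "Age": [22.0, 38.0, np.nan, 35.0, np.nan, 27.0],
    "Pclass": [3, 1, 3, 1, 3, 2],
    "Fare": [7.25, 71.28, 7.92, 53.10, 8.05, 11.13],
    "Embarked": [0, 1, 0, 0, np.nan, 0],
    "Survived": [0, 1, 1, 1, 0, 1],
})
test = pd.DataFrame({
    "Sex": [0, 1],
    "Age": [np.nan, 47.0],
    "Pclass": [3, 3],
    "Fare": [np.nan, 7.0],
    "Embarked": [2, 0],
})

# The imputer learns each column's mean on the training data during fit()
# and uses those same means to fill NaN in the test data during predict()
model = make_pipeline(SimpleImputer(strategy="mean"), LogisticRegression())
model.fit(train[features], train["Survived"])
predictions = model.predict(test[features])
print(predictions)
```

Reusing the training-set means at prediction time keeps the preprocessing identical between `fit` and `predict`, and the pipeline cannot be handed an unfilled test array by mistake.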
