Sklearn predict on Python 3.5
I am training a logistic regression model with sklearn's LogisticRegression. When I try to predict on the test set, I get a TypeError.
Code:
test_features=test[["Sex","Age","Pclass","Fare","Embarked"]].values
myprediction=myfit2.predict(test_features)
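Error:
TypeError: float() argument must be a string or a number
I have checked the syntax several times. Could it be because I am using Python 3.5? The same code seems to work fine on Python 2.7. Any help resolving this error would be much appreciated.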
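A quick way to narrow down this kind of error is to look at the dtypes and missing-value counts of the selected columns before calling predict. The sketch below uses a small made-up DataFrame (the name `sample` and all of its values are hypothetical stand-ins for the real test set), so it only illustrates the check itself:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real test set; the values are made up.
sample = pd.DataFrame({
    "Sex": ["male", "female", "male"],
    "Age": [22.0, np.nan, 35.0],
    "Pclass": [3, 1, 2],
    "Fare": [7.25, 71.28, np.nan],
    "Embarked": ["S", "C", None],
})
cols = ["Sex", "Age", "Pclass", "Fare", "Embarked"]

# String columns show up with a non-numeric dtype.
dtypes = sample[cols].dtypes
# Number of missing values per column.
missing = sample[cols].isna().sum()

print(dtypes)
print(missing)
```

Any column that reports a non-numeric dtype or a non-zero missing count is a candidate for the failure.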
The problem is that the data contains NaN:
Code:
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

# Encode the categorical columns as numbers.
# Series.map avoids the chained assignment (df[col][mask] = value),
# which triggers SettingWithCopyWarning and may silently not write back.
for df in (train, test):
    df["Sex"] = df["Sex"].map({"male": 0, "female": 1})
    df["Embarked"] = df["Embarked"].map({"S": 0, "C": 1, "Q": 2})

# Mean age ignoring NaN (Age is column 4 in test.csv and column 5 in train.csv).
# Note: fillna here replaces NaN in every column with the mean age, not only in Age.
nan_mean_age = np.nanmean(test.iloc[:, 4])
test = test.fillna(value=nan_mean_age)
nan_mean_age2 = np.nanmean(train.iloc[:, 5])
train = train.fillna(value=nan_mean_age2)

feature_cols = ["Sex", "Age", "Pclass", "Fare", "Embarked"]
train_features = train[feature_cols].values
test_features = test[feature_cols].values

lg = LogisticRegression()
# Define your target variable y (column 1 of train.csv) and then fit
y_train = train.iloc[:, 1]
lg.fit(train_features, y_train)
lg.predict(test_features)
Result:
array([0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1,
1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0,
0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1,
1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0,
1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1,
0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1,
0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0,
1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1,
0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0,
0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1,
0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1,
0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0,
0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0,
0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1,
1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1,
1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0,
1, 0, 0, 0], dtype=int64)
Something like this should work.
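Since the answer's fillna call writes the mean age into every column that has missing values, a per-column variant may be preferable. The following is only a sketch under assumptions: the helper name `prepare`, the toy data, and the choice of mean for Age and Fare and most frequent value for Embarked are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

FEATURES = ["Sex", "Age", "Pclass", "Fare", "Embarked"]

def prepare(df):
    """Hypothetical helper: encode categoricals and impute each column separately."""
    out = df[FEATURES].copy()
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})
    out["Embarked"] = out["Embarked"].map({"S": 0, "C": 1, "Q": 2})
    # Numeric columns: fill with that column's own mean.
    out["Age"] = out["Age"].fillna(out["Age"].mean())
    out["Fare"] = out["Fare"].fillna(out["Fare"].mean())
    # Categorical column: fill with the most frequent code.
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])
    return out.astype(float)

# Toy data with made-up values, standing in for train.csv / test.csv.
toy = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, np.nan, 30.0, 38.0],
    "Pclass": [3, 1, 2, 3],
    "Fare": [8.0, 72.0, np.nan, 10.0],
    "Embarked": ["S", "C", None, "S"],
})

features = prepare(toy)
print(features)
```

The resulting `features.values` array is purely numeric and NaN-free, so it could be passed to `lg.fit` or `lg.predict` in the same way as `train_features` and `test_features` above. In practice you might compute the fill values on the training set only and reuse them for the test set.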