Predict multiple values as model result in scikit-learn
I have created a model using a scikit-learn algorithm:
rf = RandomForestClassifier(n_estimators=10, random_state=seed)
rf.fit(X_train,Y_train)
shift_id=2099.0
user_id=1402.0
status=['S']
shift_organisation_id=15.0
shift_department_id=20.0
open_positions=71.0
city=['taunton']
role_id=3.0
specialty_id=16.0
years_of_experience=10.0
nurse_zip=2780.0
shifts_zip=2021.0
status = status_encoder.transform(status)
city = city_encoder.transform(city)
X = np.array([shift_id, user_id, status, shift_organisation_id, shift_department_id, open_positions, city, role_id, specialty_id, years_of_experience, nurse_zip, shifts_zip])
location_id = rf.predict(X.reshape(1,-1))
print(location_id)
which gives a result like this:
[25]
From what I understand, 25 is the best prediction value for this model. But I want to get the top 3 best values as a result. How can I get them?
In that case the prediction result would look like:
[23, 45, 25]
You can use the predict_proba
method to return the class probabilities and get the top 3 values from it:
rf = RandomForestClassifier(n_estimators=10, random_state=seed)
rf.fit(X_train,Y_train)
shift_id=2099.0
user_id=1402.0
status=['S']
shift_organisation_id=15.0
shift_department_id=20.0
open_positions=71.0
city=['taunton']
role_id=3.0
specialty_id=16.0
years_of_experience=10.0
nurse_zip=2780.0
shifts_zip=2021.0
status = status_encoder.transform(status)
city = city_encoder.transform(city)
X = np.array([shift_id, user_id, status, shift_organisation_id, shift_department_id, open_positions, city, role_id, specialty_id, years_of_experience, nurse_zip, shifts_zip])
location_id = rf.predict_proba(X.reshape(1,-1))
print(location_id)
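Since the asker's encoders and training data aren't shown, here is a minimal self-contained sketch with made-up dummy data (5 random features, 4 classes) illustrating how the top 3 class labels can be pulled out of the predict_proba output:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Dummy stand-in for the real training data (5 features, 4 location classes).
rng = np.random.RandomState(0)
X_train = rng.rand(100, 5)
Y_train = rng.randint(0, 4, size=100)

rf = RandomForestClassifier(n_estimators=10, random_state=0)
rf.fit(X_train, Y_train)

x = rng.rand(1, 5)                      # one sample, like X.reshape(1, -1) above
proba = rf.predict_proba(x)             # shape (1, n_classes)
top3 = np.argsort(proba[0])[-3:][::-1]  # column indices, highest probability first
print(rf.classes_[top3])                # the 3 most likely location_ids
```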
You have the predict_proba
method for that, which returns the predicted class probabilities.
Let's check an example using the iris dataset:
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X = iris.data[:, :2]  # we only take the first two features
y = iris.target
# train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y)
rf = RandomForestClassifier(n_estimators=10, random_state=10)
rf.fit(X_train, y_train)
If you now call the predict
method, as expected you get the highest-probability class:
rf.predict(X_test)
# array([1, 2, 1, 0, 2, 0, 2, 0, 0, 1, 2, ...
However, calling predict_proba
will give you the corresponding probabilities:
rf.predict_proba(X_test)
array([[0. , 1. , 0. ],
[0.11 , 0.1 , 0.79 ],
[0. , 0.7 , 0.3 ],
[0.5 , 0.4 , 0.1 ],
[0. , 0.3 , 0.7 ],
[0.5 , 0.2 , 0.3 ],
[0.4 , 0. , 0.6 ],
...
In order to get the k
highest probabilities you can use argsort
and index rf.classes_
with the result (each row is ordered from lower to higher probability, so the last column is the top class):
k = 2
rf.classes_[rf.predict_proba(X_test).argsort()[:,-k:]]
array([[2, 1],
[0, 2],
[2, 1],
[1, 0],
[1, 2],
[2, 0],
[0, 2],
[1, 0],
[1, 0],
[2, 1],
...
The above can be improved using argpartition
, since we're only interested in the top k
probabilities. Partitioning on the last k positions (negative kth indices) sorts only those columns into place, matching the argsort result:
rf.classes_[rf.predict_proba(X_test).argpartition(range(-k, 0), axis=1)[:, -k:]]
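As a quick sanity check (on a toy probability matrix with hypothetical class labels, not the iris output), the argpartition variant agrees with the argsort variant on the top-k labels:

```python
import numpy as np

# Toy probability matrix: 3 samples, 4 classes (made-up numbers).
proba = np.array([[0.10, 0.60, 0.20, 0.10],
                  [0.40, 0.10, 0.30, 0.20],
                  [0.05, 0.15, 0.30, 0.50]])
classes = np.array([10, 20, 30, 40])  # hypothetical class labels
k = 2

top_sort = classes[np.argsort(proba, axis=1)[:, -k:]]
# Negative kth indices sort only the last k columns into place.
top_part = classes[np.argpartition(proba, range(-k, 0), axis=1)[:, -k:]]
print(top_sort)  # each row: k labels, ordered from lower to higher probability
assert np.array_equal(top_sort, top_part)
```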