Compare accuracy of RF model in Python
I want to compute the model's accuracy on the test dataset. The model produces the following predictions:
[0 1 0 1 1 1 1 0 1 0 1 0 1 1 0 0 0 1 0 1 0 1 0 0 0 1 1 0 0 0 0 0 0 0 1 1 0
1 1 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0]
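How can I compare these with the actual values (B or M in this case) to get the accuracy on the test data? The approach should also be generic enough to work with other datasets. This is the code I used for the RandomForest model: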
import pandas as pd
import numpy as np
# Load scikit's random forest classifier library
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
file_path = 'https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data'
dataset2 = pd.read_csv(file_path, header=None, sep=',')
train, test = train_test_split(dataset2, test_size=0.1)
y = pd.factorize(train[1])[0]
clf = RandomForestClassifier(n_jobs=2, random_state=0)
features = train.columns[2:]
clf.fit(train[features], y)
# Output of fit():
# RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
#             max_depth=None, max_features='auto', max_leaf_nodes=None,
#             min_impurity_split=1e-07, min_samples_leaf=1,
#             min_samples_split=2, min_weight_fraction_leaf=0.0,
#             n_estimators=10, n_jobs=2, oob_score=False, random_state=0,
#             verbose=0, warm_start=False)
# Apply the Classifier we trained to the test data
clf.predict(test[features])
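You can use sklearn's preprocessing.LabelEncoder() to encode B and M as shown below, and then inverse_transform() to convert the codes back to labels. Accuracy can then be assessed with ConfusionMatrix() from the pandas_ml package and with sklearn's accuracy_score().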
import pandas as pd
import numpy as np
# Load scikit's random forest classifier library
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
file_path = 'https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data'
dataset2 = pd.read_csv(file_path, header=None, sep=',')
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
# Encode B, M to 0, 1
y = le.fit_transform(dataset2[1])
dataset2[1] = y
train, test = train_test_split(dataset2, test_size=0.1)
y = train[1]
y_test = test[1]
clf = RandomForestClassifier(n_jobs=2, random_state=0)
features = train.columns[2:]
clf.fit(train[features], y)
# Output of fit():
# RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
#             max_depth=None, max_features='auto', max_leaf_nodes=None,
#             min_impurity_split=1e-07, min_samples_leaf=1,
#             min_samples_split=2, min_weight_fraction_leaf=0.0,
#             n_estimators=10, n_jobs=2, oob_score=False, random_state=0,
#             verbose=0, warm_start=False)
# Apply the Classifier we trained to the test data
y_pred = clf.predict(test[features])
# Decode from 0, 1 to B, M
y_test_label = le.inverse_transform(y_test)
y_pred_label = le.inverse_transform(y_pred)
from pandas_ml import ConfusionMatrix
confusion_matrix = ConfusionMatrix(y_test_label, y_pred_label)
print("Confusion matrix:\n%s" % confusion_matrix)
# Confusion matrix:
# Predicted B M __all__
# Actual
# B 35 1 36
# M 4 17 21
# __all__ 39 18 57
from sklearn.metrics import accuracy_score
accuracy_score(y_test_label, y_pred_label)
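# For the confusion matrix above this is (35 + 17) / 57, about 0.912.
# The exact value changes between runs because train_test_split is not seeded.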
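To make the link between the confusion matrix and the accuracy explicit, here is a minimal sketch. The label arrays are toy data built only to reproduce the counts in the matrix above (35, 1, 4, 17); they are an assumption for illustration, not the real test split. It also shows that accuracy_score() accepts string labels directly, so the comparison works for any label set.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Toy labels arranged to match the confusion matrix counts above
# (assumed data, not the actual test split).
y_true = np.array(["B"] * 36 + ["M"] * 21)
y_pred = np.array(["B"] * 35 + ["M"] * 1 + ["B"] * 4 + ["M"] * 17)

# Accuracy is the fraction of positions where prediction equals truth.
manual_acc = float(np.mean(y_true == y_pred))

# accuracy_score computes the same quantity and works on string labels.
sk_acc = accuracy_score(y_true, y_pred)

print(manual_acc, sk_acc)
```

Both values equal the sum of the diagonal of the confusion matrix divided by the total number of test samples.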
Note that pandas_ml can easily be installed with pip, as follows.
pip install pandas_ml
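If pandas_ml is not available in your environment, a similar labelled table can be produced with sklearn and pandas alone. The following is a sketch under that assumption; the label arrays are again toy data that mirror the counts shown earlier, and the variable names are illustrative.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

# Toy labels (assumed data) mirroring the counts in the matrix shown earlier.
y_true = ["B"] * 36 + ["M"] * 21
y_pred = ["B"] * 35 + ["M"] + ["B"] * 4 + ["M"] * 17

# Fix the label order explicitly so rows and columns are predictable.
labels = ["B", "M"]
cm = confusion_matrix(y_true, y_pred, labels=labels)

# Wrap the raw counts in a DataFrame so rows (actual) and columns (predicted) are named.
cm_df = pd.DataFrame(
    cm,
    index=pd.Index(labels, name="Actual"),
    columns=pd.Index(labels, name="Predicted"),
)
print(cm_df)

# pd.crosstab builds the same table and can add row and column totals.
table = pd.crosstab(
    pd.Series(y_true, name="Actual"),
    pd.Series(y_pred, name="Predicted"),
    margins=True,
)
print(table)
```

With real data, y_true and y_pred would be y_test_label and y_pred_label from the code above.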