Random Forest Classifier accuracy doesn't get higher than 50%

I am very new to machine learning and I am trying to classify the UCI Heart Disease dataset using sklearn's random forest classifier. My approach is very basic, and I wanted to ask how I could improve the accuracy I get with this algorithm (tips, links, etc.). My accuracy tops out at about 50% every time. Here's my code:

import pandas as pd
import numpy as np
import random as random
import sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

df = pd.read_excel('/Users/Mady/Documents/ClevelandData.xlsx')
df.replace('?', -99999, inplace=True)

labels = df.iloc[:,-1]
labels = labels.values

df.drop(df.columns[len(df.columns)-1], axis=1, inplace=True)
riskFactors = df.values

random.seed(123)
random.shuffle(labels)
random.seed(123)
random.shuffle(riskFactors)

labels_train = labels[:(int(len(labels) * 0.8))]
labels_test = labels[(int(len(labels) * 0.8)):]

riskFactors_train = riskFactors[:(int(len(riskFactors) * 0.8))]
riskFactors_test = riskFactors[(int(len(riskFactors) * 0.8)):]

model = RandomForestClassifier(n_estimators = 1000)
model.fit(riskFactors_train,labels_train)
predicted_labels = model.predict(riskFactors_test)
acc = accuracy_score(labels_test,predicted_labels)
print(acc)
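
One likely source of trouble in the shuffling step above: Python's random.shuffle swaps elements with x[i], x[j] = x[j], x[i]. On a 2D NumPy array, x[i] is a view of a row rather than a copy, so the swap can overwrite one row with another and leave duplicated rows behind, while the 1D labels array is shuffled correctly. The features then no longer line up with their labels. The sketch below illustrates this on a small made-up array (not the Cleveland data); the names features, shuffled, and order are assumptions chosen for illustration.

```python
import random

import numpy as np

# Small made-up 2D array with distinct rows, so lost or duplicated rows are easy to spot.
features = np.arange(20).reshape(10, 2)
original_rows = {tuple(row) for row in features}

# random.shuffle swaps row *views* in place, so one row can overwrite another.
shuffled = features.copy()
random.seed(123)
random.shuffle(shuffled)

rows_after_shuffle = {tuple(row) for row in shuffled}
rows_lost = original_rows - rows_after_shuffle
print("distinct rows before:", len(original_rows))
print("distinct rows after random.shuffle:", len(rows_after_shuffle))
print("rows that disappeared:", sorted(rows_lost))

# Indexing with a permutation keeps every row exactly once, and the same
# index order could be reused for the labels so both stay aligned.
rng = np.random.default_rng(123)
order = rng.permutation(len(features))
safe_shuffled = features[order]
print("distinct rows after permutation:", len({tuple(row) for row in safe_shuffled}))
```

If manual shuffling is needed, applying one shared index permutation to both the features and the labels avoids both the duplicated rows and the misalignment.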

I solved this by removing the manual random shuffling, since there must have been some error in that part. As suggested by Yulin Zhang, I used the train_test_split function provided by sklearn instead.
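
For reference, a minimal sketch of what a train_test_split based version might look like. This is not the poster's final code: the data here is synthetic stand-in data generated with NumPy (an assumption, since the Excel file is not available), and the variable names, the stratify argument, and the fixed random_state values are illustrative choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (assumption): 300 rows, 13 numeric
# features, and a binary label that depends on the first two features.
rng = np.random.default_rng(0)
risk_factors = rng.normal(size=(300, 13))
labels = (risk_factors[:, 0] + risk_factors[:, 1] > 0).astype(int)

# train_test_split shuffles features and labels together, so rows stay aligned.
risk_train, risk_test, labels_train, labels_test = train_test_split(
    risk_factors,
    labels,
    test_size=0.2,      # same 80/20 split as in the question
    random_state=123,   # fixed seed for a reproducible split
    stratify=labels,    # keep class proportions similar in both parts
)

model = RandomForestClassifier(n_estimators=200, random_state=123)
model.fit(risk_train, labels_train)
acc = accuracy_score(labels_test, model.predict(risk_test))
print(acc)
```

With the real dataset, the features and labels read from the spreadsheet would take the place of the synthetic arrays; the split and the model fitting stay the same.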
