Random Forest Classifier accuracy doesn't get higher than 50%

Question

I am very new to machine learning, and I am trying to classify the UCI Heart Disease dataset (Cleveland data) using scikit-learn's RandomForestClassifier. My approach is very basic, so I would appreciate some tips or links on how to improve my accuracy with this algorithm. The accuracy tops out at about 50% every time. Here's my code:

import pandas as pd
import numpy as np
import random as random
import sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

df = pd.read_excel('/Users/Mady/Documents/ClevelandData.xlsx')
df.replace('?', -99999, inplace=True)

labels = df.iloc[:,-1]
labels = labels.values

df.drop(df.columns[len(df.columns)-1], axis=1, inplace=True)
riskFactors = df.values

random.seed(123)
random.shuffle(labels)
random.seed(123)
random.shuffle(riskFactors)

labels_train = labels[:(int(len(labels) * 0.8))]
labels_test = labels[(int(len(labels) * 0.8)):]

riskFactors_train = riskFactors[:(int(len(riskFactors) * 0.8))]
riskFactors_test = riskFactors[(int(len(riskFactors) * 0.8)):]

model = RandomForestClassifier(n_estimators = 1000)
model.fit(riskFactors_train,labels_train)
predicted_labels = model.predict(riskFactors_test)
acc = accuracy_score(labels_test,predicted_labels)
print(acc)
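A likely culprit is the manual shuffle above. Python's random.shuffle swaps elements with x[i], x[j] = x[j], x[i]. On a 2D NumPy array, x[j] is a view rather than a copy, so a swap can overwrite one row with a duplicate of another. The 1D labels array shuffles correctly, which leaves features and labels misaligned. The sketch below checks this on a small synthetic array (hypothetical data standing in for riskFactors, not the Cleveland file). It assumes NumPy is installed and also shows one safer way to shuffle both arrays together:

```python
import random

import numpy as np

# Hypothetical stand-in for riskFactors: 100 rows, every row distinct.
features = np.arange(300).reshape(100, 3)
n_before = len(np.unique(features, axis=0))

# The same calls the question uses on its 2D feature array.
random.seed(123)
random.shuffle(features)

# Swaps with j != i copied one row over another through views,
# so some rows are now duplicates and others are lost.
n_after = len(np.unique(features, axis=0))
print(f"distinct rows before: {n_before}, after: {n_after}")

# Safer: draw one permutation and index both arrays with it.
# Fancy indexing returns copies, so no row is overwritten.
clean_features = np.arange(300).reshape(100, 3)
labels = np.arange(100) % 2  # hypothetical labels
rng = np.random.default_rng(123)
perm = rng.permutation(len(labels))
features_shuffled = clean_features[perm]
labels_shuffled = labels[perm]
print(len(np.unique(features_shuffled, axis=0)))  # 100: every row survives
```

Because the same perm indexes both arrays, row k of features_shuffled still belongs with label k of labels_shuffled.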

Answer 1

I solved this by removing the manual shuffling with the random module, which must have been introducing an error. As suggested by Yulin Zhang, I used scikit-learn's train_test_split instead.
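A minimal sketch of the question's pipeline rewritten around train_test_split. Since the spreadsheet is not available here, it uses synthetic data from make_classification as a hypothetical stand-in for the Cleveland features and labels; the random_state and stratify settings are illustrative choices, not part of the original answer:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Cleveland features/labels (hypothetical data).
X, y = make_classification(n_samples=300, n_features=13, random_state=123)

# train_test_split shuffles the rows of X and y together, so every
# feature row stays paired with its label. 80/20 split as in the question.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123, stratify=y
)

model = RandomForestClassifier(n_estimators=1000, random_state=123)
model.fit(X_train, y_train)
predicted_labels = model.predict(X_test)
acc = accuracy_score(y_test, predicted_labels)
print(acc)
```

With the real file, X and y would instead come from the DataFrame as in the question, for example df.drop(columns=df.columns[-1]).values for the features and df.iloc[:, -1].values for the labels.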

solution1
0 ACCPTED 2018-12-05 03:06:50