
Why am I getting a perfect score when training my decision tree ML algorithm?

I'm testing out a decision tree for the first time and am getting a perfect score for my algorithm's performance. This doesn't make sense, because the dataset I am using is AAPL stock price data with several variables, which the algorithm obviously can't predict perfectly.

CSV:

Date,Open,High,Low,Close,Adj Close,Volume
2010-01-04,10430.6904296875,10604.9697265625,10430.6904296875,10583.9599609375,10583.9599609375,179780000
2010-01-05,10584.5595703125,10584.5595703125,10522.51953125,10572.01953125,10572.01953125,188540000

I think the reason it might not be working is that I am essentially feeding in the answers when training the model, and it just regurgitates them when I try to score the model.

Code:

import pandas as pd
from sklearn import preprocessing, tree

# Data Sorting: drop the date column and any rows with missing values
df = pd.read_csv('AAPL_test.csv')
df = df.drop('Date', axis=1)
df = df.dropna(axis='rows')
inputs = df.drop('Close', axis='columns')
target = df['Close']

print(inputs.dtypes)
print(target.dtypes)

# Changing dtypes: every distinct Close value becomes its own class label
lab_enc = preprocessing.LabelEncoder()
target_encoded = lab_enc.fit_transform(target)

# Model
model = tree.DecisionTreeClassifier()
model.fit(inputs, target_encoded)

# Scored on the same rows the model was fitted on
print(f'SCORE = {model.score(inputs, target_encoded)}')

I've also thought about randomizing the order of the rows in the CSV file. That could help, but I'm not sure how I would do it. I could randomize the df at the top of the code, but I'm pretty sure that would shuffle both dataframes (inputs and target) in the same way, so there would be no difference from what I am doing now. Otherwise, I could randomize the two dataframes individually, but I think that would mess with the model training or scoring because the inputs would no longer be associated with the right target values. I'm not too sure.
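On the shuffling question, a minimal sketch may help. It assumes a toy DataFrame with hypothetical values (not real prices) and uses pandas' DataFrame.sample to shuffle whole rows before inputs and target are separated, so each input row stays paired with its own target value:

```python
import pandas as pd

# Toy stand-in for the CSV (hypothetical values, not real prices).
df = pd.DataFrame({
    "Open":  [1.0, 2.0, 3.0, 4.0, 5.0],
    "Close": [1.5, 2.5, 3.5, 4.5, 5.5],
})

# Shuffle whole rows once, before separating inputs from target.
shuffled = df.sample(frac=1, random_state=0)

inputs = shuffled.drop("Close", axis="columns")
target = shuffled["Close"]

# Both pieces carry the same shuffled index, so each row of inputs
# still lines up with its original Close value.
print(inputs.index.equals(target.index))
print(((target - inputs["Open"]) == 0.5).all())
```

Both checks print True here. Shuffling inputs and target separately would break that pairing, and shuffling alone does not change a score that is computed on the same rows used for fitting.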

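Most probably your model is overfitted. Your code scores the model on the same rows it was fitted on, and an unrestricted decision tree can simply memorize those rows. You did not split your dataset into two parts: one for training and the other for testing. Scoring on the test data will help you understand whether your model overfits or underfits.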
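A minimal sketch of such a split, assuming scikit-learn's train_test_split and a synthetic stand-in for the CSV (random features and noise labels, not the real AAPL data), so it runs without the file:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the CSV (an assumption, not the real AAPL data):
# 500 rows, 5 numeric feature columns, and a label that is pure noise,
# so no model can genuinely predict it.
rng = np.random.default_rng(0)
inputs = rng.normal(size=(500, 5))
target = rng.integers(0, 10, size=500)

# Hold out 25% of the rows; each row's inputs stay paired with its label.
X_train, X_test, y_train, y_test = train_test_split(
    inputs, target, test_size=0.25, random_state=0
)

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

train_score = model.score(X_train, y_train)  # rows the tree has already seen
test_score = model.score(X_test, y_test)     # rows the tree has never seen

print(f"TRAIN SCORE = {train_score}")
print(f"TEST SCORE  = {test_score}")
```

On this synthetic data the training score should come out perfect while the test score stays close to chance, which is the gap that reveals overfitting. train_test_split shuffles the rows by default; for time-ordered data such as daily prices, passing shuffle=False keeps the test rows later in time than the training rows.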

For more information:

Overfitting

How to Prevent Overfitting
