ValueError: Unknown label type: 'continuous' in DecisionTreeClassifier()

I am trying to create a model that predicts the Result column below:

Date        Open    High    Close   Result
1/22/2010   25.95   31.29   30.89   0.176104
2/19/2010   23.98   24.22   23.60   -0.343760
3/19/2010   21.46   23.16   22.50   0.124994
4/23/2010   21.32   21.77   21.06   -0.765601
5/21/2010   55.41   55.85   49.06   0.302556

The code I am using is:

import pandas
from sklearn.tree import DecisionTreeClassifier
dataset = pandas.read_csv('data.csv')
X = dataset.drop(columns=['Date','Result'])
y = dataset.drop(columns=['Date', 'Open', 'High', 'Close'])
model = DecisionTreeClassifier()
model.fit(X, y)

But I am getting an error:

ValueError: Unknown label type: 'continuous'

Suggestions for other algorithms are also welcome.

In ML, an important first step is to consider the nature of your problem. Is it a regression or a classification problem? Do you have target data (supervised learning), or do you have no target and want to learn more about your data's inherent structure (unsupervised learning)? Then consider what steps your pipeline needs in order to prepare your data (preprocessing).

In this case, you are passing floats (floating point numbers) as the target of a classifier (DecisionTreeClassifier). A classifier separates distinct classes, so it expects the target to hold strings or integers that distinguish the classes from each other. You can read more about this in an introduction to classifiers.
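To see how scikit-learn categorises a target before fitting, here is a minimal sketch using `sklearn.utils.multiclass.type_of_target`. The `result` values are copied from the question; the `labels` list is a hypothetical set of class labels added only for comparison.

```python
from sklearn.utils.multiclass import type_of_target

# Values copied from the Result column in the question.
result = [0.176104, -0.343760, 0.124994, -0.765601, 0.302556]

# Hypothetical integer class labels, for comparison only.
labels = [1, 0, 1, 0, 1]

# Non-integer floats are reported as a continuous target,
# which is the label type the classifier refuses to fit.
continuous_kind = type_of_target(result)

# Two distinct integer values are reported as a binary target.
label_kind = type_of_target(labels)

print(continuous_kind)
print(label_kind)
```

The string reported for `result` is the same one that appears in the error message.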

The problem you seek to solve is to predict a continuous numerical output, Result. This is known as a regression problem, so you need a regression algorithm (such as DecisionTreeRegressor). Once this simple one works, you can try other regression algorithms. Decision trees are a great starting point: they are fairly straightforward to understand, fairly transparent, fast, and easily implemented.
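A minimal sketch of that swap, assuming the five rows from the question are the whole dataset (they are inlined here instead of read from data.csv so the example is self-contained). `random_state=0` is an arbitrary choice to make the run repeatable.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# The five rows shown in the question, inlined for a self-contained example.
dataset = pd.DataFrame({
    "Date": ["1/22/2010", "2/19/2010", "3/19/2010", "4/23/2010", "5/21/2010"],
    "Open": [25.95, 23.98, 21.46, 21.32, 55.41],
    "High": [31.29, 24.22, 23.16, 21.77, 55.85],
    "Close": [30.89, 23.60, 22.50, 21.06, 49.06],
    "Result": [0.176104, -0.343760, 0.124994, -0.765601, 0.302556],
})

X = dataset[["Open", "High", "Close"]]  # input features
y = dataset["Result"]                   # 1-D continuous target

# A regressor accepts continuous targets, so fit() no longer raises.
model = DecisionTreeRegressor(random_state=0)
model.fit(X, y)

# Predicting on the training rows only demonstrates the API.
predictions = model.predict(X)
print(predictions)
```

Predicting on the rows used for fitting says nothing about how well the model generalises; for a real evaluation you would hold out data the model has not seen.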

As a further note, it is important to consider preprocessing your data. You have done some of this simply by separating your target from your input data:

X = dataset.drop(columns=['Date','Result'])
y = dataset.drop(columns=['Date', 'Open', 'High', 'Close'])

However, you may wish to look into preprocessing further, particularly standardisation of your data. Many ML algorithms need this step to work well with your data, although tree-based models such as decision trees are largely insensitive to feature scaling. There's a saying that goes: "Garbage in, garbage out".
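As an illustration of standardisation, here is a short sketch using scikit-learn's `StandardScaler` on the Open, High and Close values from the question. It only shows what the transformation does; whether it helps depends on the algorithm you end up using.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Open, High, Close values from the five rows in the question.
X = np.array([
    [25.95, 31.29, 30.89],
    [23.98, 24.22, 23.60],
    [21.46, 23.16, 22.50],
    [21.32, 21.77, 21.06],
    [55.41, 55.85, 49.06],
])

# Subtract each column's mean and divide by its standard deviation.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print(X_scaled.mean(axis=0))  # approximately 0 for every column
print(X_scaled.std(axis=0))   # approximately 1 for every column
```

In practice the scaler should be fitted on the training data only and then applied to any test data with `scaler.transform`.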

Part of preprocessing sometimes requires you to change the data type of a given column. Taken at face value, the error in your question suggests that the issue at hand is a data type that needs changing. But, as explained, that would not help in your case, given that you seek to use regression to determine a continuous output.

You are using DecisionTreeClassifier, which is a classifier and will only predict categorical values such as 0 or 1. Your Result column is continuous, so you should use DecisionTreeRegressor.

A few suggestions

  1. Your approach is a good try, but I don't think it is the right approach.
  2. In ML modelling, there are 3 main categories of models:
    1. Regression: Have you heard of Newton's laws? These are the kind of ML models that help identify the hidden rules and logic in data.
    2. Classification: These are the type of ML models that are used to separate data into different categories.
    3. Time series ML models: This is like stock market data analytics. Unlike the above, here the value on date X depends on the values at X-1, X-2, X-3 and so on. This is somewhat closer to regression, but it requires models like ARIMA.

As for the error: DecisionTreeClassifier is meant for identifying categories such as 1, 2, 3, 4 and so on, but only for a limited set of classes.

For a series like your Result, which is continuous and fractional, you should use regression models or ARIMA-like time series models.
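The idea that a value depends on the ones before it can be sketched with lagged features. This is not ARIMA; it is a simpler stand-in that feeds the previous two Result values into a regressor. The column names `lag_1` and `lag_2` and the choice of two lags are assumptions made for illustration.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Result values from the question, in date order.
result = pd.Series(
    [0.176104, -0.343760, 0.124994, -0.765601, 0.302556], name="Result"
)

# Build features from earlier values: lag_1 is the previous row,
# lag_2 the row before that. Rows without a full history are dropped.
frame = pd.DataFrame({
    "Result": result,
    "lag_1": result.shift(1),
    "lag_2": result.shift(2),
}).dropna()

lagged_X = frame[["lag_1", "lag_2"]]
lagged_y = frame["Result"]

# Any regressor can be used here; the decision tree keeps the example small.
lag_model = DecisionTreeRegressor(random_state=0).fit(lagged_X, lagged_y)
print(frame)
```

With only five observations this is far too little data to learn anything; the sketch only shows how the shifted columns line up.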
