
How to load a dataset's examples into different arrays for decision tree classification?

I have a dataset containing 15 examples. It has 3 features and a target label. How do I load the values corresponding to the 3 features into an array in Python (Pandas)?

I want to train a decision tree classifier on the dataset. For this, I have to load the examples into arrays such that all the data points are in an array X and the corresponding labels are in another array Y. How should I proceed?

The dataset looks like the following:

     x1   x2   x3   z
0   5.5  0.5  4.5   2
1   7.4  1.1  3.6   0
2   5.9  0.2  3.4   2
3   9.9  0.1  0.8   0
4   6.9 -0.1  0.6   2
5   6.8 -0.3  5.1   2
6   4.1  0.3  5.1   1
7   1.3 -0.2  1.8   1
8   4.5  0.4  2.0   0
9   0.5  0.0  2.3   1
10  5.9 -0.1  4.4   0
11  9.3 -0.2  3.2   0
12  1.0  0.1  2.8   1
13  0.4  0.1  4.3   1
14  2.7 -0.5  4.2   1

I already have the dataset loaded into a dataframe:

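import pandas as pd

# r'...' is a raw string, so the backslashes in the Windows path are not treated as escape sequences
df = pd.read_csv(r'C:\Users\Dell\Downloads\dataset.csv')

print(df.to_string())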

I need to know how to load the values corresponding to the features x1, x2 and x3 into X (as training examples) and the values corresponding to the label z into Y (as the labels for the training examples).

Thanks.
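The question is about pandas, so here is a minimal sketch of the split itself. It assumes the file's header names the columns x1, x2, x3 and z, as in the printout. The 15 rows are inlined as a stand-in for the read_csv call so that the sketch runs without the CSV file:

```python
import pandas as pd

# Hypothetical stand-in for pd.read_csv(...): the 15 examples from the question
df = pd.DataFrame({
    "x1": [5.5, 7.4, 5.9, 9.9, 6.9, 6.8, 4.1, 1.3, 4.5, 0.5, 5.9, 9.3, 1.0, 0.4, 2.7],
    "x2": [0.5, 1.1, 0.2, 0.1, -0.1, -0.3, 0.3, -0.2, 0.4, 0.0, -0.1, -0.2, 0.1, 0.1, -0.5],
    "x3": [4.5, 3.6, 3.4, 0.8, 0.6, 5.1, 5.1, 1.8, 2.0, 2.3, 4.4, 3.2, 2.8, 4.3, 4.2],
    "z": [2, 0, 2, 0, 2, 2, 1, 1, 0, 1, 0, 0, 1, 1, 1],
})

# X: 2-D NumPy array, one row per example and one column per feature
X = df[["x1", "x2", "x3"]].to_numpy()
# Y: 1-D NumPy array of labels, aligned row by row with X
Y = df["z"].to_numpy()

print(X.shape, Y.shape)  # (15, 3) (15,)
```

With the real file, keep your read_csv line and reuse only the X and Y lines. As long as z is the only non-feature column, df.drop(columns="z").to_numpy() gives the same X.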

First, load the data into a data.frame.

Your data was in an unusual format, so I converted it to a plain .csv file to make this example easier to follow.

x1,x2,x3,z
5.5,0.5,4.5,2
7.4,1.1,3.6,0
5.9,0.2,3.4,2
9.9,0.1,0.8,0
6.9,-0.1,0.6,2
6.8,-0.3,5.1,2
4.1,0.3,5.1,1
1.3,-0.2,1.8,1
4.5,0.4,2.0,0
0.5,0.0,2.3,1
5.9,-0.1,4.4,0
9.3,-0.2,3.2,0
1.0,0.1,2.8,1
0.4,0.1,4.3,1
2.7,-0.5,4.2,1
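You can also do that conversion in pandas instead of by hand. The sketch below assumes the table is pasted exactly as df.to_string() printed it in the question. The variable names are only illustrative:

```python
import io
import pandas as pd

# The table as df.to_string() printed it in the question, including the 0..14 index column
printed = """
     x1   x2   x3   z
0   5.5  0.5  4.5   2
1   7.4  1.1  3.6   0
2   5.9  0.2  3.4   2
3   9.9  0.1  0.8   0
4   6.9 -0.1  0.6   2
5   6.8 -0.3  5.1   2
6   4.1  0.3  5.1   1
7   1.3 -0.2  1.8   1
8   4.5  0.4  2.0   0
9   0.5  0.0  2.3   1
10  5.9 -0.1  4.4   0
11  9.3 -0.2  3.2   0
12  1.0  0.1  2.8   1
13  0.4  0.1  4.3   1
14  2.7 -0.5  4.2   1
"""

# sep=r"\s+" splits on runs of whitespace. The header row has one field fewer than
# the data rows, so pandas uses the first column (0..14) as the index.
df = pd.read_csv(io.StringIO(printed.strip()), sep=r"\s+")

# Plain comma-separated text without the index column
csv_text = df.to_csv(index=False)
print(csv_text.splitlines()[:2])  # ['x1,x2,x3,z', '5.5,0.5,4.5,2']
```

Passing a file name, e.g. df.to_csv("myExample.csv", index=False), writes the file that the R code below reads.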

Once the data is in a data.frame, half of the work is already done. Here is an example in R using the caret package with a linear regression model:

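library("caret")
# Read the comma-separated file; the first row holds the column names
my.dataframe <- read.csv("myExample.csv", header = TRUE, sep = ",")
# z ~ . models z as a function of all other columns (x1, x2, x3); "lm" is linear regression
fit <- train(z ~ ., data = my.dataframe, method = "lm")
fit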

To train other kinds of models, you just replace "lm" in the method argument. The list of available models is here: http://topepo.github.io/caret/available-models.html
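For comparison, scikit-learn makes the same kind of swap by changing the estimator object instead of a method string. Note that scikit-learn is an assumption here, since the question does not use it. In this rough sketch the data is inlined, LinearRegression stands in for "lm", and the other two models are just examples:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# The 15 examples from the question, inlined instead of reading a file
df = pd.DataFrame({
    "x1": [5.5, 7.4, 5.9, 9.9, 6.9, 6.8, 4.1, 1.3, 4.5, 0.5, 5.9, 9.3, 1.0, 0.4, 2.7],
    "x2": [0.5, 1.1, 0.2, 0.1, -0.1, -0.3, 0.3, -0.2, 0.4, 0.0, -0.1, -0.2, 0.1, 0.1, -0.5],
    "x3": [4.5, 3.6, 3.4, 0.8, 0.6, 5.1, 5.1, 1.8, 2.0, 2.3, 4.4, 3.2, 2.8, 4.3, 4.2],
    "z": [2, 0, 2, 0, 2, 2, 1, 1, 0, 1, 0, 0, 1, 1, 1],
})
X = df[["x1", "x2", "x3"]].to_numpy()
Y = df["z"].to_numpy()

# Rough counterpart of caret's method argument: different estimator classes that
# share the same fit(X, Y) / predict(X) interface
models = {
    "lm": LinearRegression(),                        # like method = "lm": treats z as a number
    "tree": DecisionTreeClassifier(random_state=0),  # classification tree: z as class labels
    "knn": KNeighborsClassifier(n_neighbors=3),      # k-nearest neighbours with k = 3
}

# fit() returns the fitted estimator itself
fitted = {name: model.fit(X, Y) for name, model in models.items()}
for name, model in fitted.items():
    print(name, model.predict(X[:3]))
```

The unrestricted tree reproduces the training labels exactly here because all 15 rows are distinct. That says nothing about how well it generalizes.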

To train a random forest model, you would type:

library("caret")
my.dataframe <- read.csv("myExample.csv", header = TRUE, sep = ",")
# Same call as above, but method = "rf" fits a random forest
fit <- train(z ~ ., data = my.dataframe, method = "rf")
fit
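The corresponding scikit-learn sketch uses RandomForestClassifier. It again assumes scikit-learn is installed and inlines the data. Because the classifier class is chosen explicitly, z is handled as class labels, not as a number to regress on:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# The 15 examples from the question, inlined instead of reading a file
df = pd.DataFrame({
    "x1": [5.5, 7.4, 5.9, 9.9, 6.9, 6.8, 4.1, 1.3, 4.5, 0.5, 5.9, 9.3, 1.0, 0.4, 2.7],
    "x2": [0.5, 1.1, 0.2, 0.1, -0.1, -0.3, 0.3, -0.2, 0.4, 0.0, -0.1, -0.2, 0.1, 0.1, -0.5],
    "x3": [4.5, 3.6, 3.4, 0.8, 0.6, 5.1, 5.1, 1.8, 2.0, 2.3, 4.4, 3.2, 2.8, 4.3, 4.2],
    "z": [2, 0, 2, 0, 2, 2, 1, 1, 0, 1, 0, 0, 1, 1, 1],
})
X = df[["x1", "x2", "x3"]].to_numpy()
Y = df["z"].to_numpy()

# random_state fixes the bootstrap samples and feature choices, so reruns give the same forest
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X, Y)

print(rf.classes_)        # [0 1 2]
print(rf.predict(X[:3]))  # predicted classes for the first three examples
```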

But be careful: you have very limited data, and not every model makes sense for just 15 data points.

The random forest model, for example, gives this warning:

45: In randomForest.default(x, y, mtry = param$mtry, ...) : The response has five or fewer unique values. Are you sure you want to do regression?
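The warning appears because z is read from the CSV as a number, so train() fits a regression model. Converting the outcome first should make caret fit a classification model instead, e.g. my.dataframe$z <- as.factor(my.dataframe$z). With only 15 points, leave-one-out cross-validation is also a reasonable way to estimate accuracy: each of the 15 fits trains on 14 rows and tests on the remaining one. Below is a scikit-learn sketch of both ideas. It assumes scikit-learn is installed and uses a decision tree only as an example model:

```python
import pandas as pd
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# The 15 examples from the question, inlined instead of reading a file
df = pd.DataFrame({
    "x1": [5.5, 7.4, 5.9, 9.9, 6.9, 6.8, 4.1, 1.3, 4.5, 0.5, 5.9, 9.3, 1.0, 0.4, 2.7],
    "x2": [0.5, 1.1, 0.2, 0.1, -0.1, -0.3, 0.3, -0.2, 0.4, 0.0, -0.1, -0.2, 0.1, 0.1, -0.5],
    "x3": [4.5, 3.6, 3.4, 0.8, 0.6, 5.1, 5.1, 1.8, 2.0, 2.3, 4.4, 3.2, 2.8, 4.3, 4.2],
    "z": [2, 0, 2, 0, 2, 2, 1, 1, 0, 1, 0, 0, 1, 1, 1],
})

# The condition named in the warning: five or fewer distinct response values
print(df["z"].nunique())  # 3

X = df[["x1", "x2", "x3"]].to_numpy()
Y = df["z"].to_numpy()  # integer labels; a classifier treats them as classes

# Leave-one-out: 15 fits, each scored (accuracy 0.0 or 1.0) on the single held-out row
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, Y, cv=LeaveOneOut())
print(len(scores), scores.mean())
```

In caret, the analogous setting is trainControl(method = "LOOCV"), passed to train() through its trControl argument.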
