I have a dataset containing 15 examples. It has 3 features and a target label. How do I load the values corresponding to the 3 features into an array in Python (Pandas)?
I want to train a decision tree classifier on the dataset. For this, I have to load the examples into arrays such that all the data points are in an array X and the corresponding labels are in another array Y. How should I proceed?
The dataset looks like the following:
x1 x2 x3 z
0 5.5 0.5 4.5 2
1 7.4 1.1 3.6 0
2 5.9 0.2 3.4 2
3 9.9 0.1 0.8 0
4 6.9 -0.1 0.6 2
5 6.8 -0.3 5.1 2
6 4.1 0.3 5.1 1
7 1.3 -0.2 1.8 1
8 4.5 0.4 2.0 0
9 0.5 0.0 2.3 1
10 5.9 -0.1 4.4 0
11 9.3 -0.2 3.2 0
12 1.0 0.1 2.8 1
13 0.4 0.1 4.3 1
14 2.7 -0.5 4.2 1
I already have the dataset loaded into a dataframe:
import pandas as pd
# Raw string, so the backslashes in the Windows path are not read as escape sequences
df = pd.read_csv(r'C:\Users\Dell\Downloads\dataset.csv')
print(df.to_string())
I need to know how to load the values corresponding to the features x1, x2 and x3 into X (as training examples) and the values corresponding to the label z into Y (as the labels for the training examples).
Thanks.
First, load the data into a data.frame.
Your formatting was unusual, so I converted the data to a plain .csv file to make this example easier to follow.
x1,x2,x3,z
5.5,0.5,4.5,2
7.4,1.1,3.6,0
5.9,0.2,3.4,2
9.9,0.1,0.8,0
6.9,-0.1,0.6,2
6.8,-0.3,5.1,2
4.1,0.3,5.1,1
1.3,-0.2,1.8,1
4.5,0.4,2.0,0
0.5,0.0,2.3,1
5.9,-0.1,4.4,0
9.3,-0.2,3.2,0
1.0,0.1,2.8,1
0.4,0.1,4.3,1
2.7,-0.5,4.2,1
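Since the question asked about pandas, here is a minimal Python sketch of the split into X and Y, using the same CSV. The names CSV_TEXT and feature_columns are only for illustration, and the data is embedded as a string so the snippet runs on its own; with a real file you would pass its path to pd.read_csv instead.

```python
import io

import pandas as pd

# The CSV shown above, embedded as a string so this sketch is self-contained.
# (CSV_TEXT is a made-up name for illustration.)
CSV_TEXT = """x1,x2,x3,z
5.5,0.5,4.5,2
7.4,1.1,3.6,0
5.9,0.2,3.4,2
9.9,0.1,0.8,0
6.9,-0.1,0.6,2
6.8,-0.3,5.1,2
4.1,0.3,5.1,1
1.3,-0.2,1.8,1
4.5,0.4,2.0,0
0.5,0.0,2.3,1
5.9,-0.1,4.4,0
9.3,-0.2,3.2,0
1.0,0.1,2.8,1
0.4,0.1,4.3,1
2.7,-0.5,4.2,1
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Select the three feature columns and the label column by name.
feature_columns = ["x1", "x2", "x3"]
X = df[feature_columns].to_numpy()  # 2-D array, one row per example
Y = df["z"].to_numpy()              # 1-D array, one label per example

print(X.shape, Y.shape)
```

Selecting columns by name rather than by position keeps the split correct even if the file has an extra index column, as in the original layout.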
Once you have the data in a data.frame, half of the work is already done. Here is an example in R that uses the "caret" package to fit a linear regression model.
library("caret")
my.dataframe <- read.csv("myExample.csv", header = TRUE, sep = ",")
fit <- train(z ~ ., data = my.dataframe, method = "lm")
fit
Basically, you just have to replace "lm" in the method argument to train all kinds of other models. Here is a list you can choose from: http://topepo.github.io/caret/available-models.html
To train a random forest model, you would type:
library("caret")
my.dataframe <- read.csv("myExample.csv", header = TRUE, sep = ",")
fit <- train(z ~ ., data = my.dataframe, method = "rf")
fit
But be careful: you have very limited data, and not every model makes sense for just 15 data points.
The random forest model, for example, gives this warning, because z is read as a number and is therefore treated as a regression target:
45: In randomForest.default(x, y, mtry = param$mtry, ...) : The response has five or fewer unique values. Are you sure you want to do regression?
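For the decision tree classifier the question asked about, a rough Python equivalent could look like the sketch below. It assumes scikit-learn is the library in use (the question does not name one), and CSV_TEXT is again a made-up name holding the same data. The 3-fold cross-validation is only there to illustrate how little 15 points can tell you; it is not a recommendation for a particular setup.

```python
import io

import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Same data as the CSV above, embedded so the sketch runs on its own.
CSV_TEXT = """x1,x2,x3,z
5.5,0.5,4.5,2
7.4,1.1,3.6,0
5.9,0.2,3.4,2
9.9,0.1,0.8,0
6.9,-0.1,0.6,2
6.8,-0.3,5.1,2
4.1,0.3,5.1,1
1.3,-0.2,1.8,1
4.5,0.4,2.0,0
0.5,0.0,2.3,1
5.9,-0.1,4.4,0
9.3,-0.2,3.2,0
1.0,0.1,2.8,1
0.4,0.1,4.3,1
2.7,-0.5,4.2,1
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))
X = df[["x1", "x2", "x3"]].to_numpy()  # features
Y = df["z"].to_numpy()                 # integer class labels 0, 1, 2

# Fit a decision tree on all 15 examples. Integer labels are treated
# as classes here, not as a regression target.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, Y)
train_accuracy = clf.score(X, Y)

# Accuracy on the training data is optimistic; 3-fold cross-validation
# gives a rough idea of how the tree does on examples it has not seen.
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, Y, cv=3)

print("training accuracy:", train_accuracy)
print("cross-validation accuracies:", scores)
```

With so few examples the cross-validation scores will vary a lot from fold to fold, which is the Python counterpart of the caution above about small data.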