Python scikit-learn decision tree
```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import LabelEncoder

d = {"Outlook" : ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Rain", "Overcast", "Sunny", "Sunny", "Rain", "Sunny", "Overcast", "Overcast", "Rain"],
"Temperature" : ["Hot", "Hot", "Hot", "Mild", "Cool", "Cool", "Cool", "Mild", "Cool", "Mild", "Mild", "Mild", "Hot", "Mild"],
"Humidity" : ["High", "High", "High", "High", "Normal", "Normal", "Normal", "High", "Normal", "Normal", "Normal", "High", "Normal", "High"],
"Wind" : ["Weak", "Strong", "Weak", "Weak", "Weak", "Strong", "Strong", "Weak", "Weak", "Weak", "Strong", "Strong", "Weak", "Strong"],
"Played football(yes/no)" : ["No", "No", "Yes", "Yes", "Yes", "No", "Yes", "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]}
dataframe = pd.DataFrame(d)

# Encode every categorical column as integers, one LabelEncoder per column
encoders = {}
for col in dataframe.columns:
    encoders[col] = LabelEncoder()
    dataframe[col] = encoders[col].fit_transform(dataframe[col])

x = dataframe[["Outlook", "Temperature", "Humidity", "Wind"]]
y = dataframe["Played football(yes/no)"]
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

# Give the model its own name instead of reusing `d` (the data dict)
clf = DecisionTreeClassifier()
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)
cf = confusion_matrix(y_test, y_pred)
print(cf)

plot_tree(clf, filled=True, feature_names=list(x.columns))
plt.show()
```
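From the plot, Outlook values that are not less than 0.5 go to the right, and the other nodes work the same way. But since Outlook is not numeric data, how am I supposed to tell whether an Outlook is less than or greater than 0.5?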
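After the transformation (`LabelEncoder`), all features are numeric, so the tree's decisions use each feature's encoded numeric value, including Outlook's. `LabelEncoder` assigns the integers 0, 1, 2, … to the category names in sorted (alphabetical) order, so Outlook becomes Overcast = 0, Rain = 1, Sunny = 2. A node that reads `Outlook <= 0.5` therefore sends Overcast to the left (True) branch and Rain and Sunny to the right (False) branch. You can check the mapping through the fitted encoder's `classes_` attribute.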
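A minimal sketch of that mapping, using only the Outlook column from the question's data: it fits a `LabelEncoder`, prints the name-to-code mapping, and lists which categories fall on each side of the 0.5 threshold seen in the plot. The variable names are illustrative.

```python
from sklearn.preprocessing import LabelEncoder

# The Outlook column from the question's dataset
outlook = ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Rain", "Overcast",
           "Sunny", "Sunny", "Rain", "Sunny", "Overcast", "Overcast", "Rain"]

le = LabelEncoder()
codes = le.fit_transform(outlook)

# classes_ holds the sorted category names; position i is the code i
mapping = {str(name): code for code, name in enumerate(le.classes_)}
print(mapping)  # {'Overcast': 0, 'Rain': 1, 'Sunny': 2}

# A split "Outlook <= 0.5" sends codes <= 0.5 left (True), the rest right (False)
threshold = 0.5
left = [name for name, code in mapping.items() if code <= threshold]
right = [name for name, code in mapping.items() if code > threshold]
print("left (True):", left)     # left (True): ['Overcast']
print("right (False):", right)  # right (False): ['Rain', 'Sunny']
```

Because the codes are consecutive integers, any threshold the tree picks (0.5, 1.5, …) lies between two codes, so each split simply partitions the categories into "codes below" and "codes above".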
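In short, when you use a decision tree from sklearn, every feature is treated as numeric and every split is a threshold test of the form `feature <= threshold` on those numbers.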