Scikit-learn Decision Tree Classifier

I'm trying to build a tree classifier with the scikit-learn package, but I'm having trouble getting the classifier's input into the right format.

import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

#import dataset
data = pd.read_table('Data/Breast.csv')
data.head(10)

X = data[['clump_thickness','shape_uniformity','marginal_adhesion','epithelial_size','bare_nucleoli','bland_chromatin','normal_nucleoli','mitoses']]

X_train = X.values

Y = data[['class']]
Y_train = Y.values

model = DecisionTreeClassifier()
model 

model.fit(X_train,Y_train)

But I get the following error message:

ValueError                                Traceback (most recent call last)
<ipython-input-215-ffa49499a3bf> in <module>()
----> 1 model.fit(X_train,Y_train)

c:\users\tobias\appdata\local\programs\python\python36\lib\site-packages\sklearn\tree\tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
    788             sample_weight=sample_weight,
    789             check_input=check_input,
--> 790             X_idx_sorted=X_idx_sorted)
    791         return self
    792 

c:\users\tobias\appdata\local\programs\python\python36\lib\site-packages\sklearn\tree\tree.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
    114         random_state = check_random_state(self.random_state)
    115         if check_input:
--> 116             X = check_array(X, dtype=DTYPE, accept_sparse="csc")
    117             y = check_array(y, ensure_2d=False, dtype=None)
    118             if issparse(X):

c:\users\tobias\appdata\local\programs\python\python36\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    431                                       force_all_finite)
    432     else:
--> 433         array = np.array(array, dtype=dtype, order=order, copy=copy)
    434 
    435         if ensure_2d:

ValueError: could not convert string to float: '?'
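The last frame shows where it breaks: check_array converts X to a float dtype (float32 in scikit-learn's tree module) with np.array(..., dtype=...). A minimal sketch of the same failure, assuming X.values is an object array holding strings with a literal '?' in one cell (the cell values below are made up):

```python
import numpy as np

# Hypothetical object array shaped like X.values: numeric strings plus a '?' placeholder.
X_obj = np.array([["5", "1"], ["3", "?"]], dtype=object)

# Same conversion check_array performs in the traceback's bottom frame.
try:
    np.array(X_obj, dtype=np.float32)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The numeric strings convert fine; only the '?' cell fails, so a single placeholder anywhere in the feature columns is enough to stop fit().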

What am I doing wrong? I can see that X.values has dtype = object ...
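An object dtype usually means pandas found non-numeric text in at least one column. One way to locate it is a sketch like the following, which uses a small hypothetical DataFrame in place of Breast.csv (the column values are invented) and pd.to_numeric with errors="coerce" to flag cells that do not parse as numbers:

```python
import pandas as pd

# Hypothetical stand-in for a few rows of Breast.csv: one column holds a '?' placeholder,
# so pandas stores that whole column as dtype object (strings) instead of int.
data = pd.DataFrame({
    "clump_thickness": [5, 3, 6],
    "bare_nucleoli": ["1", "?", "10"],
})

# Try to parse every column as numbers; cells that cannot be parsed become NaN.
numeric = data.apply(pd.to_numeric, errors="coerce")

# A cell is a culprit if it had a value in the raw data but failed to parse.
bad = numeric.isna() & data.notna()

# Collect the distinct offending values per column.
offenders = {
    col: sorted(data.loc[bad[col], col].unique())
    for col in data.columns
    if bad[col].any()
}

print(data.dtypes)
print(offenders)
```

Running the same check on the real data would show which of the feature columns carry the '?' values.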

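Try this to make sure you are passing integers. If your dataset contains strings or categorical values, or if this reveals another problem, I'll edit this answer with a solution: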

cols = ['clump_thickness','shape_uniformity','marginal_adhesion','epithelial_size','bare_nucleoli','bland_chromatin','normal_nucleoli','mitoses']

# Cast every feature column to int; this raises ValueError if a column
# still contains a non-numeric string such as '?'.
for col in cols:
    data[col] = data[col].astype('int')

X_train = data[cols]
Y_train = data['class']  # target as a 1-D Series

model = DecisionTreeClassifier()
model.fit(X_train, Y_train)
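With the '?' placeholders still in the file, the astype('int') cast above fails too. A minimal sketch of one way around it, assuming the file is comma-separated (the question uses pd.read_table, whose default separator is a tab; read_table accepts the same na_values argument): read '?' as missing with na_values, drop the incomplete rows, then fit. The inline CSV rows are hypothetical sample values, not the real Breast.csv contents.

```python
import io

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical sample rows using the question's column names; one row has '?' in bare_nucleoli.
csv_text = """clump_thickness,shape_uniformity,marginal_adhesion,epithelial_size,bare_nucleoli,bland_chromatin,normal_nucleoli,mitoses,class
5,1,1,2,1,3,1,1,2
5,4,5,7,10,3,2,1,2
3,1,1,2,2,3,1,1,2
8,10,8,7,?,9,7,1,4
1,1,1,2,1,3,1,1,2
10,7,7,6,4,4,1,2,4
"""

cols = ['clump_thickness', 'shape_uniformity', 'marginal_adhesion', 'epithelial_size',
        'bare_nucleoli', 'bland_chromatin', 'normal_nucleoli', 'mitoses']

# na_values='?' turns the placeholder into NaN, so bare_nucleoli parses as a float column.
data = pd.read_csv(io.StringIO(csv_text), na_values='?')

# Simplest option: drop rows with a missing feature value.
data = data.dropna(subset=cols)

X_train = data[cols].values      # numeric 2-D array, no object dtype
Y_train = data['class'].values   # 1-D target

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, Y_train)
print(model.predict(X_train[:2]))
```

Dropping rows is only one choice; imputing the missing values instead (for example with sklearn.impute.SimpleImputer) keeps every sample.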

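Statement: The technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.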

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM