TypeError: __init__() got multiple values for argument 'n_splits' in the cancer dataset
Dataset
Id,Cl.thickness,Cell.size,Cell.shape,Marg.adhesion,Epith.c.size,Bare.nuclei,Bl.cromatin,Normal.nucleoli,Mitoses,Class
1000025,5,1,1,1,2,1,3,1,1,benign
1002945,5,4,4,5,7,10,3,2,1,benign
Code is below
import math
import numpy as np
import pandas as pd
#from sklearn.grid_search import GridSearchCV
from sklearn.model_selection import learning_curve,GridSearchCV
from sklearn.linear_model import LogisticRegressionCV
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score, cross_val_predict, StratifiedKFold
from sklearn import preprocessing, metrics, svm, ensemble
from sklearn.metrics import accuracy_score, classification_report
import tabpy_client
# Breast Cancer dataset
# Citation: Dr. William H. Wolberg, University of Wisconsin Hospitals, Madison
# https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Original)
# Read the dataset (Note that the CSV provided for this demo has rows with the missing data removed)
df = pd.read_csv('breastcancer.csv', header=0)
# Take a look at the structure of the file
df.head(n=4)
# Drop Id column not used in analysis
df.drop(['Id'], axis=1, inplace=True)
# Use LabelEncoder to convert textual classifications to numeric.
# We will use the same encoder later to convert them back.
encoder = preprocessing.LabelEncoder()
df['Class'] = encoder.fit_transform(df['Class'])
# You could also do this manually in the following way:
# df['Class'] = df['Class'].map( {'benign': 0, 'malignant': 1} ).astype(int)
# Check the result of the transform
df.head(n=6)
# Split columns into independent/predictor variables vs dependent/response/outcome variable
X = np.array(df.drop(['Class'], axis=1))
y = np.array(df['Class'])
# Scale the data. We will use the same scaler later for scoring function
scaler = preprocessing.StandardScaler().fit(X)
X = scaler.transform(X)
# 10 fold stratified cross validation
kf = StratifiedKFold(y,n_splits=10, random_state=None, shuffle=True)
# Define the parameter grid to use for tuning the Support Vector Machine
parameters = [{'kernel': ['rbf'], 'gamma': [1e-3, 1e-4],
               'C': [1, 10, 100, 1000]},
              {'kernel': ['linear'], 'C': [1, 10, 100, 1000]}]
# Pick the goal you're optimizing for e.g. precision if you prefer fewer false-positives
# recall if you prefer fewer false-negatives. For demonstration purposes let's pick several
# Note that the final model selection will be based on the last item in the list
scoringmethods = ['f1','accuracy','precision', 'recall','roc_auc']
Why is n_splits throwing this error?
TypeError: __init__() got multiple values for argument 'n_splits'.
n_splits is the parameter in the grid search.
You don't pass data to sklearn model instances in the constructor. Here's the signature from https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html:
StratifiedKFold(n_splits=5, *, shuffle=False, random_state=None)
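How that signature reacts to the call in the question can be reproduced without scikit-learn. The sketch below uses a hypothetical stand-in class, `FakeKFold`, which copies only the constructor signature shown above. It is not the real splitter, and the labels are placeholders.

```python
class FakeKFold:
    """Hypothetical stand-in that mimics only the constructor signature."""

    def __init__(self, n_splits=5, *, shuffle=False, random_state=None):
        self.n_splits = n_splits
        self.shuffle = shuffle
        self.random_state = random_state


y = [0, 1, 0, 1]  # placeholder labels, not the real dataset

try:
    # y fills the first positional slot (n_splits), then n_splits=10 names it again.
    FakeKFold(y, n_splits=10, random_state=None, shuffle=True)
except TypeError as exc:
    error_message = str(exc)
else:
    error_message = None

# Dropping the positional argument resolves the conflict.
kf = FakeKFold(n_splits=10, random_state=None, shuffle=True)
```

After this runs, `error_message` holds the same "got multiple values for argument 'n_splits'" text as the traceback in the question, and the second call succeeds.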
You're getting that specific error because Python bound the y array to the first positional parameter, n_splits, and the call then supplied n_splits=10 again as a keyword. Remove y from the constructor call. As for the splits, the data goes to the split(X, y) method instead; check out the methods in the docs.
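A corrected version of the cross-validation step might look like the sketch below. The random `X` and alternating `y` are synthetic stand-ins for the scaled features and encoded labels from the question, the parameter grid is shortened, and `random_state=0` is chosen only to make the example repeatable; none of these values come from the original post.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 100 samples, 9 features, balanced labels.
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 9))
y = np.array([0, 1] * 50)

# The constructor takes configuration only; no data is passed here.
kf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# The data is supplied later, when the folds are generated.
folds = list(kf.split(X, y))
n_folds = len(folds)

# The splitter can also be handed to GridSearchCV through cv=,
# which then calls split() on the data given to fit().
parameters = [{'kernel': ['linear'], 'C': [1, 10]}]
search = GridSearchCV(SVC(), parameters, cv=kf, scoring='accuracy')
search.fit(X, y)
best_params = search.best_params_
```

Passing `kf` through `cv=` keeps the fold definition in one place, so the same stratified splits apply to every parameter combination.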