TypeError: __init__() got multiple values for argument 'n_splits' in cancer dataset

Question

Dataset

Id,Cl.thickness,Cell.size,Cell.shape,Marg.adhesion,Epith.c.size,Bare.nuclei,Bl.cromatin,Normal.nucleoli,Mitoses,Class
1000025,5,1,1,1,2,1,3,1,1,benign
1002945,5,4,4,5,7,10,3,2,1,benign

The code is below:

import math
import numpy as np
import pandas as pd
#from sklearn.grid_search import GridSearchCV
from sklearn.model_selection import learning_curve,GridSearchCV
from sklearn.linear_model import LogisticRegressionCV
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score, cross_val_predict, StratifiedKFold 
from sklearn import preprocessing, metrics, svm, ensemble
from sklearn.metrics import accuracy_score, classification_report
import tabpy_client 
# Breast Cancer dataset
# Citation: Dr. William H. Wolberg, University of Wisconsin Hospitals, Madison 
# https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Original)

# Read the dataset (Note that the CSV provided for this demo has rows with the missing data removed)
df =  pd.read_csv('breastcancer.csv', header=0)

# Take a look at the structure of the file
df.head(n=4)
# Drop Id column not used in analysis
df.drop(['Id'], axis=1, inplace=True)

# Use LabelEncoder to convert textual classifications to numeric. 
# We will use the same encoder later to convert them back.
encoder = preprocessing.LabelEncoder()
df['Class'] = encoder.fit_transform(df['Class'])

# You could also do this manually in the following way:
# df['Class'] = df['Class'].map( {'benign': 0, 'malignant': 1} ).astype(int)

# Check the result of the transform
df.head(n=6)
# Split columns into independent/predictor variables vs dependent/response/outcome variable
X = np.array(df.drop(['Class'], axis=1))
y = np.array(df['Class'])

# Scale the data. We will use the same scaler later for scoring function
scaler = preprocessing.StandardScaler().fit(X)
X = scaler.transform(X)

# 10 fold stratified cross validation
kf = StratifiedKFold(y,n_splits=10, random_state=None, shuffle=True)

# Define the parameter grid to use for tuning the Support Vector Machine
parameters = [{'kernel': ['rbf'], 'gamma': [1e-3, 1e-4],
                     'C': [1, 10, 100, 1000]},
                    {'kernel': ['linear'], 'C': [1, 10, 100, 1000]}]

# Pick the goal you're optimizing for e.g. precision if you prefer fewer false-positives
# recall if you prefer fewer false-negatives. For demonstration purposes let's pick several
# Note that the final model selection will be based on the last item in the list
scoringmethods = ['f1','accuracy','precision', 'recall','roc_auc']

Why is n_splits throwing an error?

TypeError: __init__() got multiple values for argument 'n_splits'.

n_splits is the parameter in the grid search.

Answer 1

You don't pass data to sklearn objects in the constructor; the constructor only takes configuration. Here's the signature from https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html:

StratifiedKFold(n_splits=5, *, shuffle=False, random_state=None)
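To see how a call binds against that signature without needing sklearn at all, here is a minimal sketch. FakeStratifiedKFold is a hypothetical stand-in that only copies the parameter list shown above; it is not the real class and does no splitting.

```python
class FakeStratifiedKFold:
    """Hypothetical stand-in: mirrors only the constructor signature."""

    def __init__(self, n_splits=5, *, shuffle=False, random_state=None):
        self.n_splits = n_splits
        self.shuffle = shuffle
        self.random_state = random_state


y = [0, 1, 0, 1]  # placeholder labels, not the real dataset

message = ""
try:
    # Same call shape as in the question: y lands in the n_splits slot,
    # then n_splits=10 tries to fill that slot a second time.
    FakeStratifiedKFold(y, n_splits=10, random_state=None, shuffle=True)
except TypeError as exc:
    message = str(exc)
    print(message)

# Keyword-only configuration, no data: this call binds cleanly.
kf = FakeStratifiedKFold(n_splits=10, shuffle=True, random_state=None)
```

The exact wording of the message varies a little between Python versions (newer ones prefix the class name), but the "got multiple values for argument 'n_splits'" part is the same.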

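You're getting that specific error because Python interpreted the y array as the n_splits argument: y is the first positional argument, so it binds to n_splits, and the call then supplies n_splits=10 again by keyword. As for the splits, the data goes to the split(X, y) method rather than the constructor; check out the methods in the docs.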
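A minimal sketch of the corrected usage, under some assumptions: the data here is synthetic (random features, alternating labels) instead of breastcancer.csv, the parameter grid is cut down to keep it fast, and random_state=0 is only there to make the run repeatable. The splitter is built without data, and the data is supplied either to split() or through GridSearchCV's cv argument.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 100 samples, 9 features, binary labels
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 9))
y = np.array([0, 1] * 50)

# Configure the splitter only; no data goes into the constructor
kf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# The data is supplied when the folds are generated
fold_sizes = [len(test_idx) for _, test_idx in kf.split(X, y)]
print(fold_sizes)

# Or hand the splitter to GridSearchCV, which calls split() internally
parameters = [{'kernel': ['linear'], 'C': [1, 10]}]
search = GridSearchCV(SVC(), parameters, cv=kf, scoring='accuracy')
search.fit(X, y)
print(search.best_params_)
```

With the original script, the same change would be dropping y from the StratifiedKFold(...) call and passing kf as cv when the grid search is created.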
