How to solve a make_column_transformer TypeError in a Jupyter notebook (Python)

I am using Jupyter Notebook to learn ML, following an e-book named "TensorFlow in 1 Day Make your own Neural Network".

On page 219, "Step 3) Build the pipeline":

Please see the code below. How can I fix it? I have been stuck on this chapter for two nights, so any help with this case would be appreciated.

import warnings
warnings.filterwarnings('ignore')
import numpy as np
import pandas as pd
import os, ssl
if (not os.environ.get('PYTHONHTTPSVERIFY', '') and
        getattr(ssl, '_create_unverified_context', None)):
    ssl._create_default_https_context = ssl._create_unverified_context

## Define path data
COLUMNS = ['age','workclass', 'fnlwgt', 'education','education_num', 'marital','occupation', 'relationship', 'race', 'sex','capital_gain', 'capital_loss','hours_week', 'native_country', 'label']
PATH = "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data"
df_train = pd.read_csv(PATH,skipinitialspace=True,names = COLUMNS,index_col=False)

## List Categorical
CATE_FEATURES =df_train.iloc[:,:-1].select_dtypes('object').columns
print(CATE_FEATURES)
## List continuous
CONTI_FEATURES = df_train._get_numeric_data()

## Define path data
COLUMNS = ['age','workclass', 'fnlwgt', 'education','education_num', 'marital','occupation', 'relationship', 'race', 'sex','capital_gain', 'capital_loss','hours_week', 'native_country', 'label']

### Define continuous list
CONTI_FEATURES = ['age', 'fnlwgt','capital_gain', 'education_num','capital_loss', 'hours_week']

### Define categorical list
CATE_FEATURES = ['workclass', 'education', 'marital', 'occupation','relationship', 'race', 'sex', 'native_country']

## Prepare the data
features = ['age','workclass', 'fnlwgt', 'education','education_num', 'marital','occupation', 'relationship', 'race', 'sex','capital_gain', 'capital_loss','hours_week', 'native_country']
PATH = "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data"
df_train = pd.read_csv(PATH, skipinitialspace=True, names =COLUMNS, index_col=False)
df_train[CONTI_FEATURES]=df_train[CONTI_FEATURES].astype('float64')

## Drop Holand-Netherlands, because it has only one row
df_train = df_train[df_train.native_country != "Holand-Netherlands"]

## Get the column index of the continuous features
conti_features = []
for i in CONTI_FEATURES:
    position = df_train.columns.get_loc(i)
    conti_features.append(position)

## Get the column index of the categorical features
categorical_features = []
for i in CATE_FEATURES:
    position = df_train.columns.get_loc(i)
    categorical_features.append(position)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test =train_test_split(df_train[features],df_train.label,test_size = 0.2,random_state=0)

from sklearn.preprocessing import StandardScaler, OneHotEncoder,LabelEncoder
from sklearn.compose import ColumnTransformer,make_column_transformer
from sklearn.pipeline import make_pipeline

preprocess = make_column_transformer((conti_features, StandardScaler()),(categorical_features, OneHotEncoder(sparse=False)))

preprocess.fit_transform(X_train).shape

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-47-a826a21eeb4b> in <module>()
----> 1 preprocess.fit_transform(X_train).shape

~\Anaconda3\envs\hello-tf\lib\site-packages\sklearn\compose\_column_transformer.py in fit_transform(self, X, y)
    512             self._feature_names_in = None
    513         X = _check_X(X)
--> 514         self._validate_transformers()
    515         self._validate_column_callables(X)
    516         self._validate_remainder(X)

~\Anaconda3\envs\hello-tf\lib\site-packages\sklearn\compose\_column_transformer.py in _validate_transformers(self)
    285                                 "transform, or can be 'drop' or 'passthrough' "
    286                                 "specifiers. '%s' (type %s) doesn't." %
--> 287                                 (t, type(t)))
    288 
    289     def _validate_column_callables(self, X):

TypeError: All estimators should implement fit and transform, or can be 'drop' or 'passthrough' specifiers. '[0, 2, 10, 4, 11, 12]' (type <class 'list'>) doesn't.

I am still searching Google for a solution, but I have not found anyone with the same problem.

You may have mixed up the order of the elements in each tuple passed to make_column_transformer. Here's an example from the docs:

>>> from sklearn.preprocessing import StandardScaler, OneHotEncoder
>>> from sklearn.compose import make_column_transformer
>>> make_column_transformer(
...     (StandardScaler(), ['numerical_column']),
...     (OneHotEncoder(), ['categorical_column']))
ColumnTransformer(transformers=[('standardscaler', StandardScaler(...),
                                 ['numerical_column']),
                                ('onehotencoder', OneHotEncoder(...),
                                 ['categorical_column'])])

Notice that the transformer comes first in each tuple, then the column list. The order is reversed in your code, and a plain column list has no fit or transform method, which is exactly what the TypeError reports.
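Applied to the question's setup, the fix is to swap the two elements of each tuple. The following is a minimal sketch rather than the book's code: the small DataFrame `X_toy` and the lists `conti` and `cate` are hypothetical stand-ins for `X_train`, `conti_features` and `categorical_features`, so that it runs without downloading the dataset. The `sparse=False` argument is left out here because its name has changed across scikit-learn releases.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data standing in for X_train (not the real adult dataset)
X_toy = pd.DataFrame({
    "age": [25.0, 38.0, 52.0, 41.0],
    "hours_week": [40.0, 50.0, 20.0, 60.0],
    "sex": ["Male", "Female", "Female", "Male"],
})
conti = ["age", "hours_week"]  # continuous columns
cate = ["sex"]                 # categorical columns

# Transformer first, columns second in every tuple
preprocess = make_column_transformer(
    (StandardScaler(), conti),
    (OneHotEncoder(), cate),
)

X_out = preprocess.fit_transform(X_toy)
print(X_out.shape)  # 2 scaled columns + 2 one-hot columns for 4 rows
```

Lists of integer column positions, as built in the question, should be usable in place of the column names in the same slot.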

Actually, it looks like the (columns, transformer) order used to be accepted; it was deprecated and then removed in May of last year, which just demonstrates the peril of learning an API from a book.
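Since the accepted argument order depends on the installed release, it can help to check which scikit-learn version the notebook environment is actually using before following a book's code. A small sketch; the variable name `installed` is only illustrative:

```python
import sklearn

# Report the scikit-learn version available in the current environment
installed = sklearn.__version__
print("scikit-learn version:", installed)
```

Comparing this against the version the book was written for gives a quick hint about whether an API may have changed in between.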
