How to get the selected columns from SelectKBest() if a transformer in ColumnTransformer has no get_feature_names attribute?

I would like to know which features were selected by SelectKBest(), so I first applied a ColumnTransformer():

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import MinMaxScaler  # chi2 in SelectKBest() requires non-negative values

num_features = [...]
cat_features = [...]

ct = ColumnTransformer([
    ("scaling", MinMaxScaler(), num_features),
    ("onehot", OneHotEncoder(sparse=False, handle_unknown='ignore'), cat_features)], 
    remainder='passthrough') #pass through

X_train_trans = ct.fit_transform(X_train)

And then SelectKBest():

from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2 

skb = SelectKBest(chi2, k=100)

X_train_trans_select = skb.fit_transform(X_train_trans, y_train)

I now have trouble working out which features were selected. I am aware of skb.get_support() and ct.get_feature_names(), but ct.get_feature_names() raises:

AttributeError: Transformer scaling (type MinMaxScaler) does not provide get_feature_names.

One approach that may work in your case is to build the list of column names yourself: for each fitted transformer in ct.transformers_, call get_feature_names() if the transformer has that method, and otherwise fall back to the original column names.

import itertools

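# Each entry of ct.transformers_ is a (name, fitted transformer, columns) tuple.
# Use the transformer's own output names when it has get_feature_names,
# otherwise keep the original column names.
cols = [(transformer.get_feature_names() if getattr(transformer, "get_feature_names", None) else columns)
        for _, transformer, columns in ct.transformers_]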

cols = list(itertools.chain(*cols))
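
To see the fallback logic in isolation, here is a minimal sketch that runs without scikit-learn. FakeScaler and FakeEncoder are hypothetical stand-ins (not scikit-learn classes) for a transformer without get_feature_names and one with it, and collect_feature_names is just an illustrative helper name. The column names are made up.

```python
import itertools


class FakeEncoder:
    """Hypothetical stand-in for a transformer that can name its outputs."""

    def get_feature_names(self):
        return ["x0_a", "x0_b"]


class FakeScaler:
    """Hypothetical stand-in for a transformer without get_feature_names."""


def collect_feature_names(fitted_transformers):
    """Flatten output names from (name, transformer, columns) tuples."""
    per_transformer = []
    for _name, transformer, columns in fitted_transformers:
        get_names = getattr(transformer, "get_feature_names", None)
        # Use the transformer's own names when available, else the input columns.
        per_transformer.append(get_names() if callable(get_names) else columns)
    return list(itertools.chain.from_iterable(per_transformer))


# Same (name, transformer, columns) layout as the tuples in ct.transformers_.
fitted = [
    ("scaling", FakeScaler(), ["height", "width"]),
    ("onehot", FakeEncoder(), ["colour"]),
    ("remainder", "passthrough", []),
]
names = collect_feature_names(fitted)
print(names)
```

The "passthrough" entry is a plain string, so the getattr lookup returns None and its (here empty) column list is used as-is.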

Then filter cols with the boolean mask returned by the get_support() method of SelectKBest:

from itertools import compress

list(compress(cols, skb.get_support()))
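
As a quick illustration of what compress does here, the sketch below uses a hand-written mask in place of skb.get_support(). The column names and mask values are made up for the example.

```python
from itertools import compress

# Hypothetical column names and selection mask, for illustration only.
cols = ["height", "width", "x0_a", "x0_b"]
mask = [True, False, False, True]  # stands in for skb.get_support()

# compress keeps the items whose corresponding mask entry is truthy.
selected = list(compress(cols, mask))
print(selected)  # ['height', 'x0_b']
```

If integer positions are more convenient than a boolean mask, get_support(indices=True) returns those instead.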

Full Reproducible Example

import random
import itertools
import pandas as pd
from itertools import compress
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import MinMaxScaler


# First build some data with categorical and numerical features
data = load_iris()
X, y, feature_names = data['data'], data['target'], data['feature_names']
X = pd.DataFrame(X, columns=feature_names)
X['some_location'] = [random.choice(['NY', 'Texas', 'Boston']) for _ in range(X.shape[0])]

# Apply the column transformers
num_features = ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
cat_features = ['some_location']

ct = ColumnTransformer([
    ("scaling", MinMaxScaler(), num_features),
    ("onehot", OneHotEncoder(sparse=False, handle_unknown='ignore'), cat_features)], 
    remainder='passthrough') #pass through
X_train_trans = ct.fit_transform(X)

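# Get the column names: use each transformer's get_feature_names() when it
# exists, otherwise fall back to the original column names
cols = [(transformer.get_feature_names() if getattr(transformer, "get_feature_names", None) else columns)
        for _, transformer, columns in ct.transformers_]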

cols = list(itertools.chain(*cols))
cols
>>>
['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)',
 'x0_Boston',
 'x0_NY',
 'x0_Texas']

# Apply SelectKBest
skb = SelectKBest(chi2, k=4)
X_train_trans_select = skb.fit_transform(X_train_trans, y)

# Get selected columns
list(compress(cols, skb.get_support()))
>>>
['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']
