
ValueError: Found array with 0 feature(s) (shape=(124, 0)) while a minimum of 1 is required

I am trying to apply PCA (principal component analysis) to a dataset whose training split has 124 rows and 13 features. I want to see how many components to use (with logistic regression) to get the most accurate prediction. I have this code:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df_wine = pd.read_csv('https://archive.ics.uci.edu/ml/'
    'machine-learning-databases/wine/wine.data', header=None)

from sklearn.model_selection import train_test_split
X, y = df_wine.iloc[:, 1:].values, df_wine.iloc[:, 0].values
X_train, X_test, y_train, y_test = \
    train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
# standardize the features
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform(X_test)

from sklearn.linear_model import LogisticRegression
from sklearn.decomposition import PCA
# initializing the PCA transformer and
# logistic regression estimator:
pca = PCA()  # prof recommends getting rid of n_components=3
lr = LogisticRegression()
# dimensionality reduction:
X_train_pca = pca.fit_transform(X_train_std)
X_test_pca = pca.transform(X_test_std)

"""
rows = len(X_train_pca)
columns = len(X_train_pca[0])
print(rows)
print(columns)
"""

# fitting the logistic regression model on the reduced dataset:
for i in range(12):
    lr.fit(X_train_pca[:, :i], y_train)
    y_train_pca = lr.predict(X_train_pca[:, :i])
    print('Training accuracy:', lr.score(X_train_pca[:, :i], y_train))

I get the error message: ValueError: Found array with 0 feature(s) (shape=(124, 0)) while a minimum of 1 is required.

As I understand it, range(12) is the correct loop range because it goes through all 13 features (0 through 12). I want the loop to run logistic regression with one feature, then two, then three, and so on up to all 13, and then compare the accuracies to see how many features work best.

About your error:

When i=0, X_train_pca[:, :i] is an empty array with shape (124, 0), which is not valid input for .fit().
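
The empty slice can be reproduced with NumPy alone. This is a minimal sketch, assuming a placeholder array of zeros with the same shape as the training data in the question (124 rows, 13 columns); X_demo and the shapes_* lists are hypothetical names, not part of the question's code:

```python
import numpy as np

# Placeholder with the same shape as X_train_pca in the question (assumed: 124 x 13).
X_demo = np.zeros((124, 13))

# range(12) yields 0..11, so the first slice keeps no columns at all.
shapes_buggy = [X_demo[:, :i].shape for i in range(12)]
print(shapes_buggy[0])    # (124, 0)
print(shapes_buggy[-1])   # (124, 11)

# Starting at 1 and stopping after the column count covers 1 through 13 columns.
n_features = X_demo.shape[1]
shapes_fixed = [X_demo[:, :k].shape for k in range(1, n_features + 1)]
print(shapes_fixed[0])    # (124, 1)
print(shapes_fixed[-1])   # (124, 13)
```

Applied to the loop in the question, this would mean iterating over range(1, X_train_pca.shape[1] + 1) and keeping the slice X_train_pca[:, :i] as it is.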

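How to solve:

To try one through 13 components, start the loop at 1, for example for i in range(1, 14). Note that range(12) yields 0 through 11, so even without the error it would stop at 11 components.

If you also want to fit the model with only an intercept, you can explicitly set fit_intercept=False in LogisticRegression() and prepend one extra column filled with 1s to X (as the leftmost column) to act as the intercept.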
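
The intercept-only idea can be sketched as follows. This is an illustration under stated assumptions, not the asker's pipeline: y_demo is a hypothetical random label vector standing in for y_train (three classes, 124 samples), and X_ones and lr0 are made-up names. Because LogisticRegression applies L2 regularization by default, the coefficient on the column of ones is penalized, so it is only an approximation of a true unpenalized intercept.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical labels standing in for y_train: three classes, 124 samples.
rng = np.random.default_rng(0)
y_demo = rng.integers(0, 3, size=124)

# A single constant column plays the role of the intercept.
X_ones = np.ones((len(y_demo), 1))

# fit_intercept=False so the model does not add a second intercept term.
lr0 = LogisticRegression(fit_intercept=False)
lr0.fit(X_ones, y_demo)

# Every row is identical, so every sample receives the same predicted class.
pred = lr0.predict(X_ones)
print(np.unique(pred).size)          # 1
print('Baseline accuracy:', lr0.score(X_ones, y_demo))
```

The resulting score can serve as a baseline against which the accuracies for one or more principal components are compared.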
