How can I improve the Logistic Regression score with Sklearn PCA?

Question

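I want to compare LeNet and PCA for image recognition, so I used the German Traffic Sign benchmark and the Sklearn PCA module. But when I test it with Logistic Regression, the score never gets above 6%, no matter what I try.

I tried changing the number of iterations and the preprocessing (using normalization and equalization), but it still doesn't work.

The data is loaded with Pickle from three files: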

train.p, with shape of (34799, 32, 32, 3)
test.p, with shape of (12630, 32, 32, 3)
valid.p, with shape of (4410, 32, 32, 3)

Each comes with labels, as in y_train, y_test and y_valid. This is the relevant part of the code:

import cv2
import numpy as np

def gray_scale(image):
    """
    Convert an RGB image to gray scale.
        Parameters:
            image: An np.array compatible with plt.imshow.
    """
    return cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)

def preprocess2(data):
    """Convert a batch of RGB images (N, H, W, 3) to gray scale (N, H, W, 1)."""
    n_images, height, width = data.shape[:3]
    gray_images = np.zeros((n_images, height, width))
    for i, img in enumerate(data):
        gray_images[i] = gray_scale(img)
    # Add a trailing channel axis
    return gray_images[..., None]

from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

pca = PCA(0.95)

X_train_preprocess = preprocess2(X_train)
#Removing one dimension (34799,32,32,1) to (34799,32,32)
X_train_preprocess = X_train_preprocess.reshape(34799,32,32)
nsamples, nx, ny = X_train_preprocess.shape
X_train_preprocess = X_train_preprocess.reshape((nsamples,nx*ny))

X_test_preprocess = preprocess2(X_test)
#Removing one dimension (12630,32,32,1) to (12630,32,32)
X_test_preprocess = X_test_preprocess.reshape(12630,32,32)
n2samples, n2x, n2y = X_test_preprocess.shape
X_test_preprocess = X_test_preprocess.reshape((n2samples,n2x*n2y))

print(X_train_preprocess.shape)
pca.fit(X_train_preprocess)
print(pca.n_components_)
scaler = StandardScaler()
scaler.fit(X_train_preprocess)
X_t_train = scaler.transform(X_train_preprocess)
X_t_test = scaler.transform(X_test_preprocess)

X_t_train = pca.transform(X_t_train)
X_t_test = pca.transform(X_t_test)

from sklearn.linear_model import LogisticRegression
logisticRegr = LogisticRegression(solver = 'lbfgs', max_iter = 5000)
logisticRegr.fit(X_t_train, y_train)
print('score', logisticRegr.predict(X_t_test[0:10]))
print('score', logisticRegr.score(X_t_test, y_test))

The results are:

(34799, 1024)
62
/usr/local/lib/python3.6/dist-packages/sklearn/linear_model/logistic.py:469: FutureWarning: Default multi_class will be changed to 'auto' in 0.22. Specify the multi_class option to silence this warning.
  "this warning.", FutureWarning)
score [ 1  2 10 10 13 10 25  1  1  4]
score 0.028820269200316707

So I would like to know if you can tell me what I am doing wrong and how I can get this to work.

Answer 1

Image recognition gives you 2D data, so it is better to use a CNN, which can represent the high-dimensional relationships.

Related link: Training CNN with images in sklearn neural net
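Separately from the CNN suggestion, one thing worth checking in the question's code is the order of the steps: pca.fit is called on X_train_preprocess before scaling, while pca.transform is later applied to the output of scaler.transform, so the PCA is used on data it was not fitted on. This may not be the only cause of the low score, but it is easy to rule out. Below is a minimal sketch of the same scale, PCA, Logistic Regression chain written as an sklearn pipeline, which fits each step on the output of the previous one. It uses sklearn's built-in digits set as stand-in data (an assumption, not the traffic sign files), and the variable names are only illustrative.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): 8x8 grayscale digits, not the traffic sign set.
digits = load_digits()
X = digits.images.reshape(len(digits.images), -1)  # flatten to (n_samples, 64)
y = digits.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# The pipeline fits each step on the output of the previous one, so PCA
# is fitted on standardized features, the same space it later transforms.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95),
    LogisticRegression(solver="lbfgs", max_iter=5000),
)
model.fit(X_train, y_train)

n_components = model.named_steps["pca"].n_components_
test_score = model.score(X_test, y_test)
print("components kept:", n_components)
print("test accuracy:", test_score)
```

With the question's data, the flattened X_train_preprocess and X_test_preprocess arrays would take the place of X_train and X_test here. It is also worth confirming that y_train and y_test come from the same pickle files, in the same order, as the images.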

0 2019-06-03 00:27:26
