
How to increase Logistic Regression score on Sklearn PCA?

I want to compare LeNet and PCA for image recognition, so I used the German Traffic Signals Benchmark and the Sklearn PCA module. But when I tested it with Logistic Regression, the score never got above 6%, no matter what I tried.

I tried changing the number of iterations and the preprocessing steps (normalization and equalization), but it still didn't work.

The data is loaded with Pickle from three files:

train.p, with shape of (34799, 32, 32, 3)
test.p, with shape of (12630, 32, 32, 3)
valid.p, with shape of (4410, 32, 32, 3)

Each of them comes with its labels, stored in y_train, y_test and y_valid. This is the relevant part of the code:

import cv2
import numpy as np

def gray_scale(image):
    """
    Convert images to gray scale.
        Parameters:
            image: An np.array compatible with plt.imshow.
    """
    return cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)

def preprocess2(data):

    n_training = data.shape
    gray_images = np.zeros((n_training[0], n_training[1], n_training[2]))
    for i, img in enumerate(data):
        gray_images[i] = gray_scale(img)
    gray_images = gray_images[..., None]
    return gray_images

from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

pca = PCA(0.95)

X_train_preprocess = preprocess2(X_train)
#Removing one dimension (34799,32,32,1) to (34799,32,32)
X_train_preprocess = X_train_preprocess.reshape(34799,32,32)
nsamples, nx, ny = X_train_preprocess.shape
X_train_preprocess = X_train_preprocess.reshape((nsamples,nx*ny))

X_test_preprocess = preprocess2(X_test)
#Removing one dimension (12630,32,32,1) to (12630,32,32)
X_test_preprocess = X_test_preprocess.reshape(12630,32,32) 
n2samples, n2x, n2y = X_test_preprocess.shape
X_test_preprocess = X_test_preprocess.reshape((n2samples,n2x*n2y))

print(X_train_preprocess.shape)
pca.fit(X_train_preprocess)
print(pca.n_components_)
scaler = StandardScaler()
scaler.fit(X_train_preprocess)
X_t_train = scaler.transform(X_train_preprocess)
X_t_test = scaler.transform(X_test_preprocess)

X_t_train = pca.transform(X_t_train)
X_t_test = pca.transform(X_t_test)

from sklearn.linear_model import LogisticRegression
logisticRegr = LogisticRegression(solver = 'lbfgs', max_iter = 5000)
logisticRegr.fit(X_t_train, y_train)
print('score', logisticRegr.predict(X_t_test[0:10]))
print('score', logisticRegr.score(X_t_test, y_test))

The results were these:

(34799, 1024)
62
/usr/local/lib/python3.6/dist-packages/sklearn/linear_model/logistic.py:469: FutureWarning: Default multi_class will be changed to 'auto' in 0.22. Specify the multi_class option to silence this warning.
  "this warning.", FutureWarning)
score [ 1  2 10 10 13 10 25  1  1  4]
score 0.028820269200316707

So I'd like to know what I am doing wrong and what I can do to make this work properly.

Your images are 2D data. For image recognition, it's best to use a CNN to represent the relationships in high dimensions.

Related link: Training CNN with images in sklearn neural net
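
Separately from the CNN suggestion, one thing in the question's code is worth checking: `pca.fit` is called on the raw pixel values, but `pca.transform` is then applied to the output of `StandardScaler`, so PCA is fitted and applied on differently scaled inputs. Below is a minimal sketch that keeps the three steps consistent by chaining them with `make_pipeline`. It uses sklearn's bundled digits dataset as a stand-in (an assumption, not the traffic sign data), and whether this change alone explains the low score on the original data has not been verified here.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption): 8x8 grayscale digits, already flattened
# to shape (n_samples, 64). Replace with the flattened traffic sign arrays.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Each step is fitted on the output of the previous step, using training
# data only, so PCA sees standardized features both when it is fitted
# and when it transforms the test set.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95),  # keep enough components for 95% of the variance
    LogisticRegression(solver="lbfgs", max_iter=5000),
)
model.fit(X_train, y_train)

n_components = model.named_steps["pca"].n_components_
test_accuracy = model.score(X_test, y_test)
print("components kept:", n_components)
print("test accuracy:", test_accuracy)
```

With the question's arrays, the same pipeline could be fitted on `X_train_preprocess` and `y_train` and scored on `X_test_preprocess` and `y_test`, which removes the separate `scaler.fit` / `pca.fit` calls and the chance of applying them in a different order.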
