
Using scikit-learn Logistic Regression

I have the following data which I'm trying to do logistic regression on using scikit-learn:

import numpy as np
from sklearn.linear_model import SGDClassifier
import matplotlib.pyplot as plt
X = np.array([(456.09677777777779, 477.87349999999998), (370.16702631578943, 471.41847368421048), (208.0453181818182, 96.825818181818164), (213.35274999999996, 509.25293750000003), (279.30812500000002, 155.14600000000002), (231.55695, 146.21420000000001), (285.93539285714286, 140.41428571428571), (297.28620000000001, 150.98409999999998), (267.3011923076923, 136.76630769230769), (226.57899999999998, 138.03450000000001), (312.01369230769228, 158.06576923076923), (305.04823076923083, 152.89192307692309), (225.434, 138.76300000000001), (396.39516666666663, 196.10216666666668), (239.16028571428572, 125.58142857142856), (235.898, 116.98099999999999), (132.98799999999997, 361.85599999999999), (120.1848, 391.27560000000005), (495.972375, 223.47975000000002), (485.80450000000002, 222.89939999999996), (257.07245454545449, 136.36881818181817), (441.60896153846159, 209.63723076923083), (451.61168749999996, 212.58543750000001), (458.90889285714286, 215.38342857142857), (474.8958235294117, 218.99223529411765), (467.85923529411775, 218.55094117647059), (251.96968421052637, 407.74273684210527), (181.53659999999999, 367.47239999999999), (356.85722222222222, 342.36394444444443), (234.99250000000001, 340.74079999999998), (211.58613157894737, 360.8791052631579), (207.18066666666667, 323.31349999999998), (320.41081249999996, 341.58249999999998), (316.88186842105262, 308.40215789473683), (285.2390666666667, 322.81979999999999), (300.14074999999997, 362.1682222222222), (279.99599999999998, 359.09577777777781)])
Y = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0])

I use the following to try and classify the data:

clf = SGDClassifier(loss="hinge", alpha=0.01, n_iter=200, fit_intercept=True)  # note: n_iter was renamed max_iter in newer scikit-learn
clf.fit(X,Y)

The resulting hyperplane doesn't come anywhere close to separating this data. Any ideas on what is going on?

Cheers,

Greg

PS: the code I used to create the image is below (just in case something is wrong there):

xx = np.linspace(0, 1000, 10)
yy = np.linspace(0, 600, 10)

X1, X2 = np.meshgrid(xx, yy)
Z = np.empty(X1.shape)

# decision_function expects a 2D array of samples, hence the extra brackets
for (i, j), val in np.ndenumerate(X1):
    x1 = val
    x2 = X2[i, j]
    p = clf.decision_function([[x1, x2]])
    Z[i, j] = p[0]

plt.contour(X1, X2, Z, [0], colors="blue")
plt.scatter(X[:, 0], X[:, 1], c=Y)  # overlay the data points
plt.show()

If you want to do logistic regression using SGDClassifier:

  • use loss="log", not loss="hinge": 'hinge' gives a linear SVM, while 'log' gives logistic regression
  • try more iterations and a different alpha (SGDClassifier's regularization strength, which also drives its default learning-rate schedule), as in the sketch below
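A minimal sketch of those two changes, reusing X and Y from the question. It assumes a recent scikit-learn, where n_iter has been renamed max_iter and the logistic loss is spelled "log_loss" (on older releases use loss="log" and n_iter instead); the specific alpha and max_iter values are just illustrative:

from sklearn.linear_model import SGDClassifier

clf = SGDClassifier(
    loss="log_loss",    # logistic regression rather than a linear SVM
    alpha=0.0001,       # regularization strength; worth tuning
    max_iter=2000,      # many more passes over this small dataset
    fit_intercept=True,
)
clf.fit(X, Y)
print(clf.score(X, Y))  # training accuracy as a quick sanity check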

Actually, I recommend you create the classifier with clf = LogisticRegression(C=1, class_weight='auto', penalty='l2') (see the LogisticRegression documentation), because SGDClassifier is trained incrementally by stochastic gradient descent, whereas LogisticRegression is fit with a batch solver, which tends to be more stable on a small dataset like this one.
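A minimal sketch of that recommendation (note that class_weight='auto' was later renamed class_weight='balanced' in scikit-learn, so that spelling is used here):

from sklearn.linear_model import LogisticRegression

# Batch logistic regression; 'balanced' is the modern spelling of 'auto'.
clf = LogisticRegression(C=1, class_weight="balanced", penalty="l2")
clf.fit(X, Y)

print(clf.predict(X))            # predicted labels on the training data
print(clf.predict_proba(X[:3]))  # class probabilities for the first few points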
