What's the mistake in my Python version of a toy SVM?

I'm currently going through Andrej Karpathy's Hacker's guide to Neural Networks. In Chapter 2: Machine Learning, Binary Classification, he gives an example of a (very basic) SVM. Here's Karpathy's code:

var a = 1, b = -2, c = -1; // initial parameters
for(var iter = 0; iter < 400; iter++) {
  // pick a random data point
  var i = Math.floor(Math.random() * data.length);
  var x = data[i][0];
  var y = data[i][1];
  var label = labels[i];

// compute pull
  var score = a*x + b*y + c;
  var pull = 0.0;
  if(label === 1 && score < 1) pull = 1;
  if(label === -1 && score > -1) pull = -1;

// compute gradient and update parameters
  var step_size = 0.01;
  a += step_size * (x * pull - a); // -a is from the regularization
  b += step_size * (y * pull - b); // -b is from the regularization
  c += step_size * (1 * pull);
}

And the following is my version, in Python:

import numpy
import random

X = numpy.array([[1.2, 0.7],
                 [-0.3, 0.5],
                 [-3, -1],
                 [0.1, 1.0],
                 [3.0, 1.1],
                 [2.1, -3]])

labels = [1, -1, 1, -1, -1, 1]

a = 1
b = -2
c = -1

l = len(X)-1

steps = 400

for n in range(0, steps):
    i = random.randint(0, l)
    x = X[i][0]
    y = X[i][1]
    label = labels[i]

    if n == 0:
        for j in range(0, l+1):
            x = X[j][0]
            y = X[j][1]
            label = labels[j]
            score = a*x + b*y + c
            print x,",",y,"-->", label, "vs.", score

    score = a*x + b*y + c
    pull = 0.0
    if label == 1 and score < 1:
        pull = 1
    if label == -1 and score > -1:
        pull = -1

    step_size = 0.01
    a += step_size * (x * pull - a)
    b += step_size * (y * pull - b)
    c += step_size * (1 * pull)

    if n == steps-1:
        print ""
        for j in range(0, l+1):
            x = X[j][0]
            y = X[j][1]
            label = labels[j]
            score = a*x + b*y + c
            print x,",",y,"-->", label, "vs.", score

The problem is that even after more than the suggested 400 iterations, the parameters still don't yield the correct label for some of the vectors.

Here's the output after 400 iterations:

1.2 , 0.7 --> 1 vs. -0.939483353298
-0.3 , 0.5 --> -1 vs. -0.589208602761
-3.0 , -1.0 --> 1 vs. 0.651953448705
0.1 , 1.0 --> -1 vs. -0.921882586141
3.0 , 1.1 --> -1 vs. -1.44552077331
2.1 , -3.0 --> 1 vs. 0.896623596303

The first value after the "-->" is the correct label; the second value is the score, i.e. the learned label.

All vector/learned labels are correct (in the sense of being assigned a value with the correct sign), except for the first one.

I'm not sure what the reason is for this: did I make a mistake in my code? I checked it a few times, but didn't find anything. Or am I forgetting something Python-specific here? Or, finally, is there some ML-related reason why the correct label isn't learned in this case? I doubt it, though; otherwise it wouldn't make sense that Karpathy got the right results.

Any comments or help in figuring it out are much appreciated.

I believe that I found the problem(s):

(A) Your data set has no linear cut.

(B) Karpathy's "Monte Carlo" gradient descent thrashes on such a data set.

(C) You and Karpathy used different data.

DATA SETS
label   Karpathy's      yours
  1     [1.2, 0.7]      [1.2, 0.7]
 -1     [-0.3, -0.5]    [-0.3, 0.5]
  1     [3.0, 0.1]      [-3, -1]
 -1     [-0.1, -1.0]    [0.1, 1.0]
 -1     [-1.0, 1.1]     [3.0, 1.1]
  1     [2.1, -3]       [2.1, -3]

The data set you gave almost has a cut line (hyperplane) at roughly y = (1/3)x + 1/2, but the three points closest to the line argue constantly about the division. As it turns out, the best divider is distinctly different, leaving [1.2, 0.7] firmly on the wrong side of the line, and quite unhappy about it.

The original data has a neat cut line at roughly y = -3x + 1, which this algorithm roughly approximates with the (rounded) fit 0.6x - 0.1y - 0.5.
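
To sanity-check (C), here is a minimal sketch that reruns the same update loop on Karpathy's data set from the table above; the names X_orig and labels_orig are mine, and the loop body is copied from the Python version in the question:

import numpy
import random

# Karpathy's original data, taken from the table above
X_orig = numpy.array([[1.2, 0.7],
                      [-0.3, -0.5],
                      [3.0, 0.1],
                      [-0.1, -1.0],
                      [-1.0, 1.1],
                      [2.1, -3]])
labels_orig = [1, -1, 1, -1, -1, 1]

a, b, c = 1, -2, -1
for n in range(400):
    # pick a random data point
    i = random.randint(0, len(X_orig) - 1)
    x, y = X_orig[i]
    label = labels_orig[i]

    # compute pull, exactly as in the question
    score = a*x + b*y + c
    pull = 0.0
    if label == 1 and score < 1:
        pull = 1
    if label == -1 and score > -1:
        pull = -1

    # update parameters; -a and -b come from the regularization
    step_size = 0.01
    a += step_size * (x * pull - a)
    b += step_size * (y * pull - b)
    c += step_size * (1 * pull)

# on this data, every score should end up with the correct sign,
# with (a, b, c) somewhere near the rounded fit 0.6x - 0.1y - 0.5
for (x, y), label in zip(X_orig, labels_orig):
    print x, ",", y, "-->", label, "vs.", a*x + b*y + c

Swap X_orig for the data set in the question and the first point goes back to waffling around zero.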

Again, this algorithm is looking for a minimal-cost fit rather than the "pure" SVM of the widest separating channel. Even when there is a neat cut line, this algorithm does not converge on it; rather, it hacks its way to the general vicinity of an acceptable solution.
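
Concretely, each iteration is (up to the step size of 0.01) one stochastic gradient step on a per-sample cost of roughly this form; this reading of the update rule is mine, not something spelled out in the guide:

def sample_cost(a, b, c, x, y, label):
    # hinge loss on the margin, plus L2 regularization on a and b (c is unregularized)
    hinge = max(0.0, 1.0 - label * (a*x + b*y + c))
    reg = 0.5 * (a*a + b*b)
    return hinge + reg

Minimizing this trades margin violations against keeping a and b small, which is why the result is a "good enough" separator rather than the widest one.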

It chooses a random point. If the point is solidly classified -- the score is of the right sign with a magnitude > 1 -- nothing happens. However, if the point is on the wrong side of the line, or even too close for comfort, then it pulls the parameters for more favourable treatment.
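
In code, that rule is just the margin condition already present in both listings; a small helper (the name compute_pull is mine) makes the cases explicit:

def compute_pull(label, score):
    # solidly classified: right sign and magnitude of at least 1 -- no pull
    if label == 1 and score >= 1:
        return 0.0
    if label == -1 and score <= -1:
        return 0.0
    # wrong side of the line, or right side but inside the margin:
    # pull the parameters toward this point's own label
    return float(label)

A point only stops influencing the parameters once its score clears 1 on the correct side, which is a stricter demand than merely having the correct sign.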

Unless the channel is at least 2 units wide (in score terms, so that every point can sit at a score of magnitude 1 or more), the points within the disputed territory will continue to take turns shoving the line back and forth. There is no convergence criterion ... and, indeed, no convergence guarantee after a certain point.

Look carefully at Karpathy's code: the main algorithm makes changes for data points with scores < 1 or > -1 (depending on the training class). However, the evaluation claims victory if the sign of the result is correct. This is reasonable, but it isn't entirely consistent with the training function. In my trials, that first point always ends up with a score of magnitude < 0.07, but the actual value waffles on either side of 0. The other points are well clear of 0, but only two of them clear 1. There are four points arguing over where the line should be.
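
You can see that inconsistency directly by checking both criteria against the final parameters; this is a minimal sketch, assuming a, b, c, X and labels are the ones from the question after training:

# training criterion: label * score >= 1; evaluation criterion: the sign is correct
for (x, y), label in zip(X, labels):
    score = a*x + b*y + c
    sign_ok = (score > 0) == (label == 1)
    margin_ok = label * score >= 1
    print x, ",", y, "--> sign ok:", sign_ok, " margin ok:", margin_ok

Points that report sign ok but not margin ok are exactly the ones that keep pulling the line back and forth.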


Does this clear things up for you?
