
Multiclass logistic regression from scratch -- Python

I've been looking for answers here, but I couldn't find them.

I'm trying to implement multiclass logistic regression from scratch. The dataset is MNIST.

I built some functions, such as the hypothesis, the sigmoid, the cost function, the derivative of the cost function, and gradient descent. My code is below.

I'm struggling with the following:

All images are labeled with the digit they represent, so there are 10 classes in total.

Inside the gradient descent function, I need to loop through each class, but I do not know how to apply the One-vs-All method there.

In other words, what I need to do is:

  • Filter each class inside gradient descent.

  • After that, build a function to predict on the test set.

Here is my code.

import numpy as np
import pandas as pd


# Only the training data set;
# the test data will be loaded later.

url='https://drive.google.com/file/d/1-MO8oCfq4KU361QeeL4DdafVBhZePUNT/view?usp=sharing'
url='https://drive.google.com/uc?id=' + url.split('/')[-2]
df = pd.read_csv(url,header = None)

X = df.values[:, 0:-1]
y = df.values[:, -1]

m = np.size(X, 0)

y = np.array(y).reshape(m, 1)
X = np.c_[ np.ones(m), X ] # Bias


def hypothesis(X, thetas):
    return sigmoid( X.dot(thetas)) #- 0.0000001 

def sigmoid(z):
    return 1/(1+np.exp(-z))

def losscost(X, y, m, thetas):
    h = hypothesis(X, thetas)
    # Element-wise binary cross-entropy, averaged over the m samples.
    return -(1/m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))


def derivativelosscost(X, y, m, thetas):
    h = hypothesis(X, thetas)
    # Gradient of the loss w.r.t. thetas; same shape as thetas.
    return X.T.dot(h - y) / m

def descendinggradient(X, y, m, epoch, alpha, thetas):

    n = np.size(X, 1)
    J_historico = []

    for i in range(epoch):

        for j in range(0, 10):  # 10 classes

            # How to filter each class in here (inside descendinggradient)?

            # The 2 lines below are wrong.
            #thetas = thetas - alpha * derivativelosscost(X, y, m, thetas)
            #J_historico = J_historico + [losscost(X, y, m, thetas)]
            pass  # placeholder so the empty loop body parses

    return [thetas, J_historico]


alpha = 0.01
epoch = 50
thetas = np.zeros((np.size(X, 1), 10))  # one column of parameters per class
(thetas, J_historico) = descendinggradient(X, y, m, epoch, alpha, thetas)

# After that, how to build a function to predict the test set.

I appreciate any help.

Let me explain this problem step by step:

First, since your code doesn't provide the actual data or a link to it, I've created a random dataset, followed by the same commands you used to create X and y:

batch_size = 20
num_classes = 10


rng = np.random.default_rng(seed=42)
df = pd.DataFrame(
    4* rng.random((batch_size, num_classes + 1)) - 2, # Create Random Array Between -2, 2
    columns=['X0','X1','X2','X3','X4','X5','X6','X7','X8', 'X9','Y']
)


X = df.values[:, 0:-1]
y = df.values[:, -1]

m = np.size(X, 0)

y = np.array(y).reshape(m, 1)
X = np.c_[ np.ones(m), X ] # Bias

Next, let's take a look at your hypothesis function. If we just run hypothesis and look at the first sample, we get a vector with one entry per class (size 10). I also needed to provide initial thetas for this case:

thetas = rng.random((X.shape[1],num_classes))

h = hypothesis(X, thetas)

print(h[0])

>>>[0.89701729 0.90050806 0.98358408 0.81786334 0.96636732 0.97819512
 0.89118488 0.87238045 0.70612173 0.30256924]

Basically, the function calculates a "probability" [1] for each class.

At this point we reach the first issue in your code. The sigmoid function returns "probabilities" that are not "connected" to each other. To relate those "probabilities" we need another function: softmax. You will find plenty of implementations of this function. In short: it rescales the per-class values so that the sum over all class "probabilities" equals 1.
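
A minimal, numerically stable softmax sketch (row-wise over the class axis; the max-subtraction is a common stability convention, not something specified above):

def softmax(z):
    # Subtracting the row-wise max does not change the result,
    # because softmax is invariant to shifting its inputs.
    z = z - np.max(z, axis=1, keepdims=True)
    e = np.exp(z)
    # Normalize so each row (one sample) sums to 1 over the classes.
    return e / np.sum(e, axis=1, keepdims=True)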

As for your second question, "how to build a predict function after training": we only need to find the argmax value to determine the class:

h = hypothesis(X, thetas)
p = softmax(h)  # e.g. the softmax sketched above
prediction = np.argmax(p, axis=1)
print(prediction)

>>>[2 5 5 8 3 5 2 1 3 5 2 3 8 3 3 9 5 1 1 8]

Now that we know how to predict a class, we also need to know where to set up the training. We want to do this directly after the softmax function. But instead of using the argmax to determine the winning class, we use the cost function and its derivative. The problem in your code: you used the cross-entropy loss for a binary problem. A binary problem also doesn't need the softmax function, because the sigmoid function already provides the connection between the two binary classes. And since we are not interested in the actual value of the multiclass cross-entropy loss, only in its derivative, we can compute that derivative directly.
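
As a hedged sketch of that direct calculation: when softmax is applied to the linear scores X.dot(thetas) (rather than to the sigmoid outputs), the derivative of the multiclass cross-entropy loss reduces to the familiar p - Y form, with Y a one-hot encoding of the labels. The one_hot and derivative_multiclass helpers below are my naming, not from the original answer:

def one_hot(y, num_classes):
    # Turn the (m, 1) label vector into an (m, num_classes) indicator matrix.
    Y = np.zeros((y.shape[0], num_classes))
    Y[np.arange(y.shape[0]), y.astype(int).ravel()] = 1
    return Y

def derivative_multiclass(X, Y, m, thetas):
    # Gradient of softmax cross-entropy w.r.t. thetas: X^T (p - Y) / m.
    p = softmax(X.dot(thetas))
    return X.T.dot(p - Y) / m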

The conversion from binary cross-entropy to multiclass is somewhat unintuitive at first sight. I recommend reading a bit about it before implementing. After that, you basically use your line:

thetas = thetas - alpha * derivativelosscost(X, y, m, thetas)

to update the thetas.
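
Putting the pieces together, a minimal end-to-end training loop could look like this. It assumes y holds integer class labels 0..9 (as in MNIST; the random demo data above would need real labels), and the zero initialization and the alpha/epoch values mirror the question rather than being tuned choices:

num_classes = 10
Y = one_hot(y, num_classes)  # hypothetical helper from the sketch above
thetas = np.zeros((X.shape[1], num_classes))

alpha = 0.01
epoch = 50

for i in range(epoch):
    thetas = thetas - alpha * derivative_multiclass(X, Y, m, thetas)

# Sanity check: predict on the training data.
prediction = np.argmax(softmax(X.dot(thetas)), axis=1)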

[1] These are not actual probabilities, but that is a completely different topic.
