
Neural Network in Python using just numpy

I am trying to write two neural networks. The architecture of the first network consists of an input layer, one hidden layer, and an output layer. The input layer is R^2, so it accepts two inputs (x1, x2); the hidden layer has two neurons, and the output layer has one neuron. All neurons use the Rectified Linear Unit (ReLU) activation function. The only difference between the first and the second neural network is that the second has four neurons in the hidden layer. Otherwise they are identical.

I finished the code for the first network, and I was able to run it and plot the results. I am mainly trying to get the neural network to learn how to separate two clusters in my dataset. I generate 2000 points to form one cluster and then another 2000 points to form the next cluster. The output of the neural network will ideally find a separating plane (really multiple planes) between the two clusters. I have set up my plot to run when the error during the testing phase is below 0.05. I should also explain that I am trying to find the ideal learning rate and number of training epochs, so I have a couple of loops that iterate over different learning rates (alpha) and epochs.

My first network works fine, but when I added 2 more neurons, for some reason my network's error and parameters (weights and biases) became erratic. I cannot get the 4-neuron network to reach an error below 0.4. I think it has to do with the error and the weights. I have been running the network with print statements to see what is happening to the weights, and I noticed that they do not update well because the error gets stuck at 0 during training, so the weights never update, but I am not 100% sure this always happens.
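
The check I have been running looks roughly like this (a minimal sketch inserted inside the training loop; the variable names match the full code below):

                    #Diagnostic sketch: if every hidden ReLU is inactive on a
                    #sample (y1..y4 <= 0), then dxx1..dxx4 are all False, every
                    #hidden-layer update is multiplied by zero, and those
                    #weights freeze.
                    if not (dxx1 or dxx2 or dxx3 or dxx4):
                        print("all hidden ReLUs inactive on sample", i)
                    print("e =", e, "w11 =", w11, "w12 =", w12)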

I would appreciate it if anyone knows why my weights and error are not updating correctly. If you run the code, you will see when the two clusters are plotted that the output of the neural network does not create a colored separation between them. The code for the working two-neuron architecture is the same, just with the extra 2 neurons removed.

Here is the code for the network:

import numpy as np
import random
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  #registers the '3d' projection


nData = 2000 #2000 points used on each cluster for 4000 points total
nTrain = 1000 #Used for training loop and to create clusters
nEpoch = 1 #Initial epoch value
nTest = 2000 #Used for testing loop
#alpha = 0.001

#Initializing 2D array for x which will carry the x1 and x2 values
#Also creating the radius and theta values for the cluster data
std = 0.5
x = np.zeros((2*nData,2))
t = np.zeros((2*nData))
r = np.random.normal(0, std, 2*nData)
theta = 2*np.pi*np.random.rand(2*nData)

#w11f and w12f are used to plot the value of weights w11 and w12 as they update
w11f = np.zeros(nEpoch*nTrain)
w12f = np.zeros(nEpoch*nTrain)

#Creating cluster 1 and target data
h = -6 + 12*np.random.rand(nData)
v = 5 + (h**2)/6
x[0:nData,0] = h + r[0:nData]*np.cos(theta[0:nData])
x[0:nData,1] = v + r[0:nData]*np.sin(theta[0:nData])
t[0:nData] = 0

#Creating cluster 2 and target data
h = -5 + 10*np.random.rand(nData)
v = 10 + (h**2)/4
x[nData:2*nData,0] = h + r[nData:2*nData]*np.cos(theta[nData:2*nData])
x[nData:2*nData,1] = v + r[nData:2*nData]*np.sin(theta[nData:2*nData])
t[nData:2*nData] = 1

#Normalization
x[:,0] = 1 + 0.1*x[:,0]
x[:,1] = 1 + 0.1*x[:,1]

#Parameter Initialization
w11 = 0.5 - np.random.rand()
w12 = 0.5 - np.random.rand()
w21 = 0.5 - np.random.rand()
w22 = 0.5 - np.random.rand()
w31 = 0.5 - np.random.rand()
w32 = 0.5 - np.random.rand()
w41 = 0.5 - np.random.rand()
w42 = 0.5 - np.random.rand()
b4 = 0.5 - np.random.rand()
b3 = 0.5 - np.random.rand()
b2 = 0.5 - np.random.rand()
b1 = 0.5 - np.random.rand()
ww1 = 0.5 - np.random.rand()
ww2 = 0.5 - np.random.rand()
ww3 = 0.5 - np.random.rand()
ww4 = 0.5 - np.random.rand()
bb = 0.5 - np.random.rand()

#Index range from 0 to 2*nData-1, used to sample training and testing points
a = range(0, 2*nData)
#3D array (tensor) storing the error value at the end of each of the 50 runs
er_List = np.zeros((14,50,6))
#Array counting the successful runs (test error below 0.05) for each
#alpha/epoch pair: rows are the alpha values from 0.001 to 0.05 and columns
#are the epochs from 1 to 6, so you can view the 2D array and see which
#pair gives the most successes at the lowest error.
nSuccess_Array = np.zeros((14,6))


#Part B - Creating nested loops to train for multiple alpha and epoch value
#pairs
#Training
alpha = [0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007, 0.008, 0.009, 0.01, 0.02, 0.03, 0.04, 0.05]
for l in range(0,14): #loop over the 14 alpha values
    nEpoch = 1
    for n in range(0,6): #loop for incrementing epoch values
        nSuccess = 0
        #Initialize these again so the size updates as the epoch changes
        w11f = np.zeros(nEpoch*nTrain)
        w12f = np.zeros(nEpoch*nTrain)
        for j in range(0,50):
            #Initialize the parameters again so they are random for each of
            #the 50 runs (and for each new epoch value)
            w11 = 0.5 - np.random.rand()
            w12 = 0.5 - np.random.rand()
            w21 = 0.5 - np.random.rand()
            w22 = 0.5 - np.random.rand()
            w31 = 0.5 - np.random.rand()
            w32 = 0.5 - np.random.rand()
            w41 = 0.5 - np.random.rand()
            w42 = 0.5 - np.random.rand()
            b4 = 0.5 - np.random.rand()
            b3 = 0.5 - np.random.rand()
            b2 = 0.5 - np.random.rand()
            b1 = 0.5 - np.random.rand()
            ww1 = 0.5 - np.random.rand()
            ww2 = 0.5 - np.random.rand()
            ww3 = 0.5 - np.random.rand()
            ww4 = 0.5 - np.random.rand()
            bb = 0.5 - np.random.rand()
            
            sp = random.sample(a,nTrain + nTest)
            p = 0
            for epoch in range(0,nEpoch):
                for i in range(0,nTrain):
                    #Neuron dot product
                    y1 = b1 + w11*x[sp[i],0] + w12*x[sp[i],1]
                    y2 = b2 + w21*x[sp[i],0] + w22*x[sp[i],1]
                    y3 = b3 + w31*x[sp[i],0] + w32*x[sp[i],1]
                    y4 = b4 + w41*x[sp[i],0] + w42*x[sp[i],1]
                    #Neuron activation function ReLU
                    dxx1 = y1 > 0
                    xx1 = y1*dxx1
                    
                    dxx2 = y2 > 0
                    xx2 = y2*dxx2
                    
                    dxx3 = y3 > 0
                    xx3 = y3*dxx3
                    
                    dxx4 = y4 > 0
                    xx4 = y4*dxx4
                    #Output of neural network before activation function
                    yy = bb + ww1*xx1 + ww2*xx2 + ww3*xx3 + ww4*xx4
                    yy = yy > 0 #activation function
                    e = t[sp[i]] - yy #error calculation
                    
                    #Updating parameters. The hidden-layer gradients should use
                    #the output weights from before this step, so save them.
                    ww1_old, ww2_old, ww3_old, ww4_old = ww1, ww2, ww3, ww4
                    ww1 = ww1 + alpha[l]*e*xx1
                    ww2 = ww2 + alpha[l]*e*xx2
                    ww3 = ww3 + alpha[l]*e*xx3
                    ww4 = ww4 + alpha[l]*e*xx4
                    
                    bb = bb + alpha[l]*e
                    
                    w11 = w11 + alpha[l]*e*ww1_old*dxx1*x[sp[i],0]
                    w12 = w12 + alpha[l]*e*ww1_old*dxx1*x[sp[i],1]
                    
                    w21 = w21 + alpha[l]*e*ww2_old*dxx2*x[sp[i],0]
                    w22 = w22 + alpha[l]*e*ww2_old*dxx2*x[sp[i],1]
                    
                    w31 = w31 + alpha[l]*e*ww3_old*dxx3*x[sp[i],0]
                    w32 = w32 + alpha[l]*e*ww3_old*dxx3*x[sp[i],1]
                    
                    w41 = w41 + alpha[l]*e*ww4_old*dxx4*x[sp[i],0]
                    w42 = w42 + alpha[l]*e*ww4_old*dxx4*x[sp[i],1]
                    
                    b1 = b1 + alpha[l]*e*ww1_old*dxx1
                    b2 = b2 + alpha[l]*e*ww2_old*dxx2
                    b3 = b3 + alpha[l]*e*ww3_old*dxx3
                    b4 = b4 + alpha[l]*e*ww4_old*dxx4
                    
                    w11f[p] = w11
                    w12f[p] = w12
                    p = p + 1
            er = 0
            #Testing on points the network was not trained on
            for k in range(nTrain,nTrain + nTest):
                y1 = b1 + w11*x[sp[k],0] + w12*x[sp[k],1]
                y2 = b2 + w21*x[sp[k],0] + w22*x[sp[k],1]
                y3 = b3 + w31*x[sp[k],0] + w32*x[sp[k],1]
                y4 = b4 + w41*x[sp[k],0] + w42*x[sp[k],1]
                
                dxx1 = y1 > 0
                xx1 = y1*dxx1
                
                dxx2 = y2 > 0
                xx2 = y2*dxx2
                
                dxx3 = y3 > 0
                xx3 = y3*dxx3
                
                dxx4 = y4 > 0
                xx4 = y4*dxx4
                
                yy = bb + ww1*xx1 + ww2*xx2 + ww3*xx3 + ww4*xx4
                yy = yy > 0
                e = abs(t[sp[k]] - yy)
                er = er + e #Accumulates error
            er = er/nTest #Calculates average error
            er_List[l,j,n] = er
            
            if er_List[l,j,n] < 0.05:
                nSuccess = nSuccess + 1
        #Part C - Storing the success count of each alpha and epoch value pair
        nSuccess_Array[l,n] = nSuccess
        
        if nEpoch < 6:
            nEpoch = nEpoch +1


print(er)

#Plotting

if er < 0.5:
    plt.figure(1)
    plt.scatter(x[0:nData,0],x[0:nData,1])
    plt.scatter(x[nData:2*nData,0],x[nData:2*nData,1])
    
    X = np.arange(0.25,1.75,0.02)
    Y = np.arange(1.25,2.75,0.02)
    X, Y = np.meshgrid(X,Y)
    
    y1 = b1 + w11*X + w12*Y
    y2 = b2 + w21*X + w22*Y
    y3 = b3 + w31*X + w32*Y
    y4 = b4 + w41*X + w42*Y
    
    dxx1 = y1 > 0
    xx1 = y1*dxx1
    
    dxx2 = y2 > 0
    xx2 = y2*dxx2
    
    dxx3 = y3 > 0
    xx3 = y3*dxx3    
    
    dxx4 = y4 > 0
    xx4 = y4*dxx4
    
    yy = bb + ww1*xx1 + ww2*xx2 + ww3*xx3 + ww4*xx4
    Z = yy > 0
    plt.scatter(X,Y,c=Z+1,alpha=0.3)

    plt.figure(2)
    f=np.arange(0,nEpoch*nTrain,1)
    plt.plot(f,w11f)
    
    plt.figure(3)
    plt.plot(f,w12f)
    
    plt.figure(4)
    ax = plt.axes(projection='3d')
    ax.scatter(x[0:nData,0],x[0:nData,1],0,s=30)
    ax.scatter(x[nData:2*nData,0],x[nData:2*nData,1],1,s=30)
    
    #Plotting the separating planes
    X = np.arange(0.25,1.75,0.02)
    Y = np.arange(1.25,2.75,0.02)
    X, Y = np.meshgrid(X,Y)
    
    y1 = b1 + w11*X + w12*Y
    y2 = b2 + w21*X + w22*Y
    y3 = b3 + w31*X + w32*Y
    y4 = b4 + w41*X + w42*Y
    
    dxx1 = y1 > 0
    xx1 = y1*dxx1
    
    dxx2 = y2 > 0
    xx2 = y2*dxx2
    
    dxx3 = y3 > 0
    xx3 = y3*dxx3    
    
    dxx4 = y4 > 0
    xx4 = y4*dxx4
    
    yy = bb + ww1*xx1 + ww2*xx2 + ww3*xx3 + ww4*xx4
    Z = yy > 0
    ax.plot_surface(X,Y,Z,rstride=1, cstride=1,cmap='viridis',alpha=0.5)
    
    plt.figure(5)
    ax = plt.axes(projection='3d')
    X = np.arange(0,5,0.02)
    Y = np.arange(0,5,0.02)
    X, Y = np.meshgrid(X,Y)
    
    y1 = b1 + w11*X + w12*Y
    y2 = b2 + w21*X + w22*Y
    y3 = b3 + w31*X + w32*Y
    y4 = b4 + w41*X + w42*Y
    
    dxx1 = y1 > 0
    xx1 = y1*dxx1
    
    dxx2 = y2 > 0
    xx2 = y2*dxx2
    
    dxx3 = y3 > 0
    xx3 = y3*dxx3    
    
    dxx4 = y4 > 0
    xx4 = y4*dxx4
    
    yy = bb + ww1*xx1 + ww2*xx2 + ww3*xx3 + ww4*xx4
    ax.plot_surface(X, Y, yy, rstride=1, cstride=1,cmap='viridis', edgecolor='none')

Yes, you can use np.matmul (a@b) and compute the gradients manually. Check out the Fastai v3 course, Part 2: https://course.fast.ai/videos/?lesson=8 Jeremy Howard manipulates PyTorch tensors there, but you can do the same in NumPy.
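
As a rough illustration of that suggestion (a sketch, not the poster's code: the array shapes are my own choices, and a sigmoid output stands in for the question's step function so that the output gradient is well-defined), a vectorized 2-4-1 ReLU network with hand-derived gradients could look like this:

import numpy as np

rng = np.random.default_rng(0)

#2-4-1 network, weights initialized in (-0.5, 0.5) as in the question
W1 = rng.uniform(-0.5, 0.5, (2, 4))   #input -> hidden
b1 = rng.uniform(-0.5, 0.5, 4)
W2 = rng.uniform(-0.5, 0.5, (4, 1))   #hidden -> output
b2 = rng.uniform(-0.5, 0.5, 1)

def forward(X):
    z1 = X @ W1 + b1                  #hidden pre-activation, shape (n, 4)
    h = np.maximum(z1, 0.0)           #ReLU
    z2 = h @ W2 + b2                  #output pre-activation, shape (n, 1)
    yhat = 1.0/(1.0 + np.exp(-z2))    #sigmoid output (my assumption)
    return z1, h, yhat

def train_step(X, t, lr):
    global W1, b1, W2, b2
    z1, h, yhat = forward(X)
    d2 = (yhat - t)*yhat*(1.0 - yhat) #MSE gradient through the sigmoid, (n, 1)
    dW2 = h.T @ d2 / len(X)
    db2 = d2.mean(axis=0)
    d1 = (d2 @ W2.T)*(z1 > 0)         #ReLU mask, the vectorized dxx1..dxx4
    dW1 = X.T @ d1 / len(X)
    db1 = d1.mean(axis=0)
    W1 -= lr*dW1; b1 -= lr*db1
    W2 -= lr*dW2; b2 -= lr*db2

#Hypothetical usage with the x and t arrays from the question:
#for _ in range(500):
#    train_step(x, t.reshape(-1, 1), 0.5)

Because each layer updates all at once from a single backward pass, there is no ordering issue between the output weights and the hidden weights, which is easy to get wrong in the per-scalar version.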
