
Neural Network from scratch X = P(X)

Goal

For a course requirement, I need to create a NN that predicts the probability P(X) of a normal random variable X, for values within (-2 std, +2 std) of the mean.

Architecture

The architecture I have to implement consists of 2 hidden layers with ReLU and a final layer with sigmoid:

1 unit - input layer

64 units - hidden layer (ReLU)

64 units - hidden layer (ReLU)

1 unit - output layer (sigmoid)

However, I can't seem to get good results. Specifically, the computed derivatives are too small and my weights/biases are not changing. I don't know whether I'm generating the dataset incorrectly or whether my backpropagation implementation is wrong. I've been trying to solve this for the past two weeks and I'm at my wit's end. This is my first deep dive into machine learning, and I would appreciate some guidance.

Here is what I came up with: link to the full notebook

Below is the logic/pseudocode for the dataset generation.

Dataset generation

  1. First, I generate normal random variables and build a histogram to use for the probability calculation.
  2. Using the bin centers and their respective frequencies as a starting point, I generated more data inside each bin interval, because at this point I felt that using only the bin centers as inputs would make the sample size too small. I then computed the outputs (P(X)) from the bin-center probabilities (frequency / n_samples).

Edit: added the code.

Importing the libraries:

import numpy as np
import matplotlib.pyplot as plt
from math import floor

Some helper functions:

def scale(X):
  '''
  Normalizes X values for easier processing
  
  Parameters:
  X(numpy.array) - Raw input data to be normalized

  Returns:
  scaled_x(numpy.array) - Normalized input values
  mean_x(float) - mean of raw input data
  std_x(float) - standard deviation of raw input data
  '''
  mean_x = np.mean(X, axis=0)
  new = X - mean_x
  std_x = np.std(X, axis=0)
  scaled_x = new / std_x 
  return scaled_x, mean_x, std_x

def filter(input, output, n=10):
  '''
  Randomly selects n samples from the first column of rows whose
  second column matches the given key.

  Parameters:
  input(numpy.array) - Input matrix to select samples from
  output(int/float) - Value to match in the second column
  n(int, default = 10) - Number of samples to select

  Returns:
  new_x(numpy.array) - Row vector with size (n,)
  '''
  # Select all rows with the second column matching the output
  rows_with_output = input[(input[:, 1] == output)]
  # Get only the first column (X values)
  input_vals = rows_with_output[:,0]
  # Randomly pick n samples
  generator = np.random.default_rng(42)
  new_x = generator.choice(input_vals, n)
  return new_x
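
Note: the dataset-generation code below also calls a flatten_stack helper that is not shown in the question (presumably it is defined in the linked notebook). A minimal sketch consistent with how it is used, collapsing the per-bin sample arrays into a single (N, 1) column, would be:

def flatten_stack(arr):
  '''
  Collapses a 3D array of per-bin samples into one column vector.

  Parameters:
  arr(numpy.array) - 3D array of per-bin samples

  Returns:
  (numpy.array) - column vector of shape (-1, 1)
  '''
  return arr.flatten().reshape(-1, 1)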

Dataset generation

def random_range(lb, ub, n):
  '''Function to generate 'n' uniform random numbers in the interval [lb, ub)
  Parameters:
  lb - lower bound
  ub - upper bound
  n - number of random numbers to generate

  Returns:
  random_nums(numpy.array) - array of random numbers
  '''
  # Note: a fresh generator with a fixed seed is created on every call,
  # so each call returns the same sequence of numbers
  generator = np.random.default_rng(42)
  random_nums = generator.uniform(lb, ub, n)
  return random_nums

np.random.seed(42)
# Define mean and std
mu, sigma = 100, 15
num_bins = 100
raw_sample_size = 15000
raws = np.random.normal(mu, sigma, raw_sample_size).round(4)
hist, bin_edges = np.histogram(raws, bins=num_bins)
# Half bin width
bin_half_val = np.diff(bin_edges)/2
# Get the bin centers
bin_centers = bin_edges[:-1] + np.diff(bin_edges)/2
# Plot the 'discretized' distribution
plt.scatter(bin_centers, hist)

# Define bounds
lb = mu - 2*sigma
ub = mu + 2*sigma
# Select bin_centers inside the range [lb, ub)
bins = bin_centers[(bin_centers >= lb) & (bin_centers < ub)]
bin_idx = [np.where(bin_centers == bin) for bin in bins]

# Flatten index
bin_idx = np.array(bin_idx).flatten()
# Get frequency values for respective bin centers
hist_in_range = hist[bin_idx]

# Plot
plt.scatter(bins, hist_in_range)

# Define number of samples for each bin
num_data = 10

# For each bin, generate num_data random numbers in the interval.
x_raw = np.array([random_range(x-bin_half_val[0], x+bin_half_val[0], num_data).reshape((num_data, 1)) for x in bins])
x_raw = x_raw.reshape(len(bins),1,num_data)

# Generate the probabilities.
y_raw = np.array([np.full((num_data,1), round(f/raw_sample_size, 4)) for f in hist_in_range])

# Flatten the arrays
y_f = flatten_stack(y_raw)
x_f = flatten_stack(x_raw)

# Apply scaling and transformation
x_scaled, x_mean, x_std = scale(x_f)
# Sanity check: manually scale the first value
scale1 = (x_f[0][0] - np.mean(x_f)) / np.std(x_f)

# Finally, combine the inputs and outputs into a single 2D Matrix.
data_raw = np.hstack((x_scaled, y_f))
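
As a quick sanity check (assuming the flatten_stack sketch above), the shapes should line up as one input column and one output column:

# Each of the len(bins) bins contributes num_data samples
print(x_scaled.shape)  # (len(bins) * num_data, 1)
print(y_f.shape)       # (len(bins) * num_data, 1)
print(data_raw.shape)  # (len(bins) * num_data, 2)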

Splitting the dataset:

def train_test_split(dataset, train_ratio, N, data_per_sample):
  '''
  Splits a 2D dataset into training and test sets
  Parameters:
  dataset(numpy.array) - 2D numpy array to split
  train_ratio(np.float) - float value of the training percentage
  N(int) - total number of samples
  data_per_sample(int) - number of data points generated per bin

  Returns:
  training_set(numpy.array) - first train_ratio portion of the dataset
  test_set(numpy.array) - remaining portion of the dataset
  '''
  # Shuffle the dataset in place before splitting
  np.random.shuffle(dataset)

  # Define training data ratio
  train_index = int(train_ratio*N)

  # Round train_index down to a multiple of data_per_sample
  train_index = data_per_sample * floor(train_index/data_per_sample)
  # Get training and test sets
  training_set, test_set = dataset[0:train_index], dataset[train_index:]
  return training_set, test_set

# Split dataset to training and test sets
train_set, test_set = train_test_split(data_raw, 0.8, data_raw.shape[0], num_data)
np.random.shuffle(train_set)
np.random.shuffle(test_set)
# Training Set
x_training = train_set[:,0].reshape(-1,1)
y_training = train_set[:,1].reshape(-1,1)
# Test Set
x_testing = test_set[:,0].reshape(-1,1)
y_testing = test_set[:,1].reshape(-1,1)

Neural network class
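
The class relies on sigmoid, relu and their derivatives, which are not shown in the question (presumably defined in the linked notebook); the standard definitions would be:

def sigmoid(z):
  # Logistic function, squashes z into (0, 1)
  return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
  # Derivative of the sigmoid
  s = sigmoid(z)
  return s * (1.0 - s)

def relu(z):
  # Rectified linear unit
  return np.maximum(0, z)

def relu_prime(z):
  # Derivative of ReLU (0 for z < 0, 1 for z > 0)
  return (z > 0).astype(float)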

class NeuralNetwork:
  def __init__(self, architecture):
    '''
    Parameters:

    architecture - array containing the number of neurons per layer.
    '''

    # Initialize the network architecture
    self.L = architecture.size - 1 # L defines the last layer of the network.
    self.n = architecture

    # Create a dictionary to store the weights and biases
    self.parameters = {}
    
    # Initialize network parameters
    for i in range(1, self.L + 1):
        # Initialize weights from the standard normal distribution
        self.parameters[f'W{i}'] = np.random.randn(self.n[i], self.n[i - 1])
        # Initialize rest of the parameters to 1
        self.parameters[f'b{i}'] = np.ones((self.n[i], 1))
        self.parameters[f'z{i}'] = np.ones((self.n[i], 1))
        self.parameters[f'a{i}'] = np.ones((self.n[i], 1))

    # Initialize the first activated values a[0] (overwritten in forward_propagate)
    self.parameters['a0'] = np.ones((self.n[0], 1))
    
    # Initialize the cost:
    self.parameters['C'] = 0
    
    # Create a dictionary for storing the derivatives:
    self.derivatives = {}

  def forward_propagate(self, X):
    # Note that X here is just one training example
    self.parameters['a0'] = X
    
    # Calculate the activations for every hidden layer    
    for l in range(1, self.L + 1):
      self.parameters[f'z{l}'] = np.dot(self.parameters[f'W{l}'], self.parameters[f'a{l - 1}']) + self.parameters[f'b{l}']
      if l == self.L:
        self.parameters[f'a{l}'] = sigmoid(self.parameters[f'z{l}'])
      else:
        self.parameters[f'a{l}'] = relu(self.parameters[f'z{l}'])
      
  def compute_cost(self, y):
    self.parameters['C'] = 0.5*(self.parameters[f'a{self.L}'] - y)**2

  def compute_derivatives(self, y):
    # Partial derivatives of the cost function with respect to z[L], W[L] and b[L]:        
    # dzL
    self.derivatives[f'dz{self.L}'] = (self.parameters[f'a{self.L}'] - y) * sigmoid_prime(self.parameters[f'z{self.L}'])
    # dWL
    self.derivatives[f'dW{self.L}'] = np.dot(self.derivatives[f'dz{self.L}'], np.transpose(self.parameters[f'a{self.L - 1}']))
    # dbL
    self.derivatives[f'db{self.L}'] = self.derivatives[f'dz{self.L}']

    # Implementing the above in a loop:
    for l in range(self.L-1, 0, -1):
      self.derivatives[f'dz{l}'] = np.dot(np.transpose(self.parameters[f'W{l + 1}']), self.derivatives[f'dz{l + 1}'])*relu_prime(self.parameters[f'z{l}'])
      self.derivatives[f'dW{l}'] = np.dot(self.derivatives[f'dz{l}'], np.transpose(self.parameters[f'a{l - 1}']))
      self.derivatives[f'db{l}'] = self.derivatives[f'dz{l}']

  def update_parameters(self, alpha):
    for l in range(1, self.L+1):
      self.parameters[f'W{l}'] -= alpha*self.derivatives[f'dW{l}']
      self.parameters[f'b{l}'] -= alpha*self.derivatives[f'db{l}']
    
  def predict(self, x):
    self.forward_propagate(x)
    return self.parameters[f'a{self.L}']
      
  def fit(self, X, Y, num_iter, alpha = 0.1):
    for iter in range(0, num_iter):
      c = 0 # Stores the cost
      n_c = 0 # Stores the number of correct predictions

      for i in range(0, X.shape[0]):
        x = X[i].reshape((X[i].size, 1))
        y = Y[i]

        self.forward_propagate(x)
        self.compute_cost(y)
        self.compute_derivatives(y)
        self.update_parameters(alpha)
        c += self.parameters['C'] 
        y_pred = self.predict(x).round(4)
        y_flat = y_pred.flatten()
        if y_flat[0] == y:
            n_c += 1
      
      c = c/X.shape[0]
      if (iter % 10 == 0):
        print(f"Iteration: {iter} Cost: {c} Accuracy: {(c/X.shape[0])*100}")

Training the model

# Defining the model architecture
architecture = np.array([1, 64, 64, 1])

# Creating the classifier
classifier = NeuralNetwork(architecture)

# Training the classifier
classifier.fit(x_training, y_training, 150, alpha=0.1)

Truncated output from training:

Iteration: 0 Cost: [[0.00193444]] Accuracy: [[0.00483609]]
Iteration: 1 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 2 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 3 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 4 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 5 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 6 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 7 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 8 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 9 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 10 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 11 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 12 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 13 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 14 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 15 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 16 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 17 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 18 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 19 Cost: [[0.00025961]] Accuracy: [[0.00064902]]

Other things I have tried include lowering the learning rate (alpha) to around 0.001/0.0001, but I still get the same results.
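
One way to check whether the derivatives really are vanishing is to run a single forward/backward pass and print the gradient norms, using the classifier defined above:

# Run one forward/backward pass on the first training example
x0 = x_training[0].reshape(-1, 1)
classifier.forward_propagate(x0)
classifier.compute_derivatives(y_training[0])
for l in range(1, classifier.L + 1):
  print(f'||dW{l}|| =', np.linalg.norm(classifier.derivatives[f'dW{l}']))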

Answer

The first problem I see is in the construction of the training and test sets. You have removed the shuffling, which can result in one half of the normal distribution ending up in the training set and the other half in the test set. You want points from across the whole distribution mixed into both sets, so I think the dataset should be shuffled before splitting.
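
For example, reusing the variables from the question, shuffling the combined dataset once before the split mixes points from the entire range into both sets:

# Shuffle the full dataset before splitting so both sets cover the whole range
np.random.shuffle(data_raw)
train_set, test_set = train_test_split(data_raw, 0.8, data_raw.shape[0], num_data)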

Second, what you report as the training-set accuracy does not seem to be an accuracy at all. It looks more like the loss, so perhaps the network does train, just very quickly (within the first few iterations).
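
For instance, the fit method already counts exact matches in n_c but never reports it; printing that counter instead of the cost would at least give a real (if overly strict) accuracy:

# Inside fit, report the match counter rather than the re-divided cost
print(f"Iteration: {iter} Cost: {c} Accuracy: {(n_c / X.shape[0]) * 100}")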

Finally, this is more of a regression problem than a classification problem, so the accuracy on the test data should not be evaluated by exact comparison between predictions and the ground truth. Instead, you should use a metric that measures the distance between the predictions and the ground truth, and train the NN to reduce that distance.
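
A minimal sketch of such an evaluation, using the mean absolute error of the trained network on the test set (reusing the classifier and test arrays from the question):

# Mean absolute error between predictions and ground truth
preds = np.array([classifier.predict(x.reshape(-1, 1)).item() for x in x_testing])
mae = np.mean(np.abs(preds - y_testing.flatten()))
print(f"Test MAE: {mae}")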
