
Neural Network from scratch X = P(X)

Goal

For a course requirement I need to create a NN to predict the probability of normally distributed random variables lying within (-2 std, +2 std) of the mean.

The Architecture

The architecture I am required to implement consists of 2 hidden layers with ReLU and a final layer with sigmoid.

1 unit - Input Layer

64 units - Hidden layer (ReLU)

64 units - Hidden layer (ReLU)

1 unit - Output Layer (Sigmoid)

However, I can't seem to get good results. Specifically, the calculated derivatives are so small that my weights/biases aren't changing. I don't know whether I am generating my dataset wrong or my implementation of backpropagation is wrong. I am near my wits' end, as I have been trying to solve this for the past two weeks. This is my first dive into machine learning and I am hoping for some guidance.

Here is what I came up with: Link to full notebook

Below is the logic/pseudo code of the dataset generation.

Dataset generation

  1. First I generate normal random variables and build a histogram for the probability calculation.
  2. Using the bin centers and their respective frequencies as the starting point, I generated more data within each bin interval, since I thought that having only the bin centers as input would give too small a sample size. I then calculated the output P(X) using each bin center's probability (frequency/n_samples).

EDIT: Added the code.

Import libraries:

import numpy as np
import matplotlib.pyplot as plt
from math import floor

Some helper functions:

def scale(X):
  '''
  Normalizes X values for easier processing
  
  Parameters:
  X(numpy.array) - Raw input data to be normalized

  Returns:
  scaled_x(numpy.array) - Normalized input values
  mean_x(float) - mean of raw input data
  std_x(float) - standard deviation of raw input data
  '''
  mean_x = np.mean(X, axis=0)
  new = X - mean_x
  std_x = np.std(X, axis=0)
  scaled_x = new / std_x 
  return scaled_x, mean_x, std_x
def filter(input, output, n=10):
  '''
  Randomly selects n samples from the first column of rows whose
  second-column value matches the given output.

  Parameters:
  input(numpy.array) - Input matrix to select samples
  output(int/float) - Value to match in the second column
  n(int, default = 10) - Number of samples to select

  Returns:
  new_x(numpy.array) - Row vector with size (n,)
  '''
  # Select all rows with the second column matching the output
  rows_with_output = input[(input[:, 1] == output)]
  # Get only the first column (X values)
  input_vals = rows_with_output[:,0]
  # Randomly pick n samples
  generator = np.random.default_rng(42)
  new_x = generator.choice(input_vals, n)
  return new_x
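
The activation functions (sigmoid, sigmoid_prime, relu, relu_prime) and flatten_stack are referenced later but not defined in the snippets here; they are presumably in the linked notebook. A minimal sketch of what they are assumed to look like:

def sigmoid(z):
  '''Logistic sigmoid activation.'''
  return 1 / (1 + np.exp(-z))
def sigmoid_prime(z):
  '''Derivative of the sigmoid with respect to z.'''
  s = sigmoid(z)
  return s * (1 - s)
def relu(z):
  '''Rectified linear unit.'''
  return np.maximum(0, z)
def relu_prime(z):
  '''Derivative of ReLU: 1 where z > 0, else 0.'''
  return (z > 0).astype(float)
def flatten_stack(arr):
  '''Collapses a stacked (bins, ..., n) array into a single column vector.'''
  return arr.flatten().reshape(-1, 1)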

Dataset Generation

def random_range(lb, ub, n):
  '''Function to generate n uniform random numbers in the half-open interval [lb, ub)
  Parameters:
  lb - lower bound
  ub - upper bound
  n - number of random numbers to generate

  Returns:
  random_nums(numpy.array) - array of random numbers
  '''
  generator = np.random.default_rng(42)
  random_nums = generator.uniform(lb, ub, n)
  return random_nums
np.random.seed(42)
# Define mean and std
mu, sigma = 100, 15
num_bins = 100
raw_sample_size = 15000
raws = np.random.normal(mu, sigma, raw_sample_size).round(4)
hist, bin_edges = np.histogram(raws, bins=num_bins)
# Half bin width
bin_half_val = np.diff(bin_edges)/2
# Get the bin centers
bin_centers = bin_edges[:-1] + np.diff(bin_edges)/2
# Plot the 'discretized' distribution
plt.scatter(bin_centers, hist)

# Define bounds
lb = mu - 2*sigma
ub = mu + 2*sigma
# Select bin_centers inside the range [lb, ub]
bins = bin_centers[(bin_centers >= lb) & (bin_centers < ub)]
bin_idx = [np.where(bin_centers == bin) for bin in bins]

# Flatten index
bin_idx = np.array(bin_idx).flatten()
# Get frequency values for respective bin centers
hist_in_range = hist[bin_idx]

# Plot
plt.scatter(bins, hist_in_range)

# Define number of samples for each bin
num_data = 10

# For each bin generate 10 random numbers in the interval.
x_raw = np.array([random_range(x-bin_half_val[0], x+bin_half_val[0], num_data).reshape((num_data, 1)) for x in bins])
x_raw = x_raw.reshape(len(bins),1,num_data)

# Generate the probabilities.
y_raw = np.array([np.full((num_data,1), round(f/raw_sample_size, 4)) for f in hist_in_range])

# Flatten the arrays
y_f = flatten_stack(y_raw)
x_f = flatten_stack(x_raw)

# Apply scaling and transformation
x_scaled,x_mean, x_std = scale(x_f)
scale1 = (x_f[0][0] -np.mean(x_f))/np.std(x_f)

# Finally, combine the inputs and outputs into a single 2D Matrix.
data_raw = np.hstack((x_scaled, y_f))

Splitting the dataset:

def train_test_split(dataset, train_ratio, N, data_per_sample):
  '''
  Splits a 2D dataset into training and test sets
  Parameters:
  dataset(numpy.array) - 2D numpy array to split
  train_ratio(float) - fraction of the data to use for training
  N(int) - total number of rows in the dataset
  data_per_sample(int) - number of samples generated per bin (used to round the split point)
  '''
  # Apply shuffling a second time
  np.random.shuffle(dataset)

  # Define training data ratio
  train_index = int(train_ratio*N)

  # Round train_index
  train_index = data_per_sample * floor(train_index/data_per_sample)
  # Get training and test sets
  training_set, test_set = dataset[0:train_index], dataset[train_index:]
  return training_set, test_set
# Split dataset to training and test sets
train_set, test_set = train_test_split(data_raw, 0.8, data_raw.shape[0], num_data)
np.random.shuffle(train_set)
np.random.shuffle(test_set)
# Training Set
x_training = train_set[:,0].reshape(-1,1)
y_training = train_set[:,1].reshape(-1,1)
# Test Set
x_testing = test_set[:,0].reshape(-1,1)
y_testing = test_set[:,1].reshape(-1,1)

Neural Network Class

class NeuralNetwork:
  def __init__(self, architecture):
    '''
    Parameters:

    architecture - array containing the number of neurons per layer.
    '''

    # Initialize the network architecture
    self.L = architecture.size - 1 # L defines the last layer of the network.
    self.n = architecture

    # Create a dictionary to store the weights and biases
    self.parameters = {}
    
    # Initialize network parameters
    for i in range (1, self.L + 1): 
        # Initialize weights from the standard normal distribution
        self.parameters[f'W{i}'] = np.random.randn(self.n[i], self.n[i - 1])
        # Initialize rest of the parameters to 1
        self.parameters[f'b{i}'] = np.ones((self.n[i], 1))
        self.parameters[f'z{i}'] = np.ones((self.n[i], 1))
        self.parameters[f'a{i}'] = np.ones((self.n[i], 1))
    
    # Initialize the input activations a[0] (sized to the input layer)
    self.parameters['a0'] = np.ones((self.n[0], 1))
    
    # Initialize the cost:
    self.parameters['C'] = 0
    
    # Create a dictionary for storing the derivatives:
    self.derivatives = {}

  def forward_propagate(self, X):
    # Note that X here, is just one training example
    self.parameters['a0'] = X
    
    # Calculate the activations for every hidden layer    
    for l in range(1, self.L + 1):
      self.parameters[f'z{l}'] = np.dot(self.parameters[f'W{l}'], self.parameters[f'a{l - 1}']) + self.parameters[f'b{l}']
      if l == self.L:
        self.parameters[f'a{l}'] = sigmoid(self.parameters[f'z{l}'])
      else:
        self.parameters[f'a{l}'] = relu(self.parameters[f'z{l}'])
      
  def compute_cost(self, y):
    self.parameters['C'] = 0.5*(self.parameters[f'a{self.L}'] - y)**2
  def compute_derivatives(self, y):
    # Partial derivatives of the cost function with respect to z[L], W[L] and b[L]:        
    # dzL
    self.derivatives[f'dz{self.L}'] = (self.parameters[f'a{self.L}'] - y) * sigmoid_prime(self.parameters[f'z{self.L}'])
    # dWL
    self.derivatives[f'dW{self.L}'] = np.dot(self.derivatives[f'dz{self.L}'], np.transpose(self.parameters[f'a{self.L - 1}']))
    # dbL
    self.derivatives[f'db{self.L}'] = self.derivatives[f'dz{self.L}']

    # Implementing the above in a loop:
    for l in range(self.L-1, 0, -1):
      self.derivatives[f'dz{l}'] = np.dot(np.transpose(self.parameters[f'W{l + 1}']), self.derivatives[f'dz{l + 1}'])*relu_prime(self.parameters[f'z{l}'])
      self.derivatives[f'dW{l}'] = np.dot(self.derivatives[f'dz{l}'], np.transpose(self.parameters[f'a{l - 1}']))
      self.derivatives[f'db{l}'] = self.derivatives[f'dz{l}']

  def update_parameters(self, alpha):
    for l in range(1, self.L+1):
      self.parameters[f'W{l}'] -= alpha*self.derivatives[f'dW{l}']
      self.parameters[f'b{l}'] -= alpha*self.derivatives[f'db{l}']
    
  def predict(self, x):
    self.forward_propagate(x)
    return self.parameters[f'a{self.L}']
      
  def fit(self, X, Y, num_iter, alpha = 0.1):
    for iter in range(0, num_iter):
      c = 0 # Stores the cost
      n_c = 0 # Stores the number of correct predictions

      for i in range(0, X.shape[0]):
        x = X[i].reshape((X[i].size, 1))
        y = Y[i]

        self.forward_propagate(x)
        self.compute_cost(y)
        self.compute_derivatives(y)
        self.update_parameters(alpha)
        c += self.parameters['C'] 
        y_pred = self.predict(x).round(4)
        y_flat = y_pred.flatten()
        if y_flat[0] == y:
            n_c += 1
      
      c = c/X.shape[0]
      if (iter % 10 == 0):
        print(f"Iteration: {iter} Cost: {c} Accuracy: {(c/X.shape[0])*100}")

Training the model

# Defining the model architecture
architecture = np.array([1, 64, 64, 1])

# Creating the classifier
classifier = NeuralNetwork(architecture)

# Training the classifier
classifier.fit(x_training, y_training, 150, alpha=0.1)

Truncated output of the training:

Iteration: 0 Cost: [[0.00193444]] Accuracy: [[0.00483609]]
Iteration: 1 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 2 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 3 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 4 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 5 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 6 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 7 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 8 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 9 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 10 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 11 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 12 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 13 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 14 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 15 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 16 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 17 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 18 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 19 Cost: [[0.00025961]] Accuracy: [[0.00064902]]

Other things I tried include decreasing the learning rate (alpha) to around 0.001/0.0001, but I still get the same results.
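
To see whether the derivatives really are vanishing, the gradient magnitudes can be printed after a backward pass. A minimal sketch, assuming the classifier instance and training arrays defined above:

# Run one forward/backward pass on a single example and inspect gradient sizes
x = x_training[0].reshape(-1, 1)
y = y_training[0]
classifier.forward_propagate(x)
classifier.compute_derivatives(y)
for l in range(1, classifier.L + 1):
  print(f"layer {l}: |dW{l}| = {np.linalg.norm(classifier.derivatives[f'dW{l}']):.6f}")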

The first issue I see is the construction of the training and test sets. You have removed shuffling, which may cause one half of the normal distribution to lie in the training set and the other half in the test set. You want a mix of points across the entire distribution in both sets, so I think the dataset should be shuffled before splitting.

Secondly, what you are reporting as accuracy for the training set doesn't seem to be accuracy. It is more like the loss, so perhaps the network actually trained quite fast (within the first few iterations).
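
For reference, the n_c counter that fit already maintains would give an actual accuracy figure; a minimal sketch of the adjusted print (exact-match accuracy is still questionable here, see the next point):

# Inside fit(), after the loop over the training examples
c = c / X.shape[0]                  # mean cost over the epoch
acc = (n_c / X.shape[0]) * 100      # percentage of exact matches
if iter % 10 == 0:
  print(f"Iteration: {iter} Cost: {c} Accuracy: {acc}")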

Lastly, this is more like a regression problem, not classification, so accuracy over the test data shouldn't be evaluated by an exact comparison of prediction and ground truth. Instead, you should use a metric that measures the distance between prediction and ground truth, and train the NN to reduce that distance.
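
A minimal sketch of such a distance-based evaluation, using mean squared and mean absolute error on the test set (assuming the trained classifier and the test arrays defined above):

# Evaluate regression quality instead of exact-match accuracy
preds = np.array([classifier.predict(x.reshape(-1, 1)).item() for x in x_testing])
mse = np.mean((preds - y_testing.flatten()) ** 2)
mae = np.mean(np.abs(preds - y_testing.flatten()))
print(f"Test MSE: {mse:.6f}  Test MAE: {mae:.6f}")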
