For a course requirement I need to create a neural network that predicts the probability of a normal random variable falling within (−2σ, +2σ) of the mean. The architecture I am required to implement is composed of 2 hidden layers with ReLU and a final layer with sigmoid:
1 unit - Input layer
64 units - Hidden layer (ReLU)
64 units - Hidden layer (ReLU)
1 unit - Output layer (Sigmoid)
However, I can't seem to get good results. Specifically, the calculated derivatives are so small that my weights/biases barely change. I don't know whether I am generating my dataset wrong or my implementation of backpropagation is wrong. I am at my wits' end, having tried to solve this for the past two weeks. This is my first dive into machine learning and I am hoping for some guidance.
Here is what I came up with: Link to full notebook
Below is the logic/pseudocode of the dataset generation.
EDIT: Added the code.
import numpy as np
import matplotlib.pyplot as plt
from math import floor
def scale(X):
'''
Normalizes X values for easier processing
Parameters:
X(numpy.array) - Raw input data to be normalized
Returns:
scaled_x(numpy.array) - Normalized input values
mean_x(float) - mean of raw input data
std_x(float) - standard deviation of raw input data
'''
mean_x = np.mean(X, axis=0)
new = X - mean_x
std_x = np.std(X, axis=0)
scaled_x = new / std_x
return scaled_x, mean_x, std_x
def filter(input, output, n=10):
    '''
    Randomly selects n samples from the first column of rows whose
    second-column value matches the given output.
    Parameters:
    input(numpy.array) - Input matrix to select samples from
    output(int/float) - Value to match in the second column
    n(int, default = 10) - Number of samples to select
    Returns:
    new_x(numpy.array) - Row vector with size (n,)
    '''
    # Select all rows whose second column matches the output
    rows_with_output = input[(input[:, 1] == output)]
    # Get only the first column (X values)
    input_vals = rows_with_output[:, 0]
    # Randomly pick n samples
    generator = np.random.default_rng(42)
    new_x = generator.choice(input_vals, n)
    return new_x
def random_range(lb, ub, n):
    '''Generates n uniform random numbers in the half-open interval [lb, ub)
    Parameters:
    lb - lower bound
    ub - upper bound
    n - number of random numbers to generate
    Returns:
    random_nums(numpy.array) - array of random numbers
    '''
    # Note: re-creating the generator with the same seed on every call
    # means each call draws from the same underlying random sequence.
    generator = np.random.default_rng(42)
    random_nums = generator.uniform(lb, ub, n)
    return random_nums
np.random.seed(42)
# Define mean and std
mu, sigma = 100, 15
num_bins = 100
raw_sample_size = 15000
raws = np.random.normal(mu, sigma, raw_sample_size).round(4)
hist, bin_edges = np.histogram(raws, bins=num_bins)
# Half bin width
bin_half_val = np.diff(bin_edges)/2
# Get the bin centers
bin_centers = bin_edges[:-1] + np.diff(bin_edges)/2
# Plot the 'discretized' distribution
plt.scatter(bin_centers, hist)
# Define bounds
lb = mu - 2*sigma
ub = mu + 2*sigma
# Select bin_centers inside the range [lb, ub)
bins = bin_centers[(bin_centers >= lb) & (bin_centers < ub)]
bin_idx = [np.where(bin_centers == bin) for bin in bins]
# Flatten index
bin_idx = np.array(bin_idx).flatten()
# Get frequency values for respective bin centers
hist_in_range = hist[bin_idx]
# Plot
plt.scatter(bins, hist_in_range)
# Define number of samples for each bin
num_data = 10
# For each bin, generate num_data random numbers inside that bin's interval
# (the histogram bins have uniform width, so bin_half_val[0] is the half width of every bin)
x_raw = np.array([random_range(x - bin_half_val[0], x + bin_half_val[0], num_data).reshape((num_data, 1)) for x in bins])
x_raw = x_raw.reshape(len(bins), 1, num_data)
# Generate the probabilities.
y_raw = np.array([np.full((num_data,1), round(f/raw_sample_size, 4)) for f in hist_in_range])
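flatten_stack isn't defined in this snippet (it presumably lives in the linked notebook); a minimal stand-in consistent with how it is used below, collapsing a stacked 3D array into a single (N, 1) column vector, would be:
def flatten_stack(arr):
    # Assumed stand-in for the notebook's definition:
    # collapse the stacked array into a single column vector
    return np.asarray(arr).reshape(-1, 1)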
# Flatten the arrays
y_f = flatten_stack(y_raw)
x_f = flatten_stack(x_raw)
# Apply scaling and transformation
x_scaled,x_mean, x_std = scale(x_f)
# Sanity check: manually scale the first raw value
scale1 = (x_f[0][0] - np.mean(x_f)) / np.std(x_f)
# Finally, combine the inputs and outputs into a single 2D Matrix.
data_raw = np.hstack((x_scaled, y_f))
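A quick sanity check of the combined matrix (the shapes follow from generating num_data samples per bin):
# Each row is (scaled x, estimated probability)
print(data_raw.shape)  # expected: (len(bins) * num_data, 2)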
def train_test_split(dataset, train_ratio, N, data_per_sample):
    '''
    Splits a 2D dataset into training and test sets
    Parameters:
    dataset(numpy.array) - 2D numpy array to split
    train_ratio(float) - fraction of the samples used for training
    N(int) - total number of samples
    data_per_sample(int) - number of samples generated per bin
    Returns:
    training_set, test_set (numpy.array) - the two splits
    '''
    # Shuffle the dataset in place before splitting
    np.random.shuffle(dataset)
    # Index separating training data from test data
    train_index = int(train_ratio * N)
    # Round train_index down to a multiple of data_per_sample
    train_index = data_per_sample * floor(train_index / data_per_sample)
    # Get training and test sets
    training_set, test_set = dataset[0:train_index], dataset[train_index:]
    return training_set, test_set
# Split dataset to training and test sets
train_set, test_set = train_test_split(data_raw, 0.8, data_raw.shape[0], num_data)
np.random.shuffle(train_set)
np.random.shuffle(test_set)
# Training Set
x_training = train_set[:,0].reshape(-1,1)
y_training = train_set[:,1].reshape(-1,1)
# Test Set
x_testing = test_set[:,0].reshape(-1,1)
y_testing = test_set[:,1].reshape(-1,1)
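The NeuralNetwork class below calls sigmoid, sigmoid_prime, relu, and relu_prime, which aren't shown in this snippet; the standard definitions are:
def sigmoid(z):
    # Logistic sigmoid, applied element-wise
    return 1.0 / (1.0 + np.exp(-z))
def sigmoid_prime(z):
    # Derivative of the sigmoid: s(z) * (1 - s(z))
    s = sigmoid(z)
    return s * (1 - s)
def relu(z):
    # Rectified linear unit: element-wise max(0, z)
    return np.maximum(0, z)
def relu_prime(z):
    # Subgradient of ReLU: 1 where z > 0, else 0
    return (z > 0).astype(float)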
class NeuralNetwork:
def __init__(self, architecture):
'''
Parameters:
architecture - array containing the number of neurons per layer.
'''
# Initialize the network architecture
self.L = architecture.size - 1 # L defines the last layer of the network.
self.n = architecture
# Create a dictionary to store the weights and biases
self.parameters = {}
# Initialize network parameters
        for i in range(1, self.L + 1):
            # Initialize weights from the standard normal distribution
            self.parameters[f'W{i}'] = np.random.randn(self.n[i], self.n[i - 1])
            # Initialize the remaining parameters to 1
            self.parameters[f'b{i}'] = np.ones((self.n[i], 1))
            self.parameters[f'z{i}'] = np.ones((self.n[i], 1))
            self.parameters[f'a{i}'] = np.ones((self.n[i], 1))
        # Initialize the first activated values a[0] (sized to the input layer, self.n[0])
        self.parameters['a0'] = np.ones((self.n[0], 1))
# Initialize the cost:
self.parameters['C'] = 0
# Create a dictionary for storing the derivatives:
self.derivatives = {}
def forward_propagate(self, X):
        # Note that X here is just one training example
self.parameters['a0'] = X
# Calculate the activations for every hidden layer
for l in range(1, self.L + 1):
self.parameters[f'z{l}'] = np.dot(self.parameters[f'W{l}'], self.parameters[f'a{l - 1}']) + self.parameters[f'b{l}']
if l == self.L:
self.parameters[f'a{l}'] = sigmoid(self.parameters[f'z{l}'])
else:
self.parameters[f'a{l}'] = relu(self.parameters[f'z{l}'])
def compute_cost(self, y):
self.parameters['C'] = 0.5*(self.parameters[f'a{self.L}'] - y)**2
def compute_derivatives(self, y):
# Partial derivatives of the cost function with respect to z[L], W[L] and b[L]:
# dzL
self.derivatives[f'dz{self.L}'] = (self.parameters[f'a{self.L}'] - y) * sigmoid_prime(self.parameters[f'z{self.L}'])
# dWL
self.derivatives[f'dW{self.L}'] = np.dot(self.derivatives[f'dz{self.L}'], np.transpose(self.parameters[f'a{self.L - 1}']))
# dbL
self.derivatives[f'db{self.L}'] = self.derivatives[f'dz{self.L}']
# Implementing the above in a loop:
for l in range(self.L-1, 0, -1):
self.derivatives[f'dz{l}'] = np.dot(np.transpose(self.parameters[f'W{l + 1}']), self.derivatives[f'dz{l + 1}'])*relu_prime(self.parameters[f'z{l}'])
self.derivatives[f'dW{l}'] = np.dot(self.derivatives[f'dz{l}'], np.transpose(self.parameters[f'a{l - 1}']))
self.derivatives[f'db{l}'] = self.derivatives[f'dz{l}']
def update_parameters(self, alpha):
for l in range(1, self.L+1):
self.parameters[f'W{l}'] -= alpha*self.derivatives[f'dW{l}']
self.parameters[f'b{l}'] -= alpha*self.derivatives[f'db{l}']
def predict(self, x):
self.forward_propagate(x)
return self.parameters[f'a{self.L}']
def fit(self, X, Y, num_iter, alpha = 0.1):
for iter in range(0, num_iter):
c = 0 # Stores the cost
n_c = 0 # Stores the number of correct predictions
for i in range(0, X.shape[0]):
x = X[i].reshape((X[i].size, 1))
y = Y[i]
self.forward_propagate(x)
self.compute_cost(y)
self.compute_derivatives(y)
self.update_parameters(alpha)
c += self.parameters['C']
y_pred = self.predict(x).round(4)
y_flat = y_pred.flatten()
if y_flat[0] == y:
n_c += 1
c = c/X.shape[0]
if (iter % 10 == 0):
print(f"Iteration: {iter} Cost: {c} Accuracy: {(c/X.shape[0])*100}")
# Defining the model architecture
architecture = np.array([1, 64, 64, 1])
# Creating the classifier
classifier = NeuralNetwork(architecture)
# Training the classifier
classifier.fit(x_training, y_training, 150, alpha=0.1)
Truncated output of the training:
Iteration: 0 Cost: [[0.00193444]] Accuracy: [[0.00483609]]
Iteration: 1 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 2 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 3 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 4 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 5 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 6 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 7 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 8 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 9 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 10 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 11 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 12 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 13 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 14 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 15 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 16 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 17 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 18 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 19 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
I also tried decreasing the learning rate (alpha) to around 0.001/0.0001, but I still get the same results.
The first issue I see is the construction of the training and test sets. You have removed shuffling, which may cause one half of the normal distribution to lie in the training set and the other half in the test set. You want a mix of points from across the entire distribution in both sets, so the dataset should be shuffled before splitting, e.g. as sketched below.
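A one-line shuffle of the combined dataset (data_raw, as built above) before the split is enough:
# Shuffle rows in place so both splits cover the whole distribution
np.random.shuffle(data_raw)
train_set, test_set = train_test_split(data_raw, 0.8, data_raw.shape[0], num_data)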
Secondly, what you are reporting as accuracy for the training set doesn't seem to be accuracy; it looks more like the loss, so perhaps the network actually trained quite fast (within the first few iterations).
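Concretely, the print statement in fit divides the already-averaged cost by the sample count a second time; if the intent was to report the fraction of exact matches counted in n_c, the line would be something like:
# Report the fraction of predictions that exactly matched the target
print(f"Iteration: {iter} Cost: {c} Accuracy: {(n_c / X.shape[0]) * 100}")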
Lastly, this is more of a regression problem than a classification one, so accuracy on the test data shouldn't be evaluated by EXACT COMPARISON of prediction and ground truth. Instead, use a metric that measures the distance between prediction and ground truth, and aim to reduce that distance by training the NN.
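A minimal evaluation sketch along these lines, assuming classifier, x_testing, and y_testing exist as defined above:
# Distance-based metrics between predictions and ground truth
preds = np.array([classifier.predict(x.reshape(-1, 1)).item() for x in x_testing])
mse = np.mean((preds - y_testing.flatten()) ** 2)
mae = np.mean(np.abs(preds - y_testing.flatten()))
print(f"Test MSE: {mse:.6f}, Test MAE: {mae:.6f}")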