For a course requirement I need to create a neural network that predicts the probability of a normal random variable falling within (−2σ, +2σ) of the mean. The architecture I am required to implement is composed of 2 hidden layers with ReLU and a final layer with sigmoid:
1 unit - Input layer
64 units - Hidden layer (ReLU)
64 units - Hidden layer (ReLU)
1 unit - Output layer (Sigmoid)
However, I can't seem to get good results. Specifically, the calculated derivatives are so small that my weights/biases barely change. I don't know whether I am generating my dataset wrong or my implementation of backpropagation is wrong. I am at my wits' end, having tried to solve this for the past two weeks. This is my first dive into machine learning and I am hoping for some guidance.
Here is what I came up with: Link to full notebook
Below is the logic/pseudocode of the dataset generation.
EDIT: Added the code.
import numpy as np
import matplotlib.pyplot as plt
from math import floor
def scale(X):
'''
Normalizes X values for easier processing
Parameters:
X(numpy.array) - Raw input data to be normalized
Returns:
scaled_x(numpy.array) - Normalized input values
mean_x(float) - mean of raw input data
std_x(float) - standard deviation of raw input data
'''
mean_x = np.mean(X, axis=0)
new = X - mean_x
std_x = np.std(X, axis=0)
scaled_x = new / std_x
return scaled_x, mean_x, std_x
def filter(input, output, n=10):
    '''
    Randomly selects n samples from the first column of rows whose
    second-column value matches the given output.
    Parameters:
    input(numpy.array) - Input matrix to select samples from
    output(int/float) - Value to match in the second column
    n(int, default = 10) - Number of samples to select
    Returns:
    new_x(numpy.array) - Row vector with size (n,)
    '''
    # Select all rows whose second column matches the output
    rows_with_output = input[(input[:, 1] == output)]
    # Get only the first column (X values)
    input_vals = rows_with_output[:, 0]
    # Randomly pick n samples
    generator = np.random.default_rng(42)
    new_x = generator.choice(input_vals, n)
    return new_x
def random_range(lb, ub, n):
    '''Generates n uniform random numbers in the half-open interval [lb, ub)
    Parameters:
    lb - lower bound
    ub - upper bound
    n - number of random numbers to generate
    Returns:
    random_nums(numpy.array) - array of random numbers
    '''
    # Note: re-creating the generator with the same seed on every call
    # means each call draws from the same underlying random sequence.
    generator = np.random.default_rng(42)
    random_nums = generator.uniform(lb, ub, n)
    return random_nums
np.random.seed(42)
# Define mean and std
mu, sigma = 100, 15
num_bins = 100
raw_sample_size = 15000
raws = np.random.normal(mu, sigma, raw_sample_size).round(4)
hist, bin_edges = np.histogram(raws, bins=num_bins)
# Half bin width
bin_half_val = np.diff(bin_edges)/2
# Get the bin centers
bin_centers = bin_edges[:-1] + np.diff(bin_edges)/2
# Plot the 'discretized' distribution
plt.scatter(bin_centers, hist)
# Define bounds
lb = mu - 2*sigma
ub = mu + 2*sigma
# Select bin_centers inside the range [lb, ub)
bins = bin_centers[(bin_centers >= lb) & (bin_centers < ub)]
bin_idx = [np.where(bin_centers == bin) for bin in bins]
# Flatten index
bin_idx = np.array(bin_idx).flatten()
# Get frequency values for respective bin centers
hist_in_range = hist[bin_idx]
# Plot
plt.scatter(bins, hist_in_range)
# Define number of samples for each bin
num_data = 10
# For each bin, generate num_data random numbers inside that bin's interval
# (the histogram bins have uniform width, so bin_half_val[0] is the half width of every bin)
x_raw = np.array([random_range(x - bin_half_val[0], x + bin_half_val[0], num_data).reshape((num_data, 1)) for x in bins])
x_raw = x_raw.reshape(len(bins), 1, num_data)
# Generate the probabilities.
y_raw = np.array([np.full((num_data,1), round(f/raw_sample_size, 4)) for f in hist_in_range])
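flatten_stack isn't defined in this snippet (it presumably lives in the linked notebook); a minimal stand-in consistent with how it is used below, collapsing a stacked 3D array into a single (N, 1) column vector, would be:
def flatten_stack(arr):
    # Assumed stand-in for the notebook's definition:
    # collapse the stacked array into a single column vector
    return np.asarray(arr).reshape(-1, 1)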
# Flatten the arrays
y_f = flatten_stack(y_raw)
x_f = flatten_stack(x_raw)
# Apply scaling and transformation
x_scaled,x_mean, x_std = scale(x_f)
# Sanity check: manually scale the first raw value
scale1 = (x_f[0][0] - np.mean(x_f)) / np.std(x_f)
# Finally, combine the inputs and outputs into a single 2D Matrix.
data_raw = np.hstack((x_scaled, y_f))
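A quick sanity check of the combined matrix (the shapes follow from generating num_data samples per bin):
# Each row is (scaled x, estimated probability)
print(data_raw.shape)  # expected: (len(bins) * num_data, 2)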
def train_test_split(dataset, train_ratio, N, data_per_sample):
    '''
    Splits a 2D dataset into training and test sets
    Parameters:
    dataset(numpy.array) - 2D numpy array to split
    train_ratio(float) - fraction of the samples used for training
    N(int) - total number of samples
    data_per_sample(int) - number of samples generated per bin
    Returns:
    training_set, test_set (numpy.array) - the two splits
    '''
    # Shuffle the dataset in place before splitting
    np.random.shuffle(dataset)
    # Index separating training data from test data
    train_index = int(train_ratio * N)
    # Round train_index down to a multiple of data_per_sample
    train_index = data_per_sample * floor(train_index / data_per_sample)
    # Get training and test sets
    training_set, test_set = dataset[0:train_index], dataset[train_index:]
    return training_set, test_set
# Split dataset to training and test sets
train_set, test_set = train_test_split(data_raw, 0.8, data_raw.shape[0], num_data)
np.random.shuffle(train_set)
np.random.shuffle(test_set)
# Training Set
x_training = train_set[:,0].reshape(-1,1)
y_training = train_set[:,1].reshape(-1,1)
# Test Set
x_testing = test_set[:,0].reshape(-1,1)
y_testing = test_set[:,1].reshape(-1,1)
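The NeuralNetwork class below calls sigmoid, sigmoid_prime, relu, and relu_prime, which aren't shown in this snippet; the standard definitions are:
def sigmoid(z):
    # Logistic sigmoid, applied element-wise
    return 1.0 / (1.0 + np.exp(-z))
def sigmoid_prime(z):
    # Derivative of the sigmoid: s(z) * (1 - s(z))
    s = sigmoid(z)
    return s * (1 - s)
def relu(z):
    # Rectified linear unit: element-wise max(0, z)
    return np.maximum(0, z)
def relu_prime(z):
    # Subgradient of ReLU: 1 where z > 0, else 0
    return (z > 0).astype(float)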
class NeuralNetwork:
def __init__(self, architecture):
'''
Parameters:
architecture - array containing the number of neurons per layer.
'''
# Initialize the network architecture
self.L = architecture.size - 1 # L defines the last layer of the network.
self.n = architecture
# Create a dictionary to store the weights and biases
self.parameters = {}
# Initialize network parameters
        for i in range(1, self.L + 1):
            # Initialize weights from the standard normal distribution
            self.parameters[f'W{i}'] = np.random.randn(self.n[i], self.n[i - 1])
            # Initialize the remaining parameters to 1
            self.parameters[f'b{i}'] = np.ones((self.n[i], 1))
            self.parameters[f'z{i}'] = np.ones((self.n[i], 1))
            self.parameters[f'a{i}'] = np.ones((self.n[i], 1))
        # Initialize the first activated values a[0] (sized to the input layer, self.n[0])
        self.parameters['a0'] = np.ones((self.n[0], 1))
# Initialize the cost:
self.parameters['C'] = 0
# Create a dictionary for storing the derivatives:
self.derivatives = {}
def forward_propagate(self, X):
        # Note that X here is just one training example
self.parameters['a0'] = X
# Calculate the activations for every hidden layer
for l in range(1, self.L + 1):
self.parameters[f'z{l}'] = np.dot(self.parameters[f'W{l}'], self.parameters[f'a{l - 1}']) + self.parameters[f'b{l}']
if l == self.L:
self.parameters[f'a{l}'] = sigmoid(self.parameters[f'z{l}'])
else:
self.parameters[f'a{l}'] = relu(self.parameters[f'z{l}'])
def compute_cost(self, y):
self.parameters['C'] = 0.5*(self.parameters[f'a{self.L}'] - y)**2
def compute_derivatives(self, y):
# Partial derivatives of the cost function with respect to z[L], W[L] and b[L]:
# dzL
self.derivatives[f'dz{self.L}'] = (self.parameters[f'a{self.L}'] - y) * sigmoid_prime(self.parameters[f'z{self.L}'])
# dWL
self.derivatives[f'dW{self.L}'] = np.dot(self.derivatives[f'dz{self.L}'], np.transpose(self.parameters[f'a{self.L - 1}']))
# dbL
self.derivatives[f'db{self.L}'] = self.derivatives[f'dz{self.L}']
# Implementing the above in a loop:
for l in range(self.L-1, 0, -1):
self.derivatives[f'dz{l}'] = np.dot(np.transpose(self.parameters[f'W{l + 1}']), self.derivatives[f'dz{l + 1}'])*relu_prime(self.parameters[f'z{l}'])
self.derivatives[f'dW{l}'] = np.dot(self.derivatives[f'dz{l}'], np.transpose(self.parameters[f'a{l - 1}']))
self.derivatives[f'db{l}'] = self.derivatives[f'dz{l}']
def update_parameters(self, alpha):
for l in range(1, self.L+1):
self.parameters[f'W{l}'] -= alpha*self.derivatives[f'dW{l}']
self.parameters[f'b{l}'] -= alpha*self.derivatives[f'db{l}']
def predict(self, x):
self.forward_propagate(x)
return self.parameters[f'a{self.L}']
def fit(self, X, Y, num_iter, alpha = 0.1):
for iter in range(0, num_iter):
c = 0 # Stores the cost
n_c = 0 # Stores the number of correct predictions
for i in range(0, X.shape[0]):
x = X[i].reshape((X[i].size, 1))
y = Y[i]
self.forward_propagate(x)
self.compute_cost(y)
self.compute_derivatives(y)
self.update_parameters(alpha)
c += self.parameters['C']
y_pred = self.predict(x).round(4)
y_flat = y_pred.flatten()
if y_flat[0] == y:
n_c += 1
c = c/X.shape[0]
if (iter % 10 == 0):
print(f"Iteration: {iter} Cost: {c} Accuracy: {(c/X.shape[0])*100}")
# Defining the model architecture
architecture = np.array([1, 64, 64, 1])
# Creating the classifier
classifier = NeuralNetwork(architecture)
# Training the classifier
classifier.fit(x_training, y_training, 150, alpha=0.1)
Truncated output of the training:
Iteration: 0 Cost: [[0.00193444]] Accuracy: [[0.00483609]]
Iteration: 1 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 2 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 3 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 4 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 5 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 6 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 7 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 8 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 9 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 10 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 11 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 12 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 13 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 14 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 15 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 16 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 17 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 18 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 19 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
I also tried decreasing the learning rate (alpha) to around 0.001/0.0001, but I still get the same results.
The first issue I see is the construction of the training and test sets. You have removed shuffling, which may cause one half of the normal distribution to lie in the training set and the other half in the test set. You want a mix of points from across the entire distribution in both sets, so the dataset should be shuffled before splitting, e.g. as sketched below.
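A one-line shuffle of the combined dataset (data_raw, as built above) before the split is enough:
# Shuffle rows in place so both splits cover the whole distribution
np.random.shuffle(data_raw)
train_set, test_set = train_test_split(data_raw, 0.8, data_raw.shape[0], num_data)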
Secondly, what you are reporting as accuracy for the training set doesn't seem to be accuracy; it looks more like the loss, so perhaps the network actually trained quite fast (within the first few iterations).
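Concretely, the print statement in fit divides the already-averaged cost by the sample count a second time; if the intent was to report the fraction of exact matches counted in n_c, the line would be something like:
# Report the fraction of predictions that exactly matched the target
print(f"Iteration: {iter} Cost: {c} Accuracy: {(n_c / X.shape[0]) * 100}")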
Lastly, this is more of a regression problem than a classification one, so accuracy on the test data shouldn't be evaluated by EXACT COMPARISON of prediction and ground truth. Instead, use a metric that measures the distance between prediction and ground truth, and aim to reduce that distance by training the NN.
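A minimal evaluation sketch along these lines, assuming classifier, x_testing, and y_testing exist as defined above:
# Distance-based metrics between predictions and ground truth
preds = np.array([classifier.predict(x.reshape(-1, 1)).item() for x in x_testing])
mse = np.mean((preds - y_testing.flatten()) ** 2)
mae = np.mean(np.abs(preds - y_testing.flatten()))
print(f"Test MSE: {mse:.6f}, Test MAE: {mae:.6f}")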