Neural Network from scratch X = P(X)
For a course requirement I need to create a NN to predict the probability of normally distributed random variables falling within (-2 std, +2 std) of the mean.
The architecture I am required to implement is composed of 2 hidden layers with ReLU and a final layer with sigmoid:
1 unit - Input Layer
64 units - Hidden Layer (ReLU)
64 units - Hidden Layer (ReLU)
1 unit - Output Layer (Sigmoid)
However, I can't seem to get good results. Specifically, the calculated derivatives are so small that my weights/biases barely change. I don't know whether I am generating my dataset wrong or my implementation of backpropagation is wrong. I am at my wits' end, having tried to solve this for the past two weeks. This is my first dive into machine learning and I am hoping for some guidance.
Here is what I came up with: Link to full notebook
Below is the logic/pseudocode of the dataset generation.
EDIT: Added the code.
import numpy as np
import matplotlib.pyplot as plt
from math import floor
def scale(X):
    '''
    Normalizes X values for easier processing.

    Parameters:
        X(numpy.array) - Raw input data to be normalized
    Returns:
        scaled_x(numpy.array) - Normalized input values
        mean_x(float) - mean of raw input data
        std_x(float) - standard deviation of raw input data
    '''
    mean_x = np.mean(X, axis=0)
    new = X - mean_x
    std_x = np.std(X, axis=0)
    scaled_x = new / std_x
    return scaled_x, mean_x, std_x
def filter(input, output, n=10):
    '''
    Randomly selects n samples from the first column of the rows
    whose second column matches the given value.

    Parameters:
        input(numpy.array) - Input matrix to select samples from
        output(int/float) - Value to match in the second column
        n(int, default = 10) - Number of samples to select
    Returns:
        new_x(numpy.array) - Row vector with size (n,)
    '''
    # Select all rows with the second column matching the output
    rows_with_output = input[(input[:, 1] == output)]
    # Get only the first column (X values)
    input_vals = rows_with_output[:, 0]
    # Randomly pick n samples
    generator = np.random.default_rng(42)
    new_x = generator.choice(input_vals, n)
    return new_x
def random_range(lb, ub, n):
    '''Generates 'n' uniform random numbers in the interval [lb, ub)

    Parameters:
        lb - lower bound
        ub - upper bound
        n - number of random numbers to generate
    Returns:
        random_nums(numpy.array) - array of random numbers
    '''
    generator = np.random.default_rng(42)
    random_nums = generator.uniform(lb, ub, n)
    return random_nums
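# NOTE: flatten_stack is used below but its definition is not included in
# the post. A minimal reconstruction is assumed here: collapse a stacked 3D
# array into a single (N, 1) column vector so np.hstack works later on.
def flatten_stack(stacked):
    '''Flattens a stacked array into a column vector of shape (N, 1).'''
    return np.asarray(stacked).reshape(-1, 1)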
np.random.seed(42)
# Define mean and std
mu, sigma = 100, 15
num_bins = 100
raw_sample_size = 15000
raws = np.random.normal(mu, sigma, raw_sample_size).round(4)
hist, bin_edges = np.histogram(raws, bins=num_bins)
# Half bin width
bin_half_val = np.diff(bin_edges)/2
# Get the bin centers
bin_centers = bin_edges[:-1] + np.diff(bin_edges)/2
# Plot the 'discretized' distribution
plt.scatter(bin_centers, hist)
# Define bounds
lb = mu - 2*sigma
ub = mu + 2*sigma
# Select bin_centers inside the range [lb, ub)
bins = bin_centers[(bin_centers >= lb) & (bin_centers < ub)]
bin_idx = [np.where(bin_centers == bin) for bin in bins]
# Flatten index
bin_idx = np.array(bin_idx).flatten()
# Get frequency values for respective bin centers
hist_in_range = hist[bin_idx]
# Plot
plt.scatter(bins, hist_in_range)
# Define number of samples for each bin
num_data = 10
# For each bin generate 10 random numbers in the interval.
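# Note: random_range re-creates default_rng(42) on every call, so every bin
# receives the same underlying uniform draws, shifted to its own interval.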
x_raw = np.array([random_range(x-bin_half_val[0], x+bin_half_val[0], num_data).reshape((num_data, 1)) for x in bins])
x_raw = x_raw.reshape(len(bins),1,num_data)
# Generate the probabilities.
y_raw = np.array([np.full((num_data,1), round(f/raw_sample_size, 4)) for f in hist_in_range])
# Flatten the arrays
y_f = flatten_stack(y_raw)
x_f = flatten_stack(x_raw)
# Apply scaling and transformation
x_scaled,x_mean, x_std = scale(x_f)
# Sanity check: manually scale the first raw value
scale1 = (x_f[0][0] - np.mean(x_f)) / np.std(x_f)
# Finally, combine the inputs and outputs into a single 2D Matrix.
data_raw = np.hstack((x_scaled, y_f))
def train_test_split(dataset, train_ratio, N, data_per_sample):
    '''
    Splits a 2D dataset into training and test sets.

    Parameters:
        dataset(numpy.array) - 2D numpy array to split
        train_ratio(float) - fraction of the data used for training
        N(int) - total number of samples
        data_per_sample(int) - number of samples generated per bin
    Returns:
        training_set(numpy.array) - first train_ratio portion of the dataset
        test_set(numpy.array) - remaining rows of the dataset
    '''
    # Shuffle the dataset in place before splitting
    np.random.shuffle(dataset)
    # Index where the training data ends
    train_index = int(train_ratio * N)
    # Round train_index down to a multiple of data_per_sample
    train_index = data_per_sample * floor(train_index / data_per_sample)
    # Get training and test sets
    training_set, test_set = dataset[0:train_index], dataset[train_index:]
    return training_set, test_set
# Split dataset to training and test sets
train_set, test_set = train_test_split(data_raw, 0.8, data_raw.shape[0], num_data)
np.random.shuffle(train_set)
np.random.shuffle(test_set)
# Training Set
x_training = train_set[:,0].reshape(-1,1)
y_training = train_set[:,1].reshape(-1,1)
# Test Set
x_testing = test_set[:,0].reshape(-1,1)
y_testing = test_set[:,1].reshape(-1,1)
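# NOTE: the activation functions used by the network are not defined in the
# post. Standard element-wise definitions are assumed here so the class runs:
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def relu(z):
    return np.maximum(0.0, z)

def relu_prime(z):
    return (z > 0).astype(float)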
class NeuralNetwork:
    def __init__(self, architecture):
        '''
        Parameters:
            architecture - array containing the number of neurons per layer.
        '''
        # Initialize the network architecture
        self.L = architecture.size - 1  # L defines the last layer of the network.
        self.n = architecture
        # Create a dictionary to store the weights and biases
        self.parameters = {}
        # Initialize network parameters
        for i in range(1, self.L + 1):
            # Initialize weights from the standard normal distribution
            self.parameters[f'W{i}'] = np.random.randn(self.n[i], self.n[i - 1])
            # Initialize rest of the parameters to 1
            self.parameters[f'b{i}'] = np.ones((self.n[i], 1))
            self.parameters[f'z{i}'] = np.ones((self.n[i], 1))
            self.parameters[f'a{i}'] = np.ones((self.n[i], 1))
        # Initialize the first activated values a[0] with the input layer size
        self.parameters['a0'] = np.ones((self.n[0], 1))
        # Initialize the cost:
        self.parameters['C'] = 0
        # Create a dictionary for storing the derivatives:
        self.derivatives = {}
    def forward_propagate(self, X):
        # Note that X here is just one training example
        self.parameters['a0'] = X
        # Calculate the activations for every hidden layer
        for l in range(1, self.L + 1):
            self.parameters[f'z{l}'] = np.dot(self.parameters[f'W{l}'], self.parameters[f'a{l - 1}']) + self.parameters[f'b{l}']
            if l == self.L:
                self.parameters[f'a{l}'] = sigmoid(self.parameters[f'z{l}'])
            else:
                self.parameters[f'a{l}'] = relu(self.parameters[f'z{l}'])
    def compute_cost(self, y):
        self.parameters['C'] = 0.5 * (self.parameters[f'a{self.L}'] - y) ** 2
    def compute_derivatives(self, y):
        # Partial derivatives of the cost function with respect to z[L], W[L] and b[L]:
        # dzL
        self.derivatives[f'dz{self.L}'] = (self.parameters[f'a{self.L}'] - y) * sigmoid_prime(self.parameters[f'z{self.L}'])
        # dWL
        self.derivatives[f'dW{self.L}'] = np.dot(self.derivatives[f'dz{self.L}'], np.transpose(self.parameters[f'a{self.L - 1}']))
        # dbL
        self.derivatives[f'db{self.L}'] = self.derivatives[f'dz{self.L}']
        # Implementing the above in a loop for the hidden layers:
        for l in range(self.L - 1, 0, -1):
            self.derivatives[f'dz{l}'] = np.dot(np.transpose(self.parameters[f'W{l + 1}']), self.derivatives[f'dz{l + 1}']) * relu_prime(self.parameters[f'z{l}'])
            self.derivatives[f'dW{l}'] = np.dot(self.derivatives[f'dz{l}'], np.transpose(self.parameters[f'a{l - 1}']))
            self.derivatives[f'db{l}'] = self.derivatives[f'dz{l}']
    def update_parameters(self, alpha):
        for l in range(1, self.L + 1):
            self.parameters[f'W{l}'] -= alpha * self.derivatives[f'dW{l}']
            self.parameters[f'b{l}'] -= alpha * self.derivatives[f'db{l}']
    def predict(self, x):
        self.forward_propagate(x)
        return self.parameters[f'a{self.L}']
    def fit(self, X, Y, num_iter, alpha=0.1):
        for iter in range(0, num_iter):
            c = 0    # Stores the cost
            n_c = 0  # Stores the number of correct predictions
            for i in range(0, X.shape[0]):
                x = X[i].reshape((X[i].size, 1))
                y = Y[i]
                self.forward_propagate(x)
                self.compute_cost(y)
                self.compute_derivatives(y)
                self.update_parameters(alpha)
                c += self.parameters['C']
                y_pred = self.predict(x).round(4)
                y_flat = y_pred.flatten()
                if y_flat[0] == y:
                    n_c += 1
            c = c / X.shape[0]
            if (iter % 10 == 0):
                print(f"Iteration: {iter} Cost: {c} Accuracy: {(c/X.shape[0])*100}")
# Defining the model architecture
architecture = np.array([1, 64, 64, 1])
# Creating the classifier
classifier = NeuralNetwork(architecture)
# Training the classifier
classifier.fit(x_training, y_training, 150, alpha=0.1)
Truncated output of the training:
Iteration: 0 Cost: [[0.00193444]] Accuracy: [[0.00483609]]
Iteration: 1 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 2 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 3 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 4 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 5 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 6 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 7 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 8 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 9 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 10 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 11 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 12 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 13 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 14 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 15 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 16 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 17 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 18 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Iteration: 19 Cost: [[0.00025961]] Accuracy: [[0.00064902]]
Other things I tried include decreasing the learning rate (alpha) to around 0.001/0.0001, but I still get the same results.
The first issue I see is the construction of the training and test sets. You have removed shuffling, which may cause one half of the normal distribution to lie in the training set and the other half in the test set. You want a mix of points across the entire distribution in both sets, so the dataset should be shuffled before splitting, as in the sketch below.
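A minimal sketch of shuffling before splitting, assuming the data_raw array from your question:

rng = np.random.default_rng(42)
rng.shuffle(data_raw)  # in-place shuffle along the first axis
split_index = int(0.8 * data_raw.shape[0])
train_set, test_set = data_raw[:split_index], data_raw[split_index:]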
Secondly, what you are reporting as accuracy for the training set doesn't seem to be accuracy. It is more like the loss, so perhaps the network actually trained quite fast (within the first few iterations).
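Concretely, the print statement in fit() divides the already-averaged cost c by X.shape[0] a second time and labels it "Accuracy", while the n_c counter is never reported. A sketch of what the logging could look like instead, reusing your variables:

mean_cost = c                        # c was already averaged over the epoch
accuracy = (n_c / X.shape[0]) * 100  # percentage of matching predictions
print(f"Iteration: {iter} Cost: {mean_cost} Accuracy: {accuracy:.2f}%")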
Lastly, this is more of a regression problem than a classification one, so accuracy over the test data shouldn't be evaluated by EXACT comparison of prediction and ground truth. Instead, use a metric that measures the distance between prediction and ground truth, and train the NN to reduce that distance; see the sketch below.
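A sketch of such a distance-based evaluation, assuming the classifier and the x_testing/y_testing arrays from your question:

predictions = np.array([classifier.predict(x.reshape(-1, 1)).item() for x in x_testing])
targets = y_testing.flatten()
mse = np.mean((predictions - targets) ** 2)   # mean squared error
mae = np.mean(np.abs(predictions - targets))  # mean absolute error
print(f"Test MSE: {mse:.6f}, MAE: {mae:.6f}")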