
Fine-tuning a neural network in TensorFlow

I've been working on this neural network with the intent to predict TBA (time-based availability) of simulated windmill parks based on certain attributes. The neural network runs just fine and gives me some predictions, but I'm not quite satisfied with the results. It fails to notice some very obvious correlations that I can clearly see myself. Here is my current code:

`# Import
import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.preprocessing import MinMaxScaler

maxi = 0.96 
mini = 0.7 


# Make data a np.array
data = pd.read_csv('datafile_ML_no_avg.csv')
data = data.values

# Shuffle the data
shuffle_indices = np.random.permutation(np.arange(len(data)))
data = data[shuffle_indices]

# Training and test data
data_train = data[0:int(len(data)*0.8),:]
data_test = data[int(len(data)*0.8):int(len(data)),:]

# Scale data
scaler = MinMaxScaler(feature_range=(mini, maxi))
scaler.fit(data_train)
data_train = scaler.transform(data_train)
data_test = scaler.transform(data_test)


# Build X and y
X_train = data_train[:, 0:5]
y_train = data_train[:, 6:7]
X_test = data_test[:, 0:5]
y_test = data_test[:, 6:7]

# Number of stocks in training data
n_args = X_train.shape[1]
multi = int(8)
# Neurons
n_neurons_1 = 8*multi
n_neurons_2 = 4*multi
n_neurons_3 = 2*multi
n_neurons_4 = 1*multi

# Session
net = tf.InteractiveSession()

# Placeholder
X = tf.placeholder(dtype=tf.float32, shape=[None, n_args])
Y = tf.placeholder(dtype=tf.float32, shape=[None,1])

# Initializers
sigma = 1
weight_initializer = tf.variance_scaling_initializer(mode="fan_avg", distribution="uniform", scale=sigma)
bias_initializer = tf.zeros_initializer()

# Hidden weights
W_hidden_1 = tf.Variable(weight_initializer([n_args, n_neurons_1]))
bias_hidden_1 = tf.Variable(bias_initializer([n_neurons_1]))
W_hidden_2 = tf.Variable(weight_initializer([n_neurons_1, n_neurons_2]))
bias_hidden_2 = tf.Variable(bias_initializer([n_neurons_2]))
W_hidden_3 = tf.Variable(weight_initializer([n_neurons_2, n_neurons_3]))
bias_hidden_3 = tf.Variable(bias_initializer([n_neurons_3]))
W_hidden_4 = tf.Variable(weight_initializer([n_neurons_3, n_neurons_4]))
bias_hidden_4 = tf.Variable(bias_initializer([n_neurons_4]))

# Output weights
W_out = tf.Variable(weight_initializer([n_neurons_4, 1]))
bias_out = tf.Variable(bias_initializer([1]))

# Hidden layer
hidden_1 = tf.nn.relu(tf.add(tf.matmul(X, W_hidden_1), bias_hidden_1))
hidden_2 = tf.nn.relu(tf.add(tf.matmul(hidden_1, W_hidden_2), bias_hidden_2))
hidden_3 = tf.nn.relu(tf.add(tf.matmul(hidden_2, W_hidden_3), bias_hidden_3))
hidden_4 = tf.nn.relu(tf.add(tf.matmul(hidden_3, W_hidden_4), bias_hidden_4))

# Output layer (transpose!)
out = tf.transpose(tf.add(tf.matmul(hidden_4, W_out), bias_out))

# Cost function
mse = tf.reduce_mean(tf.squared_difference(out, Y))

# Optimizer
opt = tf.train.AdamOptimizer().minimize(mse)

# Init
net.run(tf.global_variables_initializer())

# Fit neural net
batch_size = 10
mse_train = []
mse_test = []

# Run
epochs = 10
for e in range(epochs):

    # Shuffle training data
    shuffle_indices = np.random.permutation(np.arange(len(y_train)))
    X_train = X_train[shuffle_indices]
    y_train = y_train[shuffle_indices]

    # Minibatch training
    for i in range(0, len(y_train) // batch_size):
        start = i * batch_size
        batch_x = X_train[start:start + batch_size]
        batch_y = y_train[start:start + batch_size]
        # Run optimizer with batch
        net.run(opt, feed_dict={X: batch_x, Y: batch_y})

        # Show progress
        if np.mod(i, 50) == 0:
            mse_train.append(net.run(mse, feed_dict={X: X_train, Y: y_train}))
            mse_test.append(net.run(mse, feed_dict={X: X_test, Y: y_test}))

            pred = net.run(out, feed_dict={X: X_test})

print(pred)`

I have tried tweaking the number of hidden layers, the number of nodes per layer, and the number of epochs, and trying different activation functions and optimizers. However, I am quite new to neural networks, so there might be something very obvious that I'm missing.

Thanks in advance to anyone who managed to read through all of that.

It would make things much easier if you could share a small dataset that illustrates the problem. However, I will state some of the issues with non-standard datasets and how to overcome them.

Possible solutions

  1. Regularization and validation-based optimization - methods that are always good to try when looking for some extra accuracy. See dropout methods here (original paper), and some overview here. A minimal dropout sketch is shown after this list.

  2. Unbalanced data - Sometimes the time-series categories/events behave like anomalies, or are simply unbalanced. If you read a book, words like "the" or "it" will appear far more often than "warehouse". This becomes a problem if your main task is to detect the word "warehouse" and you train your network (even LSTMs) in the traditional way. A way to overcome this is to balance the samples (create balanced datasets) or to give more weight to low-frequency categories; a small weighting sketch is shown after this list.

  3. Model structure - sometimes fully connected layers are not enough. See computer vision problems, for instance, where we train using convolution layers. The convolution and pooling layers enforce structure on the model, which suits images. This is also a form of regularization, since those layers have fewer parameters. In time-series problems, convolutions are also possible, and it turns out they work just fine. See the example in Conditional Time Series Forecasting with Convolutional Neural Networks; a 1D-convolution sketch is shown after this list.
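For point 1, here is a minimal sketch of how dropout could be added in the same TF 1.x style as the question's code. The layer size, the keep_prob placeholder and the 0.8 keep probability are illustrative choices of mine, not values from the paper.

`import tensorflow as tf

# One hidden layer with dropout, in the same TF 1.x style as the question
X = tf.placeholder(dtype=tf.float32, shape=[None, 5])
Y = tf.placeholder(dtype=tf.float32, shape=[None, 1])
keep_prob = tf.placeholder(dtype=tf.float32)  # fed at run time

hidden = tf.layers.dense(X, 64, activation=tf.nn.relu)
hidden = tf.nn.dropout(hidden, keep_prob)  # zeroes random units, rescales the rest by 1/keep_prob
out = tf.layers.dense(hidden, 1)

mse = tf.reduce_mean(tf.squared_difference(out, Y))
opt = tf.train.AdamOptimizer().minimize(mse)

# Training step:   net.run(opt, feed_dict={X: batch_x, Y: batch_y, keep_prob: 0.8})
# Evaluation step: net.run(mse, feed_dict={X: X_test, Y: y_test, keep_prob: 1.0})`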
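For point 2, here is a small NumPy sketch of inverse-frequency sample weights; the labels array is made up for illustration. The resulting per-sample weights could then be multiplied into the per-sample squared error before tf.reduce_mean.

`import numpy as np

# Hypothetical integer class labels: the rare event (1) occurs far less often than the common one (0)
labels = np.array([0, 0, 0, 0, 1, 0, 0, 1])

# Weight each sample inversely to its class frequency so rare classes
# contribute as much to the total loss as frequent ones
classes, counts = np.unique(labels, return_counts=True)
class_weight = {c: len(labels) / (len(classes) * n) for c, n in zip(classes, counts)}
sample_weight = np.array([class_weight[c] for c in labels])

print(sample_weight)  # common samples get weight ~0.67, rare samples get weight 2.0`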
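For point 3, here is a sketch of a small 1D-convolutional network for a windowed time series, again using the TF 1.x layers API; the window length, filter counts and kernel sizes are arbitrary and not taken from the cited paper.

`import tensorflow as tf

# Hypothetical setup: a univariate series cut into windows of 30 time steps
window = 30
X_seq = tf.placeholder(dtype=tf.float32, shape=[None, window, 1])
Y_seq = tf.placeholder(dtype=tf.float32, shape=[None, 1])

# Two 1D convolution layers followed by a dense head that predicts the target value
conv1 = tf.layers.conv1d(X_seq, filters=16, kernel_size=3, padding='same', activation=tf.nn.relu)
conv2 = tf.layers.conv1d(conv1, filters=8, kernel_size=3, padding='same', activation=tf.nn.relu)
flat = tf.layers.flatten(conv2)
pred_seq = tf.layers.dense(flat, 1)

loss = tf.reduce_mean(tf.squared_difference(pred_seq, Y_seq))
train_op = tf.train.AdamOptimizer().minimize(loss)`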

The above suggestions are presented in the order I would try them.

Good luck!
