Pickle Python Lasagne model

Question

I have a simple long short-term memory (LSTM) model, based on the code here: https://github.com/Lasagne/Recipes/blob/master/examples/lstm_text_generation.py

Here is the architecture:

l_in = lasagne.layers.InputLayer(shape=(None, None, vocab_size))

# We now build the LSTM layer which takes l_in as the input layer
# We clip the gradients at GRAD_CLIP to prevent the problem of exploding gradients. 

l_forward_1 = lasagne.layers.LSTMLayer(
    l_in, N_HIDDEN, grad_clipping=GRAD_CLIP,
    nonlinearity=lasagne.nonlinearities.tanh)

l_forward_2 = lasagne.layers.LSTMLayer(
    l_forward_1, N_HIDDEN, grad_clipping=GRAD_CLIP,
    nonlinearity=lasagne.nonlinearities.tanh)

# The l_forward layer creates an output of dimension (batch_size, SEQ_LENGTH, N_HIDDEN)
# Since we are only interested in the final prediction, we isolate that quantity and feed it to the next layer. 
# The output of the sliced layer will then be of size (batch_size, N_HIDDEN)
l_forward_slice = lasagne.layers.SliceLayer(l_forward_2, -1, 1)

# The sliced output is then passed through the softmax nonlinearity to create probability distribution of the prediction
# The output of this stage is (batch_size, vocab_size)
l_out = lasagne.layers.DenseLayer(l_forward_slice, num_units=vocab_size, W = lasagne.init.Normal(), nonlinearity=lasagne.nonlinearities.softmax)

# Theano tensor for the targets
target_values = T.ivector('target_output')

# lasagne.layers.get_output produces a variable for the output of the net
network_output = lasagne.layers.get_output(l_out)

# The loss function is calculated as the mean of the (categorical) cross-entropy between the prediction and target.
cost = T.nnet.categorical_crossentropy(network_output,target_values).mean()

# Retrieve all parameters from the network
all_params = lasagne.layers.get_all_params(l_out)

# Compute AdaGrad updates for training
print("Computing updates ...")
updates = lasagne.updates.adagrad(cost, all_params, LEARNING_RATE)

# Theano functions for training and computing cost
print("Compiling functions ...")
train = theano.function([l_in.input_var, target_values], cost, updates=updates, allow_input_downcast=True)
compute_cost = theano.function([l_in.input_var, target_values], cost, allow_input_downcast=True)

# In order to generate text from the network, we need the probability distribution of the next character given
# the state of the network and the input (a seed).
# In order to produce the probability distribution of the prediction, we compile a function called probs. 

probs = theano.function([l_in.input_var],network_output,allow_input_downcast=True)

And I train the model like this:

for it in xrange(data_size * num_epochs / BATCH_SIZE):
    try_it_out()  # Generate text using the p^th character as the start.

    avg_cost = 0
    for _ in range(PRINT_FREQ):
        x, y = gen_data(p)

        # print(p)
        p += SEQ_LENGTH + BATCH_SIZE - 1
        if p + BATCH_SIZE + SEQ_LENGTH >= data_size:
            print('Carriage Return')
            p = 0

        avg_cost += train(x, y)
    print("Epoch {} average loss = {}".format(it*1.0*PRINT_FREQ/data_size*BATCH_SIZE, avg_cost / PRINT_FREQ))

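How can I save the model so that I don't need to train it again? With scikit I usually just pickle the model object. However, I'm not clear on the analogous process with Theano / Lasagne.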

Answer 1

You can save the weights with numpy:

# Pass the output layer (l_out in the question), not the Theano expression network_output.
np.savez('model.npz', *lasagne.layers.get_all_param_values(l_out))

And then load them again like this:

with np.load('model.npz') as f:
    param_values = [f['arr_%d' % i] for i in range(len(f.files))]
lasagne.layers.set_all_param_values(l_out, param_values)

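Source: https://github.com/Lasagne/Lasagne/blob/master/examples/mnist.py

As for the model definition itself: one option is certainly to keep the code and rebuild the network before setting the pretrained weights.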
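The save/load round trip in this answer depends on how np.savez names positional arguments: they are stored under the keys arr_0, arr_1, and so on, which is why the loading code rebuilds the list by index. The sketch below checks that behaviour with plain numpy only. The two arrays are hypothetical stand-ins for the list that lasagne.layers.get_all_param_values would return, and the temporary path is just for illustration.

```python
import os
import tempfile

import numpy as np

# Stand-in for lasagne.layers.get_all_param_values(l_out): a list of arrays
# in a fixed order (hypothetical shapes, for illustration only).
param_values = [
    np.arange(6, dtype=np.float32).reshape(2, 3),  # e.g. a weight matrix
    np.zeros(3, dtype=np.float32),                 # e.g. a bias vector
]

path = os.path.join(tempfile.mkdtemp(), 'model.npz')

# Positional arguments are stored under the keys arr_0, arr_1, ...
np.savez(path, *param_values)

with np.load(path) as f:
    saved_keys = sorted(f.files)
    # Rebuild the list by position so the parameter order is explicit.
    restored = [f['arr_%d' % i] for i in range(len(f.files))]
```

Because only the arrays are stored, the network has to be rebuilt with the same layer definitions, in the same order, before the restored list is passed to set_all_param_values.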

Answer 2

You can save the model parameters and the model with pickle:

import cPickle as pickle
import os
# Save the network and its parameters as a dictionary
netInfo = {'network': network, 'params': lasagne.layers.get_all_param_values(network)}
Net_FileName = 'LSTM.pkl'
# Save the dictionary as a .pkl file
with open(os.path.join('/path/to/a/folder/', Net_FileName), 'wb') as f:
    pickle.dump(netInfo, f, protocol=pickle.HIGHEST_PROTOCOL)

After saving the model, it can be retrieved with pickle.load:

with open(os.path.join('/path/to/a/folder/', Net_FileName), 'rb') as f:
    net = pickle.load(f)
all_params = net['params']
lasagne.layers.set_all_param_values(net['network'], all_params)
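The dump and load steps in this answer could be wrapped in two small helpers, so that the file handle is always closed and the path is built in one place. This is only a sketch: save_net and load_net are hypothetical names, the dictionary holds plain lists as a stand-in for the values returned by lasagne.layers.get_all_param_values, and it uses the standard pickle module (on Python 2, cPickle offers the same interface).

```python
import os
import pickle  # on Python 2, `import cPickle as pickle` has the same interface
import tempfile


def save_net(net_info, path):
    """Pickle net_info to path; the with-block closes the file afterwards."""
    with open(path, 'wb') as f:
        pickle.dump(net_info, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_net(path):
    """Load a dictionary previously written by save_net."""
    with open(path, 'rb') as f:
        return pickle.load(f)


# Stand-in data: in the answer, 'params' would hold the arrays from
# lasagne.layers.get_all_param_values(network).
net_info = {'params': [[0.1, 0.2], [0.3]]}
path = os.path.join(tempfile.mkdtemp(), 'LSTM.pkl')
save_net(net_info, path)
loaded = load_net(path)
```

With a real network, the loaded 'params' entry would then be handed to lasagne.layers.set_all_param_values, as in the answer.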

Answer 3

I had success using dill together with the numpy.savez function:

import dill as pickle

...
np.savez('model.npz', *lasagne.layers.get_all_param_values(network))
with open('model.dpkl','wb') as p_output:
   pickle.dump(network, p_output)

To import the pickled model:

with open('model.dpkl', 'rb') as p_input:
    network = pickle.load(p_input)

with np.load('model.npz') as f:
    param_values = [f['arr_%d' % i] for i in range(len(f.files))]
lasagne.layers.set_all_param_values(network, param_values)

3 answers

Answer 1
14 2015-12-17 22:24:46

Answer 2
1 2016-10-06 22:36:55

Answer 3
0 2016-09-26 22:43:59
