
Tensorflow - Averaging model weights from restored models

Given that I trained several different models on the same data, and all the neural networks I trained have the same architecture, I would like to know if it is possible to restore those models, average their weights, and initialise my weights using that average.

This is an example of how the graph might look. Basically, what I need is an average of the weights I am going to load.

import tensorflow as tf
import numpy as np

# init model1 weights (tf.Variable needs an initial value; the shapes here are placeholders)
weights = {
    'w1': tf.Variable(tf.random_normal([784, 256])),
    'w2': tf.Variable(tf.random_normal([256, 10]))
}
# init model1 biases
biases = {
    'b1': tf.Variable(tf.zeros([256])),
    'b2': tf.Variable(tf.zeros([10]))
}
# init model2 weights
weights2 = {
    'w1': tf.Variable(tf.random_normal([784, 256])),
    'w2': tf.Variable(tf.random_normal([256, 10]))
}
# init model2 biases
biases2 = {
    'b1': tf.Variable(tf.zeros([256])),
    'b2': tf.Variable(tf.zeros([10]))
}

# this is the average I want to create (only 'w1'/'w2' exist above, so there is no 'w3')
w = {
    'w1': tf.Variable(
        tf.add(weights["w1"], weights2["w1"]) / 2
    ),
    'w2': tf.Variable(
        tf.add(weights["w2"], weights2["w2"]) / 2
    )
}
# averaged biases
b = {
    'b1': tf.Variable(
        tf.add(biases["b1"], biases2["b1"]) / 2
    ),
    'b2': tf.Variable(
        tf.add(biases["b2"], biases2["b2"]) / 2
    )
}

weights_saver = tf.train.Saver({
    'w1' : weights['w1'],
    'w2' : weights['w2'],
    'b1' : biases['b1'],
    'b2' : biases['b2']
    })
weights_saver2 = tf.train.Saver({
    'w1' : weights2['w1'],
    'w2' : weights2['w2'],
    'b1' : biases2['b1'],
    'b2' : biases2['b2']
    })

And this is what I want to get when I run the tf session. c contains the weights I want to use in order to start the training.

# Create a session for running operations in the Graph.
init_op = tf.global_variables_initializer()
init_op2 = tf.local_variables_initializer()

with tf.Session() as sess:
    # Initialize the variables (like the epoch counter).
    sess.run(init_op)
    sess.run(init_op2)
    weights_saver.restore(
        sess,
        'my_model1/model_weights.ckpt'
    )
    weights_saver2.restore(
        sess,
        'my_model2/model_weights.ckpt'
    )
    model1_vals = sess.run(weights)
    model2_vals = sess.run(weights2)  # reusing `b` here would clobber the bias dict above
    c = sess.run(w)
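As an aside, the averaging itself can be done directly on the numpy arrays that sess.run returns, and the result fed straight back in as initial values. A minimal sketch (the shapes and values are dummies for illustration, not the actual restored checkpoints):

```python
import numpy as np
import tensorflow as tf

# Stand-ins for the two restored weight sets; in practice these come
# from sess.run(weights) and sess.run(weights2) after restoring.
vals1 = {'w1': np.full((4, 3), 2.0, np.float32)}
vals2 = {'w1': np.full((4, 3), 4.0, np.float32)}

# Element-wise average of each weight matrix.
avg = {name: (vals1[name] + vals2[name]) / 2.0 for name in vals1}

# New variables initialised from the averaged values.
init_weights = {name: tf.Variable(value, name=name)
                for name, value in avg.items()}
```

Because the average is plain numpy, no extra `tf.Variable` graph nodes are needed to compute it; the variables are only created once, from the final values.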

First, I assume the model structure is exactly the same (same number of layers, same number of nodes per layer). If not, you will have problems mapping variables (there will be variables in one model but not in the other).

What you want to do is have 3 sessions. The first two you load from checkpoints; the last one will hold the average. You want this because each session will contain a version of the values of the variables.

After you load a model, use tf.trainable_variables() to get a list of all the variables in the model. You can pass it to sess.run to get the variables as numpy arrays. After you compute the averages, use tf.assign to create operations that change the variables. You can also use the list to change the initializers, but that means passing them into the model (not always an option).

Roughly:

session1 = tf.Session()
session2 = tf.Session()
session3 = tf.Session()

# Omitted code: Restore session1 and session2.
# Optionally initialize session3.

all_vars = tf.trainable_variables()
values1 = session1.run(all_vars)
values2 = session2.run(all_vars)

all_assign = []
for var, val1, val2 in zip(all_vars, values1, values2):
  # Element-wise average of the two restored values, assigned into the variable.
  all_assign.append(tf.assign(var, tf.reduce_mean([val1, val2], axis=0)))

session3.run(all_assign)

# Do whatever you want with session 3.

You can implement this in a very generic way, for any checkpoint and any model, by using tf.train.list_variables and tf.train.load_checkpoint.

You can find an example here.
