
Train each “head” of a multi-output neural network independently

I am trying to train a model in which a shared feature extractor splits into n "heads", each consisting of a few small layers that produce a different output.

When I train head "a" first, everything works fine, but when I switch to head "b", Python throws an InvalidArgumentError from TensorFlow. The same happens when I start with head "b" and then train head "a".

I tried several approaches found on Stack Overflow, like this one, but none of them worked.

I am building my model as follows:


inputs = Input(shape=(state_shape[0], state_shape[1], state_shape[2]))
outputs = LocallyConnected2D(1, (6, 6), activation='linear', padding='valid')(outputs)





model1 = Model(inputs=inputs, outputs=outputs1)
model2 = Model(inputs=inputs, outputs=outputs2)
model3 = Model(inputs=inputs, outputs=outputs3)

model1.compile(loss='mse', optimizer=Adamax(lr=PAS_INITIAL, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0))

model2.compile(loss='mse', optimizer=Adamax(lr=PAS_INITIAL, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0))

model3.compile(loss='mse', optimizer=Adamax(lr=PAS_INITIAL, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0))

And then I train them using the fit method.

If I run model1.fit(...), for example, it works, but when I then run model2.fit(...) or model3.fit(...), I get this error message:

W tensorflow/core/framework/op_kernel.cc:993] Invalid argument: You must feed a value for placeholder tensor 'activation_1_target' with dtype float
         [[Node: activation_1_target = Placeholder[dtype=DT_FLOAT, shape=[], _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'activation_1_target' with dtype float
         [[Node: activation_1_target = Placeholder[dtype=DT_FLOAT, shape=[], _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
         [[Node: dense_5/bias/read/_1075 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_60_dense_5/bias/read", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Caused by op 'activation_1_target', defined at:
  File "main.py", line 100, in <module>
  File "/dds/work/DQL/dql_last_version/8th_code_multi/agent_per.py", line 225, in init_brain
    self.brain = Brain_2D(self.state_shape,self.action_number)
  File "/dds/work/DQL/dql_last_version/8th_code_multi/brain.py", line 141, in __init__
    Brain.__init__(self, action_number)
  File "/dds/work/DQL/dql_last_version/8th_code_multi/brain.py", line 20, in __init__
    self.models, self.full_model = self._create_model()
  File "/dds/work/DQL/dql_last_version/8th_code_multi/brain.py", line 216, in _create_model
    neuralNet1.compile(loss='mse', optimizer=Adamax(lr=PAS_INITIAL, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0))
  File "/dds/miniconda/envs/dds/lib/python3.5/site-packages/keras/engine/training.py", line 755, in compile
  File "/dds/miniconda/envs/dds/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 497, in placeholder
    x = tf.placeholder(dtype, shape=shape, name=name)
  File "/dds/miniconda/envs/dds/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 1502, in placeholder
  File "/dds/miniconda/envs/dds/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 2149, in _placeholder
  File "/dds/miniconda/envs/dds/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
  File "/dds/miniconda/envs/dds/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2327, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/dds/miniconda/envs/dds/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1226, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'activation_1_target' with dtype float
         [[Node: activation_1_target = Placeholder[dtype=DT_FLOAT, shape=[], _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
         [[Node: dense_5/bias/read/_1075 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_60_dense_5/bias/read", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

I want to optimize only the weights of the head I choose, but it seems that once some inputs have taken a path through the network, it expects me to pass through the same head again, even when I want to train the other weights.

I thought of building a single model with several outputs:

model = Model(inputs=inputs, outputs=[outputs1, outputs2, outputs3, outputs4])

but I want each head to be trained on a different batch of data (I am working on a reinforcement learning project).

Thank you!

I resolved my problem.

I ended up compiling a single model with n inputs and n outputs, where n is the number of heads. I feed each input a different batch, so that each head is trained on its own data distribution.
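
For readers who want to see the shape of such a model, here is a minimal sketch of one possible n-input / n-output layout, not the exact code from my project. It assumes the tf.keras functional API; the sizes N_HEADS, STATE_DIM and N_ACTIONS, the single Dense layer standing in for the shared feature extractor, and all layer names are hypothetical placeholders chosen only for illustration.

```python
import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

N_HEADS = 3      # hypothetical number of heads
STATE_DIM = 8    # hypothetical flat state size (the real state is 3-D)
N_ACTIONS = 4    # hypothetical output size of each head

# Stand-in for the shared feature extractor. The same layer object is
# applied to every input, so all branches share one set of weights.
shared = Dense(16, activation='relu', name='shared_features')

inputs, outputs = [], []
for i in range(N_HEADS):
    x_in = Input(shape=(STATE_DIM,), name='input_%d' % i)
    features = shared(x_in)
    # Each head has its own weights and only sees the data fed to input i.
    head = Dense(N_ACTIONS, activation='linear', name='head_%d' % i)(features)
    inputs.append(x_in)
    outputs.append(head)

model = Model(inputs=inputs, outputs=outputs)
model.compile(loss='mse', optimizer='adamax')

# One training step: a different batch (and target) for every head.
batches = [np.random.rand(32, STATE_DIM).astype('float32') for _ in range(N_HEADS)]
targets = [np.random.rand(32, N_ACTIONS).astype('float32') for _ in range(N_HEADS)]
model.train_on_batch(batches, targets)
```

Note that in this layout every batch must contain the same number of samples, and the shared layer receives gradients from all heads at every step, while each head is updated only from its own batch.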

At test time, I just duplicate the same input n times and feed it to the model. It may not be the best way to do it, but it works.
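
The duplication step can be sketched as follows. The helper name replicate_for_heads and the array shape are hypothetical, and the commented-out predict call assumes an n-input model like the one described above.

```python
import numpy as np

def replicate_for_heads(state_batch, n_heads):
    """Return the same batch n_heads times, one list entry per model input."""
    return [state_batch] * n_heads

# Hypothetical batch of 5 states with 8 features each.
states = np.zeros((5, 8), dtype='float32')
model_inputs = replicate_for_heads(states, 3)

# With an n-input model, every head then scores the same states:
# predictions = model.predict(model_inputs)
```

The list holds references to the same array rather than copies, so the duplication itself costs no extra memory on the Python side.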

If you have thoughts or comments about my solution, don't hesitate to share them; I would be glad to see other approaches.

Thank you

