I am trying to train a model in which a shared feature extractor is split into n "heads", each consisting of a few small layers that produce a different output.
When I train head "a" first, everything works fine, but when I switch to head "b", Python throws an InvalidArgumentError from TensorFlow. The same happens when I start with head "b" and then train head "a".
I tried several approaches found on Stack Overflow, like this one, but none of them worked.
I am building my model as follows:
alphaLeaky = 0.3

# Shared feature extractor
inputs = Input(shape=(state_shape[0], state_shape[1], state_shape[2]))
outputs = ZeroPadding2D(padding=(1, 1))(inputs)
outputs = LocallyConnected2D(1, (6, 6), activation='linear', padding='valid')(outputs)
outputs = Flatten()(outputs)
outputs = Dense(768, kernel_initializer='lecun_uniform', bias_initializer='zeros')(outputs)
outputs = advanced_activations.LeakyReLU(alpha=alphaLeaky)(outputs)
outputs = Dense(512, kernel_initializer='lecun_uniform', bias_initializer='zeros')(outputs)
outputs = advanced_activations.LeakyReLU(alpha=alphaLeaky)(outputs)

# Head 1
outputs1 = Dense(256, kernel_initializer='lecun_uniform', bias_initializer='zeros')(outputs)
outputs1 = advanced_activations.LeakyReLU(alpha=alphaLeaky)(outputs1)
outputs1 = Dense(action_number, kernel_initializer='lecun_uniform', bias_initializer='zeros')(outputs1)
outputs1 = Activation('linear')(outputs1)

# Head 2
outputs2 = Dense(256, kernel_initializer='lecun_uniform', bias_initializer='zeros')(outputs)
outputs2 = advanced_activations.LeakyReLU(alpha=alphaLeaky)(outputs2)
outputs2 = Dense(action_number, kernel_initializer='lecun_uniform', bias_initializer='zeros')(outputs2)
outputs2 = Activation('linear')(outputs2)

# Head 3
outputs3 = Dense(256, kernel_initializer='lecun_uniform', bias_initializer='zeros')(outputs)
outputs3 = advanced_activations.LeakyReLU(alpha=alphaLeaky)(outputs3)
outputs3 = Dense(action_number, kernel_initializer='lecun_uniform', bias_initializer='zeros')(outputs3)
outputs3 = Activation('linear')(outputs3)

# One model per head; all three share the trunk weights
model1 = Model(inputs=inputs, outputs=outputs1)
model2 = Model(inputs=inputs, outputs=outputs2)
model3 = Model(inputs=inputs, outputs=outputs3)

model1.compile(loss='mse', optimizer=Adamax(lr=PAS_INITIAL, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0))
model2.compile(loss='mse', optimizer=Adamax(lr=PAS_INITIAL, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0))
model3.compile(loss='mse', optimizer=Adamax(lr=PAS_INITIAL, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0))
I then train them with the fit method. If I run model1.fit(...), for example, it works, but when I subsequently run model2.fit(...) or model3.fit(...), I get the following error:
W tensorflow/core/framework/op_kernel.cc:993] Invalid argument: You must feed a value for placeholder tensor 'activation_1_target' with dtype float
[[Node: activation_1_target = Placeholder[dtype=DT_FLOAT, shape=[], _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'activation_1_target' with dtype float
[[Node: activation_1_target = Placeholder[dtype=DT_FLOAT, shape=[], _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
[[Node: dense_5/bias/read/_1075 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_60_dense_5/bias/read", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Caused by op 'activation_1_target', defined at:
File "main.py", line 100, in <module>
agent.init_brain()
File "/dds/work/DQL/dql_last_version/8th_code_multi/agent_per.py", line 225, in init_brain
self.brain = Brain_2D(self.state_shape,self.action_number)
File "/dds/work/DQL/dql_last_version/8th_code_multi/brain.py", line 141, in __init__
Brain.__init__(self, action_number)
File "/dds/work/DQL/dql_last_version/8th_code_multi/brain.py", line 20, in __init__
self.models, self.full_model = self._create_model()
File "/dds/work/DQL/dql_last_version/8th_code_multi/brain.py", line 216, in _create_model
neuralNet1.compile(loss='mse', optimizer=Adamax(lr=PAS_INITIAL, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0))
File "/dds/miniconda/envs/dds/lib/python3.5/site-packages/keras/engine/training.py", line 755, in compile
dtype=K.dtype(self.outputs[i]))
File "/dds/miniconda/envs/dds/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 497, in placeholder
x = tf.placeholder(dtype, shape=shape, name=name)
File "/dds/miniconda/envs/dds/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 1502, in placeholder
name=name)
File "/dds/miniconda/envs/dds/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 2149, in _placeholder
name=name)
File "/dds/miniconda/envs/dds/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
op_def=op_def)
File "/dds/miniconda/envs/dds/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2327, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/dds/miniconda/envs/dds/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1226, in __init__
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'activation_1_target' with dtype float
[[Node: activation_1_target = Placeholder[dtype=DT_FLOAT, shape=[], _device="/job:localhost/replica:0/task:0/gpu:0"]()]]
[[Node: dense_5/bias/read/_1075 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_60_dense_5/bias/read", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
I want to optimize only the weights of the head I choose, but it seems that once some inputs have taken one path through the network, TensorFlow expects me to keep going through the same head, even when I want to train the other weights.
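For what it's worth, in a modern tf.keras setup (TF 2.x is assumed here, unlike the TF 1.x session-based version in the traceback) the stated goal — updating only the trunk plus one chosen head — can be done with a manual gradient step. This is a sketch with assumed shapes and a deliberately simplified trunk, not the original architecture:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

state_shape = (8, 8, 3)    # assumed example values, not the original ones
action_number = 4
n_heads = 3

# One model with a shared trunk and all heads attached.
inp = tf.keras.Input(shape=state_shape)
x = layers.Flatten()(inp)
x = layers.Dense(64, activation="relu")(x)
heads = [layers.Dense(action_number, name="head_%d" % i)(x) for i in range(n_heads)]
model = tf.keras.Model(inp, heads)

optimizer = tf.keras.optimizers.Adamax(learning_rate=1e-3)
mse = tf.keras.losses.MeanSquaredError()

def train_head(head_index, states, targets):
    """One gradient step on the trunk plus the selected head only."""
    with tf.GradientTape() as tape:
        preds = model(states, training=True)[head_index]
        loss = mse(targets, preds)
    grads = tape.gradient(loss, model.trainable_variables)
    # Gradients for the other heads come back as None and are skipped,
    # so their weights are left untouched.
    optimizer.apply_gradients(
        [(g, v) for g, v in zip(grads, model.trainable_variables) if g is not None]
    )
    return float(loss)

states = np.random.rand(16, *state_shape).astype("float32")
targets = np.random.rand(16, action_number).astype("float32")
loss = train_head(1, states, targets)
```

Because each call picks the head inside the tape, any head can be trained on any batch without recompiling anything.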
I thought of building a single model with several outputs:
model = Model(inputs=inputs, outputs=[outputs1, outputs2, outputs3])
but I want each head to be trained on a different batch of data (I am working on a reinforcement learning project).
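One way to keep a single multi-output model and still feed each head its own batch is to concatenate the per-head batches and zero out each head's loss on the other heads' samples via per-output sample weights. A sketch under assumed shapes (two heads, simplified trunk, not the original architecture):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

state_shape = (8, 8, 3)   # assumed example values
action_number = 4

inp = keras.Input(shape=state_shape)
x = layers.Flatten()(inp)
x = layers.Dense(64, activation="relu")(x)
out1 = layers.Dense(action_number, name="h1")(x)
out2 = layers.Dense(action_number, name="h2")(x)
model = keras.Model(inp, [out1, out2])
model.compile(loss="mse", optimizer="adam")

# Concatenate each head's batch; dummy targets fill the masked positions.
b1 = np.random.rand(8, *state_shape)
b2 = np.random.rand(8, *state_shape)
t1 = np.random.rand(8, action_number)
t2 = np.random.rand(8, action_number)
states = np.concatenate([b1, b2])
targets = {"h1": np.concatenate([t1, np.zeros_like(t1)]),
           "h2": np.concatenate([np.zeros_like(t2), t2])}

# Weight 0 removes a sample from that head's loss entirely.
weights = {"h1": np.concatenate([np.ones(8), np.zeros(8)]),
           "h2": np.concatenate([np.zeros(8), np.ones(8)])}
model.fit(states, targets, sample_weight=weights, epochs=1, verbose=0)
```

Whether this is preferable to separate per-head updates depends on how often each head sees new data; it does avoid duplicating the trunk's forward pass per head.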
Thank you !
I resolved my problem.
I ended up compiling a single model with n inputs and n outputs, where n is the number of heads. I feed each input a different batch, so that each head is trained on its own data distribution.
For the test part, I simply duplicate the same input n times and feed it to the model. It may not be the best way to do it, but it works.
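A minimal sketch of that n-inputs/n-outputs workaround, with assumed shapes and a deliberately simplified trunk rather than the original architecture:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

state_shape = (8, 8, 3)   # assumed example shape
action_number = 4         # assumed example value
n_heads = 3

# Trunk layers are created once, so every head shares the same weights.
trunk_layers = [layers.Flatten(), layers.Dense(64, activation="relu")]

def trunk(x):
    for layer in trunk_layers:
        x = layer(x)
    return x

inputs, outputs = [], []
for i in range(n_heads):
    inp = keras.Input(shape=state_shape, name="state_%d" % i)
    outputs.append(layers.Dense(action_number, name="q_%d" % i)(trunk(inp)))
    inputs.append(inp)

model = keras.Model(inputs, outputs)
model.compile(loss=["mse"] * n_heads, optimizer="adam")

# Training: each head receives its own batch through its own input.
batches = [np.random.rand(16, *state_shape) for _ in range(n_heads)]
targets = [np.random.rand(16, action_number) for _ in range(n_heads)]
model.fit(batches, targets, epochs=1, verbose=0)

# Testing: duplicate the same states n times, one copy per input.
states = np.random.rand(5, *state_shape)
q_values = model.predict([states] * n_heads, verbose=0)
```

Note that one fit call here still applies every head's loss to its own batch in the same gradient step; the heads are trained on different data, but always together.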
If you have thoughts or comments about my solution, don't hesitate to share them; I would be glad to see other approaches.
Thank you!