
Adam optimizer between TF1 and TF2

I am trying to replicate the same result between TF1 and TF2. Below is a simple example using the Adam optimizer.

Here in TF2:

import tensorflow as tf

x = tf.Variable([1,2,3], dtype=tf.float32)
grad = tf.constant([0.1, 0.2, 0.3])
optimizer = tf.keras.optimizers.Adam(learning_rate=0.5, epsilon=1e-08)
optimizer.apply_gradients(zip([grad], [x]))
print(x)

x is: <tf.Variable 'Variable:0' shape=(3,) dtype=float32, numpy=array([0.49998665, 1.4999859, 2.4999857 ], dtype=float32)>

While in TF1:

import tensorflow as tf

x = tf.Variable([1,2,3], dtype=tf.float32)
grad = tf.constant([0.1, 0.2, 0.3])
optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=0.5)
optimizer.apply_gradients(zip([grad], [x]))

init_op = tf.initialize_all_variables()
with tf.Session() as sess:
  sess.run(init_op)
  print(sess.run(x))

x is: [1. 2. 3.]

Does anyone know what causes the inconsistency between TF1 and TF2 when using the Adam optimizer? I do not rule out the possibility of a mistake in my implementation.

I would really appreciate it if anyone could tell me what I am doing wrong in TF1 that prevents me from getting the same result as in TF2.

Many thanks!

If you instead do this:

import tensorflow as tf

x = tf.Variable([1,2,3], dtype=tf.float32)
grad = tf.constant([0.1, 0.2, 0.3])
optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=0.5)
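# apply_gradients only builds the update op in graph mode; nothing changes until it is run in a session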
step = optimizer.apply_gradients(zip([grad], [x]))

init_op = tf.initialize_all_variables()
with tf.Session() as sess:
  sess.run(init_op)
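  # run the update op so Adam actually applies the step to x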
  sess.run(step)
  print(x.eval())

You get the same result (barring what I think could be floating point inaccuracies).

[0.50000155 1.5000007  2.5000005 ]
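As a sanity check, a result of roughly x - 0.5 is exactly what the textbook Adam update (Kingma & Ba) predicts for the very first step: with zero-initialized moments, the bias-corrected estimates reduce to g and g², so the step is about lr * g / |g| = 0.5 for every element, regardless of the gradient magnitude. Below is a minimal NumPy sketch of that first step; it won't match either framework to the last digit because TF1's AdamOptimizer and Keras apply epsilon at slightly different points in the formula.

import numpy as np

# Textbook Adam, a single step (t = 1), moments initialized to zero
lr, beta1, beta2, eps = 0.5, 0.9, 0.999, 1e-8
x = np.array([1.0, 2.0, 3.0])
g = np.array([0.1, 0.2, 0.3])

m = (1 - beta1) * g            # first-moment estimate
v = (1 - beta2) * g ** 2       # second-moment estimate
m_hat = m / (1 - beta1 ** 1)   # bias correction -> equals g
v_hat = v / (1 - beta2 ** 1)   # bias correction -> equals g**2

x -= lr * m_hat / (np.sqrt(v_hat) + eps)
print(x)  # ~[0.5 1.5 2.5], which is what both TF1 and TF2 report, up to epsilon handling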

Reproducibility is a tricky yet crucial aspect of commercial AI/ML projects.

Here's the v1 implementation of Adam on GitHub: https://github.com/tensorflow/tensorflow/blob/4c081973a6374ce867794ad66a5c4b204c310afb/tensorflow/python/keras/optimizer_v1.py#L468

And here's the v2 one: https://github.com/keras-team/keras/blob/v2.7.0/keras/optimizer_v2/adam.py

They are implemented slightly differently. I found this in the v2 documentation: "Many optimizer subclasses, such as Adam and Adagrad allocate and manage additional variables associated with the variables to train. These are called Slots. Slots have names and you can ask the optimizer for the names of the slots that it uses. Once you have a slot name you can ask the optimizer for the variable it created to hold the slot value."
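For illustration, here is a minimal sketch of inspecting those slots on the Keras Adam from the TF2 example above. It assumes the optimizer_v2-based Adam (TF 2.x up to 2.10, or tf.keras.optimizers.legacy.Adam in later releases), which exposes get_slot_names() and get_slot().

import tensorflow as tf

x = tf.Variable([1, 2, 3], dtype=tf.float32)
grad = tf.constant([0.1, 0.2, 0.3])
optimizer = tf.keras.optimizers.Adam(learning_rate=0.5, epsilon=1e-08)
optimizer.apply_gradients(zip([grad], [x]))

# Adam keeps per-variable slots for the first and second moment estimates.
print(optimizer.get_slot_names())          # ['m', 'v']
print(optimizer.get_slot(x, 'm').numpy())  # first-moment slot for x
print(optimizer.get_slot(x, 'v').numpy())  # second-moment slot for x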

Also, if you're trying to migrate code from TF1 to TF2, you can do it automatically as described in https://www.tensorflow.org/guide/migrate.
