
Reinforcement Learning on Tensorflow without Gym

I am currently trying to create a simple ANN learning environment for reinforcement learning. I have already done curve fitting with a neural network to substitute a physical model. Now, out of curiosity, I would like to create a simple reinforcement learning model.

To create this model, I thought it would be a good option to manipulate the loss function so that it does not compute the difference between an expected output and the model output, but instead runs a simple simulation for a few rounds and counts the points the model earns for a specific target. In the example code below, the model controls a simple mass-spring-damper system that starts with a random displacement and velocity; the model can apply a force to it. Points are awarded based on the distance from the equilibrium point. Finally, I invert the points by dividing 1 by the points earned. I am not sure whether this is the right approach, but I want to try it anyway for the sake of learning. Right now I get the error No gradients provided for any variable: and I do not know how to solve it.
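
In other words, each timestep awards points = 1000/(|u|+1), so the per-step loss 1/points is simply (|u|+1)/1000: the further the mass is from equilibrium, the larger the loss, and minimizing the summed loss should push the displacement u toward 0.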

Here is my code:

import time
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.layers import Input, Dense, Conv2D, Reshape,concatenate, Flatten, UpSampling2D, AveragePooling2D,LayerNormalization
import random


#Physical Parameters
m = 1 #kg
k = 1 #N/m
c = 0.01 #Ns/m
dt = 0.01 #s
opt = keras.optimizers.Adam(learning_rate=0.01)


def getnewstate(u,v,f):
    #Calculate the new state of the mass-spring-damper system (semi-implicit Euler step)
    a = (f-v*c-k*u)/m
    v = v+a*dt
    u = u+v*dt
    return (u,v)


def generatemodel():
    #Generate simple keras model
    kernel_initializer=tf.keras.initializers.RandomNormal(stddev=0.01)
    bias_initializer=tf.keras.initializers.Zeros()
    InputLayer = Input(shape=(2,))
    Outputlayer = Dense(1, activation='linear',
                        kernel_initializer=kernel_initializer,
                        bias_initializer=bias_initializer)(InputLayer)
    model = Model(inputs=InputLayer, outputs=Outputlayer)      
    
    return model
    
def lossfunction(u,v,model):
    #Custom loss function
    loss = 0
    t = 0
    t_last = 0
    #do for 100 timesteps (to see if it runs at all)
    for j in range(100):

        x = []
        x.append(np.array([u,v]))
        x = np.array(x)
        f=model(x)

        f=f.numpy()[0][0]

        (u,v) = getnewstate(u,v,f)

        points = 1000/(abs(u)+1)
        loss = loss+1/points
        t += dt

    return(loss)
    
def dotraining(model):
    #training loop
    for epoch in range(100):
        print("\nStart of epoch %d" % (epoch,))
        start_time = time.time()
        loss_value = 0
        # Iterate over the batches of the dataset.
        for step in range(100):
            with tf.GradientTape() as tape:
                loss_value=[]
                for i in range(10):
                    #Randomize starting condition
                    u = random.random()-0.5
                    v = random.random()-0.5
                    x = []
                    x.append(np.array([u,v]))
                    x = np.array(x)
                    #feed model
                    logits = model(x, training=True)
                    #calculate loss
                    loss_value.append(lossfunction(u,v,model))
                    
                    
                print(step)
            print(loss_value)
            loss = loss_value
            loss = tf.convert_to_tensor(loss)
            grads = tape.gradient(loss, model.trainable_weights)
            opt.apply_gradients(zip(grads, model.trainable_weights))

    
            # Log every 200 batches.
            if step % 200 == 0:
                print(
                    "Training loss (for one batch) at step %d: %.4f"
                    % (step, float(loss_value))
                )
                print("Seen so far: %d samples" % ((step + 1) * 64))   

        print("Time taken: %.2fs" % (time.time() - start_time))



model=generatemodel()
x = []
x.append(np.array([1.0,2.0]))
print(np.shape(x))
f=model(np.array(x))
dotraining(model)

The problem is that when you convert f to numpy here:

f=f.numpy()[0][0]

it is no longer a tensor, and TensorFlow no longer tracks gradients through it.

For TensorFlow to compute gradients, everything from the input to the loss has to be done with tensor operations only.
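
As a minimal sketch of what that could look like (assuming the imports, the globals m, k, c, dt and the optimizer opt from the question's script; the helper name differentiable_loss is illustrative, not from the original post), the rollout can stay differentiable by holding u and v as tensors, doing the physics update with tensor operations, and never calling .numpy() inside the tape:

def differentiable_loss(u, v, model):
    #u, v: scalar float32 tensors; model maps a (1,2) input to a (1,1) force
    loss = tf.constant(0.0)
    for j in range(100):
        x = tf.reshape(tf.stack([u, v]), (1, 2))  #state stays a tensor
        f = model(x)[0, 0]                        #force, still a tensor
        a = (f - v*c - k*u)/m                     #same physics as getnewstate
        v = v + a*dt
        u = u + v*dt
        loss = loss + (tf.abs(u) + 1.0)/1000.0    #equals 1/points, in tf ops
    return loss

with tf.GradientTape() as tape:
    u0 = tf.constant(random.random() - 0.5, dtype=tf.float32)
    v0 = tf.constant(random.random() - 0.5, dtype=tf.float32)
    loss = differentiable_loss(u0, v0, model)
grads = tape.gradient(loss, model.trainable_weights)  #no longer None
opt.apply_gradients(zip(grads, model.trainable_weights))

Because every operation from u0 and v0 to loss is a tensor operation, the tape can backpropagate through all 100 simulated timesteps, and apply_gradients receives real gradients instead of None.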
