
Tensorflow2 For loop Graph Execution ValueError

I am using TF2 for hyperparameter optimization. For example, I define a range of learning rates lr = [0.0001, 0.001, 0.01] to pass into a Trainer function that contains a custom training loop (using GradientTape). However, I run into an error when I use @tf.function. My training structure is like this:

def Trainer(lr):
    
    # Define the optimizer
    optim = tf.keras.optimizers.experimental.Nadam(learning_rate=lr)
    
    train_dataset, test_dataset, sample_size = dataset_load(arg)
    
    # define model
    model_config = {arg}
    net = Mymodel(model_config)
    step = 0
    with tqdm.tqdm(total=max_step, leave=True, desc='Training') as pbar:
        while step < max_step:
            for signal in train_dataset:
                
                # Calculate loss
                loss = train_batch(signal, ...)  # other arguments omitted for brevity
    
                step += 1
                pbar.update()
                pbar.set_postfix(
                    {'loss': loss.numpy(),
                     'step': step})

The train_batch function is:

@tf.function 
def train_batch(signal, arg...):
    with tf.GradientTape() as tape:
        tape.watch(model.trainable_variables)
        loss = compute_loss([signal], model)
    grad = tape.gradient(loss, model.trainable_variables,
                         unconnected_gradients=tf.UnconnectedGradients.ZERO)
    optim.apply_gradients(
        zip(grad, model.trainable_variables))
    del grad

    return loss

For the outer loop, I define lr, then use: for lr_current in lr: Trainer(lr_current)

The program executes normally for the first lr_current. But when the outer for loop moves on to the second value of lr_current, this error comes up:

ValueError: tf.function only supports singleton tf.Variables created on the first call. Make sure the tf.Variable is only created once or created outside tf.function.

I do not understand why this error occurs. I think it is related to the for loop around the Trainer function. I also tried del net when training finishes, but it did not work. The program runs normally for all lr_current when I remove @tf.function.
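For reference, a stripped-down sketch of the same pattern (a hypothetical toy variable, loss, and train step, not my real code) seems to hit the same error on the second loop iteration:

import tensorflow as tf

w = tf.Variable(tf.ones([3]))  # toy weight standing in for the model's variables

@tf.function
def toy_train_step(x, optim):
    with tf.GradientTape() as tape:
        loss = tf.reduce_sum((w * x) ** 2)
    grads = tape.gradient(loss, [w])
    # Nadam builds its slot variables lazily on the first apply_gradients call
    optim.apply_gradients(zip(grads, [w]))
    return loss

for lr_current in [0.0001, 0.001, 0.01]:
    optim = tf.keras.optimizers.Nadam(learning_rate=lr_current)
    # Second iteration: a new optimizer creates new tf.Variables inside the
    # already-traced tf.function, which raises the ValueError above.
    toy_train_step(tf.ones([3]), optim)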

I have uploaded a minimal reproducible example to Colab. Could anyone help me out? Thanks in advance!

I have done your work. Copy this code and run it in your notebook.

def Trainer():

    # loss_func = tf.keras.losses.RootMeanSquaredError()


    train_dataset, test_dataset, sample_size = dataset_load(time_len=100, batch_size=16)

    epoch = 0
    step = 0
    with tqdm.tqdm(total=10, leave=True, desc='Training') as pbar:
        while epoch < 10:
            for signal in train_dataset:
                obs_signal, obs_mask, impute_mask = genmask(signal=signal, missing_ratio=0.2,
                                                            missing_type='rm')
                # Calculate loss
                loss = train_batch(signal=obs_signal, obs_mask=obs_mask,
                                   impute_mask=impute_mask)

                step+=1
                pbar.set_postfix(
                    {'loss': loss.numpy(),
                     'step': step,
                     'epoch': epoch})
                
            epoch += 1
            pbar.update()

def compute_loss(signal_mask: List, diff_params):
    obs_mask = signal_mask[1]
    impute_mask = signal_mask[2]
    # [B, T], [B, T]
    epsilon_theta, eps = diffusion(signal_mask, diff_params)
    # MSE loss
    target_mask = obs_mask - impute_mask
    residual = (epsilon_theta - eps) * target_mask

    loss = tf.reduce_sum(residual**2)/ (tf.reduce_sum(target_mask)
                                        if tf.reduce_sum(target_mask)>0 else 1.0)
    return loss

def diffusion(signal_mask: List, diff_params, eps=None):

    assert len(signal_mask) == 3

    signal = signal_mask[0]
    cond_mask = signal_mask[2]

    B, L, C = signal.shape[0], signal.shape[1], signal.shape[2]  # B is batchsize, C=1, L is signal length
    _dh = diff_params
    T, Alpha_bar = _dh["T"], _dh["Alpha_bar"]
    timesteps = tf.random.uniform(
        shape=[B, 1, 1], minval=0, maxval=T, dtype=tf.int32)  # [B], randomly sample diffusion steps from 1~T

    if eps is None:
        eps = tf.random.normal(tf.shape(signal))  # random noise

    extracted_alpha = tf.gather(Alpha_bar, timesteps)
    transformed_X = tf.sqrt(extracted_alpha) * signal + tf.sqrt(
        1 - extracted_alpha) * eps  # compute x_t from q(x_t|x_0)
    timesteps = tf.cast(timesteps, tf.float32)
    total_input = tf.stack([cond_mask * signal,
                            (1 - cond_mask) * transformed_X], axis=-1)  # B, L, K, 2
    obser_tp = tf.range(signal.shape[1])

    epsilon_theta = net(
        (total_input, obser_tp, cond_mask,
         tf.squeeze(timesteps, axis=-1)))  # predict \epsilon according to \epsilon_\theta

    return epsilon_theta, eps


def dataset_load(time_len, batch_size):

    train_data = np.random.randn(batch_size*10, time_len, 10)
    test_data = np.random.randn(batch_size, time_len, 10)
    shuffle_size_train = train_data.shape[0]

    train_dataset = tf.data.Dataset. \
        from_tensor_slices(train_data).shuffle(shuffle_size_train) \
        .batch(batch_size, drop_remainder=True)
    test_dataset = tf.convert_to_tensor(test_data)
    L = train_data.shape[-2]
    K = train_data.shape[-1]

    return (train_dataset, test_dataset, [L, K])

def genmask(signal: tf.Tensor, missing_ratio, missing_type):
    """Generate the mask
    Returns:
        observed_values (tf.Tensor): [B, T, K], multivariate time series with K features
        observed_mask (tf.Tensor): [B, T, K], mask for observation points
        gt_masks (tf.Tensor): [B, T, K], mask for the imputation targets
    """
    miss_ratio = missing_ratio
        
    observed_values = signal.numpy().astype(np.single)
    observed_mask = ~np.isnan(observed_values)

    rand_for_mask = np.random.rand(*observed_mask.shape) * observed_mask
    rand_for_mask = rand_for_mask.reshape(len(rand_for_mask), -1) # B, L*K
    for i in range(len(observed_mask)): # Loop for Batch
        sample_ratio = np.random.rand() if not missing_ratio else missing_ratio # missing ratio
        num_observed = observed_mask[i].sum()
        num_masked = round(num_observed * sample_ratio)
        rand_for_mask[i][np.argpartition(rand_for_mask[i], -num_masked)[-num_masked:]] = -1
    gt_masks = (rand_for_mask > 0).reshape(observed_mask.shape).astype(np.single)
    observed_mask = observed_mask.astype(np.single)
    
    return observed_values, observed_mask, gt_masks
@tf.function
def train_batch(signal, obs_mask, impute_mask):
    """Warpped training on a batch using static graph.
    Args:
        signal (tf.Tensor): [B, T, K], multivariate time series with K features
        obs_mask (tf.Tensor): [B, T, K], mask for observation points
        impute_mask (tf.Tensor): [B, T, K], mask for imputation target
    Returns:
        loss (tf.Tensor): average loss on the batch
    """
    with tf.GradientTape() as tape:
        tape.watch(net.trainable_variables)
        loss = compute_loss([signal, obs_mask, impute_mask],
                                 diffusion_hyperparams)

    grad = tape.gradient(loss, net.trainable_variables,
                         unconnected_gradients=tf.UnconnectedGradients.ZERO)
    optim.apply_gradients(
        zip(grad, net.trainable_variables))
    del grad

    return loss


lr = [0.001, 0.002, 0.01]
for lr_iter in lr:
    optim = tf.keras.optimizers.experimental.Nadam(lr_iter)
    model_config = { "res_channels": 64}
    net = mymodel(model_config)
    diffusion_hyperparams = calc_diffusion_hyperparams(T=50, beta_0=0.0001, beta_T=0.5, strategy="quadratic")
    Trainer()
    tf.keras.backend.clear_session()
    del net
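
If you would rather not rely on module-level globals, one variant (a sketch only, reusing mymodel, compute_loss, and diffusion_hyperparams from above, and not tested against your full model) is to build the tf.function inside Trainer, so each learning rate gets its own freshly traced train step whose variables are created on that function's first call:

def Trainer(lr_current):
    global net, optim  # diffusion()/compute_loss() above still read these globals
    optim = tf.keras.optimizers.experimental.Nadam(learning_rate=lr_current)
    net = mymodel({"res_channels": 64})

    @tf.function
    def train_batch(signal, obs_mask, impute_mask):
        with tf.GradientTape() as tape:
            loss = compute_loss([signal, obs_mask, impute_mask], diffusion_hyperparams)
        grad = tape.gradient(loss, net.trainable_variables,
                             unconnected_gradients=tf.UnconnectedGradients.ZERO)
        optim.apply_gradients(zip(grad, net.trainable_variables))
        return loss

    # ... then run the epoch/step loop from Trainer() above, but call this local
    # train_batch instead of the module-level one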

Link to the Notebook:

https://colab.research.google.com/drive/1uh3-q3hM4obKLbh93sfT25zoUbK4jfUJ?usp=sharing
