Weights & Biases sweep cannot import modules with pytorch lightning

I am training a variational autoencoder using pytorch-lightning. My pytorch-lightning code works with a Weights & Biases logger. I am trying to run a hyperparameter search using a W&B parameter sweep.

The hyperparameter search procedure is based on what I followed from this repo.

The runs initialise correctly, but when my training script is run with the first set of hyperparameters, I get the following error:

2020-08-14 14:09:07,109 - wandb.wandb_agent - INFO - About to run command: /usr/bin/env python train_sweep.py --LR=0.02537477586974176
Traceback (most recent call last):
  File "train_sweep.py", line 1, in <module>
    import yaml
ImportError: No module named yaml

yaml is installed and is working correctly. I can train the network by setting the parameters manually, but not with the parameter sweep.

Here is my sweep script to train the VAE:

import yaml
import numpy as np
import ipdb
import torch
from vae_experiment import VAEXperiment
import torch.backends.cudnn as cudnn
from pytorch_lightning import Trainer
from pytorch_lightning.loggers import WandbLogger
from pytorch_lightning.callbacks import EarlyStopping
from vae_network import VanillaVAE
import os
import wandb
from utils import get_config, log_to_wandb

# Sweep parameters
hyperparameter_defaults = dict(
    root='data_semantics',
    gpus=1,
    batch_size = 2,
    lr = 1e-3,
    num_layers = 5,
    features_start = 64,
    bilinear = False,
    grad_batches = 1,
    epochs = 20
)

wandb.init(config=hyperparameter_defaults)
config = wandb.config

def main(hparams):

    model = VanillaVAE(hparams['exp_params']['img_size'], **hparams['model_params'])
    model.build_layers()
    experiment = VAEXperiment(model, hparams['exp_params'], hparams['parameters'])

    logger = WandbLogger(
        project='vae',
        name=config['logging_params']['name'],
        version=config['logging_params']['version'],
        save_dir=config['logging_params']['save_dir']
        )

    logger.watch(model.net)

    early_stopping = EarlyStopping(
       monitor='val_loss',
       min_delta=0.00,
       patience=3,
       verbose=False,
       mode='min'
    )

    runner = Trainer(weights_save_path="../../Logs/",
     min_epochs=1,
     logger=logger,
     log_save_interval=10,
     train_percent_check=1.,
     val_percent_check=1.,
     num_sanity_val_steps=5,
     early_stop_callback = early_stopping,
     **config['trainer_params']
     )

    runner.fit(experiment)

if __name__ == '__main__':
    main(config)

Why am I getting this error?

The problem was that my code was structured incorrectly and the wandb commands were being run in the wrong order. This pytorch-lightning with wandb example shows the correct structure to follow.

Here is my refactored code:

#!/usr/bin/env python
import wandb
from utils import get_config

#---------------------------------------------------------------------------------------------

def main():

    """
    The training function used in each sweep of the model.
    For every sweep, this function will be executed as if it is a script on its own.
    """

    import wandb
    import yaml
    import numpy as np
    import torch
    from vae_experiment import VAEXperiment
    import torch.backends.cudnn as cudnn
    from pytorch_lightning import Trainer
    from pytorch_lightning.loggers import WandbLogger
    from pytorch_lightning.callbacks import EarlyStopping
    from vae_network import VanillaVAE
    import os
    from utils import log_to_wandb, format_config

    path_to_config = 'sweep.yaml'
    config = get_config(path_to_config)

    path_to_defaults = 'defaults.yaml'
    param_defaults = get_config(path_to_defaults)

    wandb.init(config=param_defaults)

    config = format_config(config, wandb.config)
    model = VanillaVAE(config['meta']['img_size'], hidden_dims = config['hidden_dims'], latent_dim  = config['latent_dim'])
    model.build_layers()

    experiment = VAEXperiment(model, config)

    early_stopping = EarlyStopping(
       monitor='val_loss',
       min_delta=0.00,
       patience=3,
       verbose=False,
       mode='min'
    )

    runner = Trainer(weights_save_path=config['meta']['save_dir'],
        min_epochs=1,
        train_percent_check=1.,
        val_percent_check=1.,
        num_sanity_val_steps=5,
        early_stop_callback = early_stopping,
        **config['trainer_params'])

    runner.fit(experiment)
    log_to_wandb(config, runner, experiment, path_to_config)

#---------------------------------------------------------------------------------------------

path_to_yaml = 'sweep.yaml'
sweep_config = get_config(path_to_yaml)
sweep_id = wandb.sweep(sweep_config)
wandb.agent(sweep_id, function=main)

#---------------------------------------------------------------------------------------------
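
The `format_config` helper imported from `utils` is not shown in this answer. As a purely hypothetical sketch, assuming it overlays the values picked by the sweep (exposed through `wandb.config`) onto the config loaded from YAML, it might look something like the following. Plain dicts stand in for both inputs here, and the real helper may well do more (for example, routing values into nested sections):

```python
import copy


def format_config(base_config, sweep_values):
    """Hypothetical stand-in for utils.format_config (not the author's code).

    Returns a copy of base_config in which every top-level key present in
    sweep_values is replaced by the value chosen for the current sweep run.
    """
    merged = copy.deepcopy(base_config)  # leave the caller's config untouched
    for key, value in dict(sweep_values).items():
        merged[key] = value
    return merged


# Example with made-up values: the sweep overrides latent_dim and adds lr.
base = {"meta": {"img_size": 64}, "latent_dim": 128}
chosen = {"latent_dim": 32, "lr": 0.01}
merged = format_config(base, chosen)
```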

Do you launch Python in your shell by typing python or python3? Your script could be calling Python 2 instead of Python 3.
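
One way to check this is to print the interpreter details at the very top of the training script, before any third-party import such as `yaml`. The sketch below uses only the standard library and is written so that it also runs under Python 2; the function names are illustrative and not part of wandb:

```python
import sys


def describe_interpreter():
    """Return the path and (major, minor, micro) version of the running interpreter."""
    return sys.executable, tuple(sys.version_info[:3])


def require_python3():
    """Fail early with a clear message if the script was started with Python 2."""
    if sys.version_info[0] < 3:
        raise RuntimeError(
            "This script needs Python 3 but was started with {}".format(sys.executable)
        )


executable, version = describe_interpreter()
print("Running under {} (version {})".format(executable, ".".join(map(str, version))))
require_python3()
```

If the path printed during a sweep run differs from the one printed when you launch the script by hand, the agent is picking up a different interpreter, which would explain a module being importable in one case and missing in the other.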

If this is the case, you can explicitly tell wandb to use Python 3. See this section of the documentation, in particular "Running Sweeps with Python 3".
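
As a rough sketch of what that looks like: to my understanding of the W&B documentation, a sweep configuration can carry a `command` entry that replaces the default `/usr/bin/env python` invocation, using the `${env}`, `${program}` and `${args}` placeholders. The example below builds such a configuration as a Python dict. The parameter name `LR` and the script name come from the log above; the search method and the value range are placeholders I made up:

```python
def build_sweep_config(program="train_sweep.py", interpreter="python3"):
    """Build a sweep configuration that launches the script with an explicit interpreter."""
    return {
        "program": program,
        "method": "random",  # placeholder search strategy
        "parameters": {
            "LR": {"min": 0.001, "max": 0.1},  # placeholder range
        },
        # The agent substitutes ${env}, ${program} and ${args} when it starts a run.
        "command": ["${env}", interpreter, "${program}", "${args}"],
    }


sweep_config = build_sweep_config()
# With wandb installed, the dict can then be registered as usual:
#   sweep_id = wandb.sweep(sweep_config)
```

The same `command` list can be written in the sweep YAML file instead if the sweep is created from the command line.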
