
Run failed: User program failed with ModuleNotFoundError: No module named 'amlrun' in Azure ML Experiment

I'm using VS Code to submit a Machine Learning experiment in Azure Portal. When I run the experiment, I get the following error:

Run failed: User program failed with ModuleNotFoundError: No module named 'amlrun'

This is the code structure:

.vscode (json configuration file)
aml_config
scripts
----- amlrun.py (a script with some functions)
----- model_training.py (a script creating and saving the model)

This is the configuration file:

{
    "script": "model_training.py",
    "framework": "Python",
    "communicator": "None",
    "target": "testazure",
    "environment": {
        "python": {
            "userManagedDependencies": false,
            "condaDependencies": {
                "dependencies": [
                    "python=3.6.2",
                    "scikit-learn",
                    "numpy",
                    "pandas",
                    {
                        "pip": [
                            "azureml-defaults"
                        ]
                    }
                ]
            }
        },
        "docker": {
            "baseImage": "mcr.microsoft.com/azureml/base:0.2.4",
            "enabled": true,
            "baseImageRegistry": {
                "address": null,
                "username": null,
                "password": null
            }
        }
    },
    "history": {
        "outputCollection": true,
        "snapshotProject": false,
        "directoriesToWatch": [
            "logs"
        ]
    }
}

Am I missing something? Thanks

When your training script runs in Azure, it cannot find your local imports, i.e. the amlrun.py script.

The training job submitted to Azure first builds a Docker image containing your files and then runs the experiment, but in this case the extension hasn't included amlrun.py.

This is probably because, when you submitted the training job with the extension, the Visual Studio Code window was not opened on the scripts folder.

Taken from one of the replies to a previously raised GitHub issue:

The extension currently requires the script you are working on to be in the folder that is open in VS Code and not in a sub-directory.
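The failure described above can be reproduced locally without Azure. The following is only a rough sketch: it stands in for the remote job by copying a chosen subset of files into a fresh temporary directory and running the script there. The helper names (run_in_snapshot, demo) and the toy file contents are made up for illustration and are not what the extension actually does.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Iterable


def run_in_snapshot(project: Path, files: Iterable[str]) -> subprocess.CompletedProcess:
    """Copy only `files` into a fresh directory and run model_training.py there."""
    with tempfile.TemporaryDirectory() as snapshot:
        for name in files:
            shutil.copy(project / name, snapshot)
        return subprocess.run(
            [sys.executable, "model_training.py"],
            cwd=snapshot,
            capture_output=True,
            text=True,
        )


def demo():
    """Run the toy training script with and without its helper module."""
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp)
        # Toy stand-ins for the two scripts in the question.
        (project / "amlrun.py").write_text(
            "def greet():\n    return 'hello from amlrun'\n"
        )
        (project / "model_training.py").write_text(
            "import amlrun\nprint(amlrun.greet())\n"
        )
        # Only the entry script is copied: the import cannot be resolved.
        incomplete = run_in_snapshot(project, ["model_training.py"])
        # Both files are copied: the import resolves.
        complete = run_in_snapshot(project, ["model_training.py", "amlrun.py"])
    return incomplete, complete
```

Calling demo() returns two results: the first run fails with ModuleNotFoundError in its stderr because amlrun.py was left out of the copied files, and the second succeeds.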


To fix this you can do either of the following:

  1. Re-open Visual Studio Code in the scripts folder instead of its parent directory.

  2. Move all files in the scripts directory into its parent directory.
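Before submitting, it may help to check whether the script's local imports sit in the folder that will be uploaded. The sketch below assumes the uploaded folder is the one open in VS Code and that modules must be at its top level. The function names and the upload_dir parameter are hypothetical, and the check is a heuristic (it only looks for .py files and packages by name).

```python
import ast
from pathlib import Path


def imported_modules(script: Path) -> set:
    """Top-level names of all absolute imports in `script`."""
    tree = ast.parse(script.read_text())
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])
    return names


def local_imports_outside(script: Path, project_root: Path, upload_dir: Path) -> set:
    """Imports that exist as files under project_root but not at the top of upload_dir."""
    missing = set()
    for name in imported_modules(script):
        # A module counts as "local" if a matching .py file exists in the project.
        exists_in_project = any(project_root.rglob(name + ".py"))
        in_upload = (
            (upload_dir / (name + ".py")).exists()
            or (upload_dir / name / "__init__.py").exists()
        )
        if exists_in_project and not in_upload:
            missing.add(name)
    return missing
```

With the layout from the question, passing the parent directory as upload_dir would report amlrun, while passing the scripts folder would report nothing.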


If you're looking for a more flexible way to submit training jobs and manage AML, you can use the Azure Machine Learning SDK for Python.

Some examples of using the SDK to manage experiments can be found in the links below:

  1. Scikit Learn Model Training Docs

  2. Basic Pytorch Model Training and Deployment Example Repo
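As a rough idea of what an SDK submission might look like, here is a minimal sketch using the v1 azureml-core package. It assumes the package is installed and that a workspace config.json is available locally. The function name, environment name and experiment name are placeholders, and the compute target "testazure" and package list are taken from the configuration file in the question. The point of interest is source_directory, which names the folder that is uploaded with the run, so pointing it at scripts should bring amlrun.py along with model_training.py.

```python
def submit_training(
    source_directory="scripts",
    script="model_training.py",
    compute_target="testazure",
    experiment_name="model-training",  # placeholder name, not from the question
):
    """Submit the training script with the Azure ML SDK (v1) and wait for it."""
    # Imported lazily so this sketch can be loaded without the SDK installed.
    from azureml.core import Environment, Experiment, ScriptRunConfig, Workspace
    from azureml.core.conda_dependencies import CondaDependencies

    # Assumption: a workspace config.json can be found from the current directory.
    ws = Workspace.from_config()

    # Mirror the conda/pip dependencies listed in the question's configuration.
    env = Environment(name="sklearn-training")  # placeholder name
    env.python.conda_dependencies = CondaDependencies.create(
        python_version="3.6.2",
        conda_packages=["scikit-learn", "numpy", "pandas"],
        pip_packages=["azureml-defaults"],
    )

    # Everything under source_directory is uploaded together with the script.
    config = ScriptRunConfig(
        source_directory=source_directory,
        script=script,
        compute_target=compute_target,
        environment=env,
    )

    run = Experiment(workspace=ws, name=experiment_name).submit(config)
    run.wait_for_completion(show_output=True)
    return run
```

Calling submit_training() from the parent directory of scripts would submit the run with the defaults shown. Adjust the arguments if your folder or compute target is named differently.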


 