Run failed: User program failed with ModuleNotFoundError: No module named 'amlrun' in Azure ML Experiment

Question

I'm using VS Code to submit a Machine Learning experiment in Azure Portal. When I run the experiment, I get the following error:

Run failed: User program failed with ModuleNotFoundError: No module named 'amlrun'

This is the code structure:

.vscode (json configuration file)

aml_config

scripts

----- amlrun.py (a script with some functions)

----- model_training.py (a script creating and saving the model)

This is the configuration file:

{
    "script": "model_training.py",
    "framework": "Python",
    "communicator": "None",
    "target": "testazure",
    "environment": {
        "python": {
            "userManagedDependencies": false,
            "condaDependencies": {
                "dependencies": [
                    "python=3.6.2",
                    "scikit-learn",
                    "numpy",
                    "pandas",
                    {
                        "pip": [
                            "azureml-defaults"
                        ]
                    }
                ]
            }
        },
        "docker": {
            "baseImage": "mcr.microsoft.com/azureml/base:0.2.4",
            "enabled": true,
            "baseImageRegistry": {
                "address": null,
                "username": null,
                "password": null
            }
        }
    },
    "history": {
        "outputCollection": true,
        "snapshotProject": false,
        "directoriesToWatch": [
            "logs"
        ]
    }
}

Am I missing something? Thanks

Answer 1

When your training script runs in Azure, it can't find your local imports, i.e. the amlrun.py script.

Submitting a training job to Azure first builds a Docker image with your files and then runs the experiment; in this case, however, the extension hasn't included amlrun.py.

This is probably because, when you submitted the training job with the extension, the folder open in the Visual Studio Code window was the parent directory rather than the scripts folder.

Taken from one of the replies to a previously raised GitHub issue:

The extension currently requires the script you are working on to be in the folder that is open in VS Code and not in a sub-directory.

To fix this you can do either of the following:

Re-open Visual Studio Code in the scripts folder instead of its parent directory.
Move all files in the scripts directory into its parent directory.
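
Before resubmitting, it can help to check locally which of your own modules the training script imports and whether they sit directly in the folder you are about to open. The following is only a rough sketch using the Python standard library: the function names are made up for illustration, and it assumes (based on the quoted reply) that the extension only picks up files at the top level of the open folder.

```python
import ast
from pathlib import Path


def local_imports(script_path):
    """Return the imported module names that exist as .py files next to the script."""
    script = Path(script_path)
    tree = ast.parse(script.read_text())
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # "import a.b" -> top-level name "a"
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # "from a.b import c" -> top-level name "a" (relative imports are skipped)
            names.add(node.module.split(".")[0])
    # Keep only names that correspond to a sibling file such as amlrun.py
    return sorted(n for n in names if (script.parent / f"{n}.py").is_file())


def missing_from_folder(script_path, open_folder):
    """Return local modules the script needs that are not at the top level of open_folder."""
    folder = Path(open_folder)
    return [n for n in local_imports(script_path) if not (folder / f"{n}.py").is_file()]


# Example (paths taken from the question's layout):
#   missing_from_folder("scripts/model_training.py", ".")        # parent folder open
#   missing_from_folder("scripts/model_training.py", "scripts")  # scripts folder open
```

With the layout from the question, the first call should report amlrun when the parent folder is open, and the second should report nothing once the scripts folder is the one open in VS Code.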

If you're looking for a more flexible way to submit training jobs and manage Azure ML, you can use the Azure Machine Learning SDK for Python.
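
As a rough idea of what that could look like, here is a minimal sketch that submits the same script with the SDK. It assumes the v1 azureml-core package is installed and that a workspace config.json is available locally; the experiment name and environment name are hypothetical placeholders, while the script, compute target and packages are taken from the configuration file in the question. Pointing source_directory at the scripts folder means both model_training.py and amlrun.py are part of what gets uploaded.

```python
def submit_training(source_directory="scripts",
                    script="model_training.py",
                    compute_target="testazure",
                    experiment_name="model-training"):
    """Submit the training script to Azure ML (sketch, assumes azureml-core v1)."""
    # Imported here so this file can be loaded even where azureml-core is not installed.
    from azureml.core import Environment, Experiment, ScriptRunConfig, Workspace
    from azureml.core.conda_dependencies import CondaDependencies

    # Assumes a config.json describing the workspace is present locally.
    ws = Workspace.from_config()

    # "training-env" is a hypothetical name; packages mirror the question's config.
    env = Environment(name="training-env")
    env.python.conda_dependencies = CondaDependencies.create(
        python_version="3.6.2",
        conda_packages=["scikit-learn", "numpy", "pandas"],
        pip_packages=["azureml-defaults"],
    )

    # Everything inside source_directory is uploaded with the run,
    # so amlrun.py travels together with model_training.py.
    config = ScriptRunConfig(
        source_directory=source_directory,
        script=script,
        compute_target=compute_target,
        environment=env,
    )

    run = Experiment(workspace=ws, name=experiment_name).submit(config)
    run.wait_for_completion(show_output=True)
    return run


if __name__ == "__main__":
    submit_training()
```

Because the source directory is stated explicitly, this approach does not depend on which folder happens to be open in the editor.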

Some examples of using the SDK to manage experiments can be found in the links below:
