
Azure MLflow run ID not found using Python SDK azure-ai-ml v2

I'm using Azure ML jobs to run an experiment with the Python SDK v2, and I haven't been able to access the run logs after the run has completed. I'm not sure what is happening, or whether I'm missing some permission or a previous step. It just says "run 'xxxx' not found":

from mlflow.tracking import MlflowClient

# Use MlFlow to retrieve the job that was just completed
run_id = 'musing_steelpan_xxxx'

finished_mlflow_run = MlflowClient().get_run(run_id)

The run_id actually exists, and I'm the owner of the workspace and the cluster.

MlflowException                           Traceback (most recent call last)
Cell In [5], line 6
      3 # Use MlFlow to retrieve the job that was just completed
      4 run_id = 'musing_steelpan_hnlbhxf9qy'
----> 6 finished_mlflow_run = MlflowClient().get_run(run_id)

File /miniconda/envs/benchmark/lib/python3.8/site-packages/mlflow/tracking/client.py:150, in MlflowClient.get_run(self, run_id)
    112 def get_run(self, run_id: str) -> Run:
    113     """
    114     Fetch the run from backend store. The resulting :py:class:`Run <mlflow.entities.Run>`
    115     contains a collection of run metadata -- :py:class:`RunInfo <mlflow.entities.RunInfo>`,
   (...)
    148         status: FINISHED
    149     """
--> 150     return self._tracking_client.get_run(run_id)

File /miniconda/envs/benchmark/lib/python3.8/site-packages/mlflow/tracking/_tracking_service/client.py:72, in TrackingServiceClient.get_run(self, run_id)
     58 """
     59 Fetch the run from backend store. The resulting :py:class:`Run <mlflow.entities.Run>`
     60 contains a collection of run metadata -- :py:class:`RunInfo <mlflow.entities.RunInfo>`,
  (...)
     69          raises an exception.
     70 """
     71 _validate_run_id(run_id)
   ...
    648     )
    649 run_info = self._get_run_info_from_dir(run_dir)
    650 if run_info.experiment_id != exp_id:

MlflowException: Run 'musing_steelpan_xxxx' not found
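One clue may be in the traceback itself: the last frames (`_get_run_info_from_dir(run_dir)` and the `experiment_id != exp_id` check) appear to come from MLflow's local file-based tracking store, which would mean the client was reading a local `mlruns` directory rather than the Azure ML workspace. Below is a minimal sketch for checking where a tracking URI points. The `describe_tracking_uri` helper is hypothetical and only inspects the URI string; it assumes Azure ML workspace URIs use the `azureml://` scheme shown in the portal.

```python
from urllib.parse import urlparse


def describe_tracking_uri(uri: str) -> str:
    """Roughly classify an MLflow tracking URI by its scheme (hypothetical helper)."""
    scheme = urlparse(uri).scheme.lower()
    if scheme == "azureml":
        # Workspace tracking URIs shown in the Azure Portal start with azureml://
        return "azureml"
    if scheme in ("", "file"):
        # A bare path or file:// URI means MLflow's local file store (e.g. ./mlruns),
        # which has no record of jobs that ran in an Azure ML workspace.
        return "local"
    if scheme in ("http", "https"):
        # A self-hosted or remote MLflow tracking server
        return "remote-server"
    # Anything else (e.g. a database URI such as sqlite:///...) is left unclassified
    return "other"
```

In a notebook you might call `describe_tracking_uri(mlflow.get_tracking_uri())`. If it reports `local`, setting `MLFLOW_TRACKING_URI` (or calling `mlflow.set_tracking_uri(...)`) to the workspace URI before creating `MlflowClient()` should make `get_run()` query the workspace instead.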

In some cases (e.g. for jobs inside a pipeline or inside a sweep), the display_name shown at the top of the job page in the portal (which the user can change) is not the same as the job's name (which is immutable and is shown further down on the same page).


Did you take the name or the display_name from the portal (or are they the same)?
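If you only have the display name, it has to be mapped back to the immutable name before calling `get_run()`, since (as this answer implies) the job name is what MLflow treats as the run ID. A minimal sketch of that lookup, assuming you already have both identifiers for each run from some listing; the `RunRecord` type and `resolve_run_id` function are hypothetical, not part of MLflow or azure-ai-ml.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical stand-in for the two identifiers a job carries."""
    run_id: str        # immutable job name; what MlflowClient().get_run() expects
    display_name: str  # editable label shown at the top of the portal


def resolve_run_id(runs: Iterable[RunRecord], identifier: str) -> str:
    """Return the run_id for `identifier`, accepting either the job name or the display name."""
    runs = list(runs)
    # An exact job-name match wins, because that is what get_run() looks up.
    for run in runs:
        if run.run_id == identifier:
            return run.run_id
    # Otherwise treat the identifier as a display name, which need not be unique.
    matches: List[str] = [r.run_id for r in runs if r.display_name == identifier]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise KeyError(f"no run with name or display name {identifier!r}")
    raise ValueError(f"display name {identifier!r} is ambiguous: {matches}")
```

Raising on an ambiguous display name is deliberate: several child jobs in a pipeline or sweep can share a display name, and silently picking one would reproduce the original confusion.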

Here is another idea: you might not be connected to the right workspace. You select the workspace either with the MLFLOW_TRACKING_URI environment variable or by passing the tracking URI directly to the MLflow client. In the Azure Portal, open the workspace properties; the MLflow tracking URI for the workspace is listed there.


Then plug it into the code below; it should print the IDs of up to 100 runs in your workspace (I believe the first 100):

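import mlflow

# Point the client at the workspace's tracking URI copied from the Azure Portal
client = mlflow.tracking.MlflowClient(tracking_uri="<your mlflow tracking uri>")
runs = client.search_runs(experiment_ids=[])
for run in runs:
    # run_id is the non-deprecated name for run_uuid
    print(run.info.run_id)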

For the above code to work you need to:

  1. pip install mlflow azureml-mlflow azureml-core
  2. az login
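Before running the snippet, a quick way to confirm the packages from step 1 are importable is sketched below. The mapping from pip names to import names (`azureml-mlflow` to `azureml.mlflow`, `azureml-core` to `azureml.core`) is an assumption, and the `missing_packages` helper is hypothetical; it uses only `importlib.util.find_spec` from the standard library.

```python
import importlib.util
from typing import Dict, List

# Assumed import names for the pip packages in step 1.
REQUIRED_MODULES: Dict[str, str] = {
    "mlflow": "mlflow",
    "azureml-mlflow": "azureml.mlflow",
    "azureml-core": "azureml.core",
}


def missing_packages(required: Dict[str, str] = REQUIRED_MODULES) -> List[str]:
    """Return the pip names whose import module cannot be located."""
    missing = []
    for pip_name, module_name in required.items():
        try:
            # For dotted names, find_spec imports the parent package (e.g. 'azureml') first.
            found = importlib.util.find_spec(module_name) is not None
        except (ModuleNotFoundError, ValueError):
            # Raised when the parent package is absent or a module has no spec.
            found = False
        if not found:
            missing.append(pip_name)
    return missing
```

This only checks step 1; whether step 2 succeeded can be confirmed separately with `az account show`.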
