How to log a confusion matrix to the Azure ML platform using Python

Question

Hello Stackoverflowers,

I'm using azureml and I'm wondering if it is possible to log a confusion matrix of the xgboost model I'm training, together with the other metrics I'm already logging. Here's a sample of the code I'm using:

from azureml.core.model import Model
from azureml.core import Workspace
from azureml.core.experiment import Experiment
from azureml.core.authentication import ServicePrincipalAuthentication
from sklearn import metrics
import json

with open('./azureml.config', 'r') as f:
    config = json.load(f)

svc_pr = ServicePrincipalAuthentication(
    tenant_id=config['tenant_id'],
    service_principal_id=config['svc_pr_id'],
    service_principal_password=config['svc_pr_password'])

ws = Workspace(workspace_name=config['workspace_name'],
               subscription_id=config['subscription_id'],
               resource_group=config['resource_group'],
               auth=svc_pr)

# model, dtest, y_test and run come from the training code (not shown here)
y_pred = model.predict(dtest)

acc = metrics.accuracy_score(y_test, (y_pred>.5).astype(int))
run.log("accuracy",  acc)
f1 = metrics.f1_score(y_test, (y_pred>.5).astype(int), average='binary')
run.log("f1 score",  f1)


cmtx = metrics.confusion_matrix(y_test,(y_pred>.5).astype(int))
run.log_confusion_matrix('Confusion matrix', cmtx)

The above code raises this kind of error:

TypeError: Object of type ndarray is not JSON serializable

I already tried converting the matrix to a plain nested list, but that raised a different error, because I had previously logged a "manual" version of it ( cmtx = [[30000, 50],[40, 2000]] ) under the same name.

run.log_confusion_matrix('Confusion matrix', [list([int(y) for y in x]) for x in cmtx])

AzureMLException: AzureMLException:
    Message: UserError: Resource Conflict: ArtifactId ExperimentRun/dcid.3196bf92-4952-4850-9a8a-c5103b205379/Confusion matrix already exists.
    InnerException None
    ErrorResponse 
{
    "error": {
        "message": "UserError: Resource Conflict: ArtifactId ExperimentRun/dcid.3196bf92-4952-4850-9a8a-c5103b205379/Confusion matrix already exists."
    }
}

This makes me think that I'm not using run.log_confusion_matrix() properly. So, again, what is the best way to log a confusion matrix to my azureml experiments?
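The Resource Conflict above suggests that an artifact with the same name had already been logged in that run. A minimal sketch of one way to avoid reusing a name, assuming that is indeed the cause: the helper unique_metric_name and the used_names set are hypothetical local bookkeeping, not part of the azureml API, and nothing here is read from Azure ML itself.

```python
def unique_metric_name(base_name, used_names):
    """Return base_name, or base_name with a numeric suffix if it was already used.

    used_names is a plain set kept by the caller (hypothetical bookkeeping,
    not something provided by azureml).
    """
    candidate = base_name
    suffix = 2
    while candidate in used_names:
        candidate = f"{base_name} ({suffix})"
        suffix += 1
    used_names.add(candidate)
    return candidate


used = set()
first = unique_metric_name("Confusion matrix", used)
second = unique_metric_name("Confusion matrix", used)
print(first)   # Confusion matrix
print(second)  # Confusion matrix (2)
```

The returned name would then be passed as the first argument of run.log_confusion_matrix(); starting a fresh run is another way to get a clean set of names.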

Answer 1

I eventually found a solution thanks to a colleague of mine. I'm therefore answering my own question, in order to close it and, maybe, help somebody else.

The relevant function is documented here: https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.run.run?view=azure-ml-py#log-confusion-matrix-name--value--description----

Keep in mind that Azure apparently does not accept the confusion matrix in the format returned by sklearn, which is a numpy array populated with numpy.int64 elements. It accepts ONLY a list of lists of plain Python ints. So you also have to convert the matrix to that simpler format (for the sake of simplicity I used the nested list comprehension in the command below):

cmtx = metrics.confusion_matrix(y_test, (y_pred > .5).astype(int))
# params is defined elsewhere in the script (not shown here)
cmtx = {
    "schema_type": "confusion_matrix",
    "parameters": params,
    "data": {"class_labels": ["0", "1"],
             # convert each numpy.int64 to a built-in int
             "matrix": [[int(y) for y in x] for x in cmtx]}
}
run.log_confusion_matrix('Confusion matrix - error rate', cmtx)
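
The conversion above can be wrapped in a small helper that also checks the result is JSON-serializable before it reaches the logging call. This is only a sketch: the function name to_confusion_matrix_payload and its arguments are hypothetical, the dictionary keys are the ones used in the snippet above, and "parameters" is added only when the caller passes it.

```python
import json


def to_confusion_matrix_payload(matrix, class_labels, parameters=None):
    """Build a JSON-serializable confusion-matrix dict from a nested iterable.

    matrix can be a numpy array or a list of lists; every cell is passed
    through int() so that numpy.int64 values become built-in ints.
    """
    rows = [[int(cell) for cell in row] for row in matrix]
    n = len(class_labels)
    if len(rows) != n or any(len(row) != n for row in rows):
        raise ValueError("matrix must be square and match the number of class labels")
    payload = {
        "schema_type": "confusion_matrix",
        "data": {
            "class_labels": [str(label) for label in class_labels],
            "matrix": rows,
        },
    }
    if parameters is not None:
        payload["parameters"] = parameters
    # Fail early here with a TypeError rather than inside the logging call
    json.dumps(payload)
    return payload


payload = to_confusion_matrix_payload([[30000, 50], [40, 2000]], [0, 1])
print(payload["data"]["matrix"])
```

With the variables from the answer, the call would look like run.log_confusion_matrix('Confusion matrix - error rate', to_confusion_matrix_payload(cmtx, ["0", "1"], params)), where cmtx is the array returned by metrics.confusion_matrix.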

solution1
4 ACCEPTED 2020-06-30 12:23:32
