How to log a confusion matrix to the AzureML platform using Python

Hello Stackoverflowers,

I'm using azureml and I'm wondering if it is possible to log a confusion matrix of the xgboost model I'm training, together with the other metrics I'm already logging. Here's a sample of the code I'm using:

from azureml.core.model import Model
from azureml.core import Workspace
from azureml.core.experiment import Experiment
from azureml.core.authentication import ServicePrincipalAuthentication
import json
from sklearn import metrics

with open('./azureml.config', 'r') as f:
    config = json.load(f)

svc_pr = ServicePrincipalAuthentication(
    tenant_id=config['tenant_id'],
    service_principal_id=config['svc_pr_id'],
    service_principal_password=config['svc_pr_password'])


ws = Workspace(workspace_name=config['workspace_name'],
               subscription_id=config['subscription_id'],
               resource_group=config['resource_group'],
               auth=svc_pr)

y_pred = model.predict(dtest)

acc = metrics.accuracy_score(y_test, (y_pred>.5).astype(int))
run.log("accuracy",  acc)
f1 = metrics.f1_score(y_test, (y_pred>.5).astype(int), average='binary')
run.log("f1 score",  f1)


cmtx = metrics.confusion_matrix(y_test,(y_pred>.5).astype(int))
run.log_confusion_matrix('Confusion matrix', cmtx)

The above code raises this kind of error:

TypeError: Object of type ndarray is not JSON serializable

I also tried converting the matrix to a plain nested list, but that raised a different error, because I had previously logged a "manual" version of it ( cmtx = [[30000, 50],[40, 2000]] ) under the same name.

run.log_confusion_matrix('Confusion matrix', [list([int(y) for y in x]) for x in cmtx])

AzureMLException: AzureMLException:
    Message: UserError: Resource Conflict: ArtifactId ExperimentRun/dcid.3196bf92-4952-4850-9a8a-c5103b205379/Confusion matrix already exists.
    InnerException None
    ErrorResponse 
{
    "error": {
        "message": "UserError: Resource Conflict: ArtifactId ExperimentRun/dcid.3196bf92-4952-4850-9a8a-c5103b205379/Confusion matrix already exists."
    }
}

This makes me think that I'm not using run.log_confusion_matrix() properly. So, again, what is the best way to log a confusion matrix to my azureml experiments?

I eventually found a solution thanks to a colleague of mine. I'm therefore answering my own question, in order to close it and, maybe, help somebody else.

You can find the documentation for the proper function at this link: https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.run.run?view=azure-ml-py#log-confusion-matrix-name--value--description----

You also have to consider that, apparently, Azure doesn't work with the standard confusion matrix format returned by sklearn, which is a numpy array populated with numpy.int64 elements. It accepts ONLY a list of lists of plain Python integers. So you also have to convert the matrix to that simpler format (for the sake of simplicity I used the nested list comprehension in the command below):

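cmtx = metrics.confusion_matrix(y_test, (y_pred > .5).astype(int))
# Wrap the matrix in a dict and cast each numpy.int64 to a built-in int
# so that the whole object can be serialized to JSON.
cmtx = {
    "schema_type": "confusion_matrix",
    "parameters": params,  # params is not defined in this snippet
    "data": {"class_labels": ["0", "1"],
             "matrix": [[int(y) for y in x] for x in cmtx]}
}
run.log_confusion_matrix('Confusion matrix - error rate', cmtx)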
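
For anyone who wants to check the conversion without an Azure workspace, here is a minimal sketch of the same idea. The helper name to_confusion_matrix_payload is hypothetical (it is not part of azureml), and the hard-coded matrix stands in for the output of metrics.confusion_matrix. It keeps only the "schema_type" and "data" keys from the dict above and leaves out "parameters". It shows why the raw numpy array fails to serialize and that the converted dict does not.

```python
import json

import numpy as np


def to_confusion_matrix_payload(cm, class_labels):
    """Build a JSON-serializable confusion-matrix dict from an array-like.

    Hypothetical helper, not an azureml function.
    """
    arr = np.asarray(cm)
    n = len(class_labels)
    if arr.ndim != 2 or arr.shape != (n, n):
        raise ValueError("cm must be a square matrix matching class_labels")
    return {
        "schema_type": "confusion_matrix",
        "data": {
            "class_labels": [str(label) for label in class_labels],
            # ndarray.tolist() converts numpy.int64 elements to built-in ints
            "matrix": arr.tolist(),
        },
    }


# Stand-in for the array returned by metrics.confusion_matrix(...)
cmtx = np.array([[30000, 50], [40, 2000]], dtype=np.int64)

# The raw array cannot be serialized, which is the first error in the question
try:
    json.dumps(cmtx)
    raw_is_serializable = True
except TypeError:
    raw_is_serializable = False

payload = to_confusion_matrix_payload(cmtx, [0, 1])
serialized = json.dumps(payload)  # succeeds once only built-in types remain
```

Inside a run, payload would then be passed as the second argument to run.log_confusion_matrix. Judging by the "already exists" message in the question, the name passed as the first argument presumably has to be one that has not already been logged in that run, which is likely why a different name is used above.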
