
How to get the status of a pipeline run within a component, running on Vertex AI?

Previously, using Kubeflow Pipelines SDK v1, the status of a pipeline could be inferred during pipeline execution by passing an Argo placeholder, {{workflow.status}}, to the component, as shown below:

import kfp.dsl as dsl

component_1 = dsl.ContainerOp(
    name='An example component',
    image='eu.gcr.io/.../my-component-img',
    arguments=[
        'python3', 'main.py',
        # Argo resolves this placeholder to the workflow's status at runtime.
        '--status', "{{workflow.status}}",
    ],
)

When passed to the component, this placeholder resolves to the value Succeeded or Failed. One use case for this would be to send a failure warning to e.g. Slack, in combination with dsl.ExitHandler.

However, when using Pipelines SDK version 2, kfp.v2, together with Vertex AI to compile and run the pipeline, the Argo placeholders no longer work, as described in this open issue. Because of this, I need another way to check the status of the pipeline from within a component. I was thinking I could use the kfp.Client class, but I'm assuming this won't work on Vertex AI, since there is no real "host". Also, there are supported placeholders for passing the run ID (dsl.PIPELINE_JOB_ID_PLACEHOLDER), as per this SO post, but I can't find anything for the status.
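One workaround, sketched below and untested against a live project, is to pass the run ID into the component via dsl.PIPELINE_JOB_ID_PLACEHOLDER and then query the run's state from the Vertex AI API inside the component. The project and location values are assumptions you would supply yourself, and the API call requires google-cloud-aiplatform plus application credentials:

```python
def pipeline_job_resource_name(project: str, location: str, job_id: str) -> str:
    """Build the fully qualified PipelineJob resource name the Vertex AI API expects."""
    return f"projects/{project}/locations/{location}/pipelineJobs/{job_id}"


def get_pipeline_state(project: str, location: str, job_id: str) -> str:
    """Fetch the current state of a pipeline run, given the run ID passed in
    via dsl.PIPELINE_JOB_ID_PLACEHOLDER."""
    # Deferred import: only needed when actually calling the API.
    from google.cloud import aiplatform_v1

    client = aiplatform_v1.PipelineServiceClient(
        client_options={"api_endpoint": f"{location}-aiplatform.googleapis.com"}
    )
    job = client.get_pipeline_job(
        name=pipeline_job_resource_name(project, location, job_id)
    )
    # An enum name such as "PIPELINE_STATE_SUCCEEDED" or "PIPELINE_STATE_FAILED".
    return job.state.name
```

Note that while the run is still executing, the state will report a running value rather than a terminal Succeeded/Failed, so this is most useful from an exit task.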

Any ideas how to get the status of a pipeline run within a component, running on Vertex AI?

Each pipeline run is automatically logged to Google Cloud Logging, and so are failed pipeline runs. The error logs also contain information about the pipeline and the component that failed.

We can use this information to monitor our logs and, for example, set up an alert via email.

We can retrieve the logs for our Vertex AI Pipeline runs with the following filter:

resource.type="aiplatform.googleapis.com/PipelineJob"
severity=(ERROR OR CRITICAL OR ALERT OR EMERGENCY)

Vertex AI Pipeline Logs
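The same filter can also be queried programmatically. A minimal sketch using the google-cloud-logging client library; the project ID is a placeholder, and the list_entries call requires credentials:

```python
# The error filter for Vertex AI pipeline runs, as a single string.
PIPELINE_ERROR_FILTER = (
    'resource.type="aiplatform.googleapis.com/PipelineJob" '
    "severity=(ERROR OR CRITICAL OR ALERT OR EMERGENCY)"
)


def read_pipeline_errors(project: str):
    """Return the error log entries for failed pipeline runs in a project."""
    # Deferred import: only needed when actually calling the API.
    from google.cloud import logging as cloud_logging

    client = cloud_logging.Client(project=project)
    return list(client.list_entries(filter_=PIPELINE_ERROR_FILTER))
```

The same filter string works with `gcloud logging read` on the command line.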

Based on those logs you can set up log-based alerts: https://cloud.google.com/logging/docs/alerting/log-based-alerts. Notifications via email, Slack, SMS, and many more are possible.

source: https://medium.com/google-cloud/google-vertex-ai-the-easiest-way-to-run-ml-pipelines-3a41c5ed153
