How to use ExitHandler with Kubeflow Pipelines SDK v2

Question

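I'm trying to move all my Kubeflow pipelines from the previous SDK v1 (kfp) to the newer Pipelines SDK v2 (kfp.v2). I'm using version 1.8.12. The refactoring has worked for almost all of the code, except for the ExitHandler, which still exists: from kfp.v2.dsl import ExitHandler. It seems that the previous way of compiling the pipeline object into a tar.gz file, using kfp.compiler.Compiler().compile(pipeline, 'basic_pipeline.tar.gz'), preserved some type of Argo placeholders, while the new .json pipelines, compiled with compiler.Compiler().compile(pipeline_func=pipeline, package_path="basic-pipeline.json"), don't work the same way. Below, I go into detail about what works in Pipelines SDK v1 and how I've tried to implement it in v2.

Previously, using Kubeflow Pipelines v1, I could use an ExitHandler, as shown in this StackOverflow question, to send a message to Slack when one of the pipeline components failed. I would define the pipeline as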

import kfp.dsl as dsl

@dsl.pipeline(
    name='Basic-pipeline'
)
def pipeline(...):
    exit_task = dsl.ContainerOp(
        name='Exit handler that catches errors and post them in Slack',
        image='eu.gcr.io/.../send-error-msg-to-slack',
        arguments=[
                    'python3', 'main.py',
                    '--message', 'Basic-pipeline failed',
                    '--status', "{{workflow.status}}"
                  ]
    )
    with dsl.ExitHandler(exit_task):
        step_1 = dsl.ContainerOp(...)
        step_2 = dsl.ContainerOp(...) \
            .after(step_1)

if __name__ == '__main__':
    import kfp.compiler as compiler
    compiler.Compiler().compile(pipeline, 'basic_pipeline.tar.gz')

If any step of the pipeline failed, exit_task would send the message to our Slack. The code for the exit_task image looks like

import argparse

def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--message', type=str)
    parser.add_argument('--status', type=str)
    return parser.parse_known_args()

def main(FLAGS):
    def post_to_slack(msg):
        ...

    if FLAGS.status == "Failed":
        post_to_slack(FLAGS.message)
    else:
        pass

if __name__ == '__main__':
    FLAGS, unparsed = get_args()
    main(FLAGS)

This worked because the underlying Argo workflow could somehow understand the "{{workflow.status}}" notion.
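To make the mechanism concrete, here is a toy sketch of engine-side placeholder substitution. It is not Argo's actual implementation; `render_placeholders` and the context dictionary are hypothetical names used only for illustration. The point is that the container sees a real status only if the workflow engine rewrites the argument before the container starts.

```python
import re

# Toy illustration only: Argo's real templating is more involved than this.
_PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render_placeholders(args, context):
    """Replace {{name}} placeholders in each argument using `context`.

    Unknown placeholders are left untouched, which mimics a backend that
    does not know about them.
    """
    def substitute(match):
        return str(context.get(match.group(1), match.group(0)))

    return [_PLACEHOLDER.sub(substitute, arg) for arg in args]


args = ["--message", "Basic-pipeline failed", "--status", "{{workflow.status}}"]

# An engine that knows the variable substitutes it before launching the container.
print(render_placeholders(args, {"workflow.status": "Failed"}))

# An engine that does not know the variable passes the literal string through.
print(render_placeholders(args, {}))
```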

However, I'm now trying to use Vertex AI to run the pipeline, using the Kubeflow Pipelines SDK v2, kfp.v2. With the same exit-handler image as before, 'eu.gcr.io/.../send-error-msg-to-slack', I now define a YAML component file (exit_handler.yaml) instead,

name: Exit handler
description: Prints to Slack if any step of the pipeline fails

inputs:
  - {name: message, type: String}
  - {name: status, type: String}

implementation:
  container:
    image: eu.gcr.io/.../send-error-msg-to-slack
    command: [
      python3,
      main.py,
      --message, {inputValue: message},
      --status, {inputValue: status}
    ]

The pipeline code now looks like this,

from google.cloud import aiplatform
from google.cloud.aiplatform import pipeline_jobs
from kfp.v2 import compiler
from kfp.v2.dsl import pipeline, ExitHandler
from kfp.components import load_component_from_file

@pipeline(name="Basic-pipeline",
          pipeline_root='gs://.../basic-pipeline')
def pipeline():
    exit_handler_spec = load_component_from_file('./exit_handler.yaml')
    exit_handler = exit_handler_spec(
        message="Basic pipeline failed.",
        status="{{workflow.status}}"
    )
    with ExitHandler(exit_handler):
        step_0_spec = load_component_from_file('./comp_0.yaml')
        step0 = step_0_spec(...)

        step_1_spec = load_component_from_file('./comp_1.yaml')
        step1 = step_1_spec(...) \
            .after(step0)

if __name__ == '__main__':
    compiler.Compiler().compile(
        pipeline_func=pipeline,
        package_path="basic-pipeline.json"
    )
    from google.oauth2 import service_account
    credentials = service_account.Credentials.from_service_account_file("./my-key.json")
    aiplatform.init(project='bsg-personalization',
                    location='europe-west4',
                    credentials=credentials)

    job = pipeline_jobs.PipelineJob(
        display_name="basic-pipeline",
        template_path="basic-pipeline.json",
        parameter_values={...}
    )
    job.run()

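This "works" in the sense that it compiles and runs without exceptions, but the ExitHandler code receives status as a string with the literal value {{workflow.status}}. The compiled pipeline JSON generated by the code above (basic-pipeline.json) shows the same thing, as you can see below ("stringValue": "{{workflow.status}}"):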

...
         "exit-handler": {
            "componentRef": {
              "name": "comp-exit-handler"
            },
            "dependentTasks": [
              "exit-handler-1"
            ],
            "inputs": {
              "parameters": {
                "message": {
                  "runtimeValue": {
                    "constantValue": {
                      "stringValue": "Basic pipeline failed."
                    }
                  }
                },
                "status": {
                  "runtimeValue": {
                    "constantValue": {
                      "stringValue": "{{workflow.status}}"
                    }
                  }
                }
              }
            },
            "taskInfo": {
              "name": "exit-handler"
            },
            "triggerPolicy": {
              "strategy": "ALL_UPSTREAM_TASKS_COMPLETED"
            }
          }
...
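One way to catch this kind of leftover before submitting a job is to scan the compiled JSON for string values that still contain `{{`. The sketch below is only an illustration; `find_literal_placeholders` is a hypothetical helper, and the inline sample mirrors the fragment above instead of reading the real file. Legitimate KFP v2 placeholders (which typically start with `{{$`) would be reported too, so hits are things to review, not necessarily errors.

```python
import json


def find_literal_placeholders(node, path=""):
    """Recursively collect (path, value) pairs for strings containing '{{'."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            hits.extend(find_literal_placeholders(value, f"{path}/{key}"))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            hits.extend(find_literal_placeholders(value, f"{path}/{index}"))
    elif isinstance(node, str) and "{{" in node:
        hits.append((path, node))
    return hits


# Inline sample mirroring the compiled fragment; for a real pipeline, load
# the compiled file (for example basic-pipeline.json) with json.load instead.
sample = json.loads("""
{"exit-handler": {"inputs": {"parameters": {
  "message": {"runtimeValue": {"constantValue": {"stringValue": "Basic pipeline failed."}}},
  "status": {"runtimeValue": {"constantValue": {"stringValue": "{{workflow.status}}"}}}
}}}}
""")

for path, value in find_literal_placeholders(sample):
    print(f"{path}: {value}")
```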

Any idea how I can refactor my old ExitHandler code from v1 to the new SDK v2, so that the exit handler can tell whether my pipeline has failed?

Answer 1

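This is probably not fully documented yet, but in V2 we introduced a different variable, PipelineTaskFinalStatus, that can be populated automatically for you to send to your Slack channel.

Here is an example of an exit handler in the official documentation: https://cloud.google.com/vertex-ai/docs/pipelines/email-notifications#sending_a_notification_from_a_pipeline

And here is the corresponding email notification component: https://github.com/kubeflow/pipelines/blob/master/components/google-cloud/google_cloud_pipeline_components/v1/vertex_notification_email/component.yaml

You can write your own component with the following parameter, which will be populated automatically when the exit handler runs.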

inputs:
...
  - name: pipeline_task_final_status
    type: PipelineTaskFinalStatus

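(Note that this feature is not yet available in the Kubeflow Pipelines open source distribution and will be available in KFP V2. It is currently only available in the Vertex Pipelines distribution.)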
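As a rough sketch of how the handler image from the question might be adapted: an input typed PipelineTaskFinalStatus appears to reach a container component as a JSON string. The field names used below (`state`, `pipelineJobResourceName`, `pipelineTaskName`, `error.code`, `error.message`) and the `FAILED` state value are assumptions based on how the KFP SDK deserializes this type, so verify them against a real run. `parse_final_status` and `should_notify` are hypothetical helpers, and the payload values are made up.

```python
import json


def parse_final_status(raw):
    """Parse the JSON string passed for a PipelineTaskFinalStatus input.

    Assumed shape (verify against a real run):
    {"state": ..., "pipelineJobResourceName": ..., "pipelineTaskName": ...,
     "error": {"code": ..., "message": ...}}
    """
    data = json.loads(raw)
    error = data.get("error") or {}
    return {
        "state": data.get("state"),
        "job": data.get("pipelineJobResourceName"),
        "task": data.get("pipelineTaskName"),
        "error_code": error.get("code"),
        "error_message": error.get("message"),
    }


def should_notify(status):
    """Notify only on failure, mirroring the v1 handler ('FAILED' is assumed)."""
    return status["state"] == "FAILED"


# Example payload with made-up values, for illustration only.
raw = json.dumps({
    "state": "FAILED",
    "pipelineJobResourceName": "projects/123/locations/europe-west4/pipelineJobs/example",
    "pipelineTaskName": "exit-handler-1",
    "error": {"code": 9, "message": "The DAG failed."},
})

status = parse_final_status(raw)
if should_notify(status):
    # In the real image this is where post_to_slack(...) would be called.
    print(f"Basic pipeline failed: {status['error_message']}")
```

Following the pattern of the email notification component linked above, the status input in exit_handler.yaml would be declared with type PipelineTaskFinalStatus and left unset in the pipeline code, since the backend fills it in when the exit handler runs.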

Answer 2

The replacement for "{{workflow.status}}" in KFP SDK v2 is the special type annotation PipelineTaskFinalStatus, as IronPan mentioned above.

Its usage is documented at https://www.kubeflow.org/docs/components/pipelines/v2/author-a-pipeline/pipelines/#dslexithandler

2 answers

Answer 1
0 2022-10-06 05:22:43

Answer 2
0 2022-10-06 06:43:23
