
How do I define pipeline-level volumes in kubeflow pipelines to share across components?

The Kubernetes tutorial on communicating between containers in the same pod defines the following Pod spec:

apiVersion: v1
kind: Pod
metadata:
  name: two-containers
spec:

  restartPolicy: Never

  volumes:                      # <--- This is what I need
  - name: shared-data
    emptyDir: {}

  containers:

  - name: nginx-container
    image: nginx
    volumeMounts:
    - name: shared-data
      mountPath: /usr/share/nginx/html

  - name: debian-container
    image: debian
    volumeMounts:
    - name: shared-data
      mountPath: /pod-data
    command: ["/bin/sh"]
    args: ["-c", "echo Hello from the debian container > /pod-data/index.html"]

Note that the volumes key is defined under spec, so the volume is available to all containers in the pod. I want to achieve the same behavior using kfp, the Python SDK for Kubeflow Pipelines.

However, with kfp I can only add volumes to individual containers, not to the whole workflow spec: kfp.dsl.ContainerOp.container.add_volume_mount can mount a previously created volume (kfp.dsl.PipelineVolume), but the volume itself only seems to be defined within that container.

Here is what I have tried, but the volume is always defined on the first container, not at the "global" level. How do I give op2 access to the volume? I would have expected this to live in kfp.dsl.PipelineConf, but volumes cannot be added to it. Is it just not implemented?

import kubernetes as k8s
from kfp import compiler, dsl
from kubernetes.client import V1VolumeMount
import pprint

@dsl.pipeline(name="debug", description="Debug only pipeline")
def pipeline_func():
    op = dsl.ContainerOp(
            name='echo',
            image='library/bash:4.4.23',
            command=['sh', '-c'],
            arguments=['echo "[1,2,3]"> /tmp/output1.txt'],
            file_outputs={'output': '/tmp/output1.txt'})
    op2 = dsl.ContainerOp(
            name='echo2',
            image='library/bash:4.4.23',
            command=['sh', '-c'],
            arguments=['echo "[4,5,6]">> /tmp/output1.txt'],
            file_outputs={'output': '/tmp/output1.txt'})

    mount_folder = "/tmp"
    # Create an emptyDir-backed PipelineVolume and mount it under /tmp in op
    volume = dsl.PipelineVolume(volume=k8s.client.V1Volume(
            name="test-storage",
            empty_dir=k8s.client.V1EmptyDirVolumeSource()))
    op.add_pvolumes({mount_folder: volume})
    # Mount the same volume name in op2; the volume itself ends up defined only
    # on the first op's pod spec, not globally
    op2.container.add_volume_mount(volume_mount=V1VolumeMount(mount_path=mount_folder,
                                                              name=volume.name))
    op2.after(op)


workflow = compiler.Compiler().create_workflow(pipeline_func=pipeline_func)
pprint.pprint(workflow["spec"])

You might want to check the difference between Kubernetes pods and containers. The Kubernetes example you've posted shows a two-container pod. You can recreate the same example in KFP by adding a sidecar container to an instantiated ContainerOp. Your code, on the other hand, creates two single-container pods that do not see each other's filesystems by design.
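For illustration, here is a minimal sketch of recreating that two-container pod with a sidecar, assuming the KFP v1 SDK (dsl.Sidecar and ContainerOp.add_sidecar). The volume is defined once on the op and mounted by both the main container and the sidecar, since they run in the same pod:

import kubernetes as k8s
from kfp import dsl

@dsl.pipeline(name="two-containers", description="Two containers in one pod")
def pipeline_func():
    # Volume defined once on the op; visible to the main container and the
    # sidecar because they share a pod
    vol = k8s.client.V1Volume(
        name="shared-data",
        empty_dir=k8s.client.V1EmptyDirVolumeSource())

    op = dsl.ContainerOp(
        name="nginx",
        image="nginx")
    op.add_volume(vol)
    op.container.add_volume_mount(k8s.client.V1VolumeMount(
        name="shared-data", mount_path="/usr/share/nginx/html"))

    # Sidecar container in the same pod, mounting the same volume
    sidecar = dsl.Sidecar(
        name="debian",
        image="debian",
        command=["/bin/sh"],
        args=["-c", "echo Hello from the debian container > /pod-data/index.html"])
    sidecar.add_volume_mount(k8s.client.V1VolumeMount(
        name="shared-data", mount_path="/pod-data"))
    op.add_sidecar(sidecar)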

To exchange data between pods you'd need a real volume, not emptyDir, which only works for containers within a single pod.

volume = dsl.PipelineVolume(volume=k8s.client.V1Volume(
        name="test-storage",
        empty_dir=k8s.client.V1EmptyDirVolumeSource()))
op.add_pvolumes({mount_folder: volume})

Please do not use dsl.PipelineVolume or op.add_pvolumes unless you know what they are and why you want them. Just use the normal op.add_volume and op.container.add_volume_mount.
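As a sketch of that approach, the following assumes a pre-existing PersistentVolumeClaim (the name my-shared-pvc is hypothetical) whose access mode allows both pods to mount it; the same volume and mount are added to each op explicitly:

import kubernetes as k8s
from kfp import dsl

@dsl.pipeline(name="shared-volume", description="Two pods sharing a PVC")
def pipeline_func():
    # Volume backed by an existing PVC; both pods reference the same claim
    vol = k8s.client.V1Volume(
        name="shared-data",
        persistent_volume_claim=k8s.client.V1PersistentVolumeClaimVolumeSource(
            claim_name="my-shared-pvc"))  # hypothetical, pre-provisioned PVC
    mount = k8s.client.V1VolumeMount(name="shared-data", mount_path="/data")

    op1 = dsl.ContainerOp(
        name="writer",
        image="library/bash:4.4.23",
        command=["sh", "-c"],
        arguments=['echo "[1,2,3]" > /data/output1.txt'])
    op1.add_volume(vol)
    op1.container.add_volume_mount(mount)

    op2 = dsl.ContainerOp(
        name="appender",
        image="library/bash:4.4.23",
        command=["sh", "-c"],
        arguments=['echo "[4,5,6]" >> /data/output1.txt'])
    op2.add_volume(vol)
    op2.container.add_volume_mount(mount)
    op2.after(op1)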

Nevertheless, is there a particular reason you need to use volumes? Volumes make pipelines and components non-portable. No 1st-party components use volumes.

The KFP team encourages users to use the normal data-passing methods: non-python, python
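For example, a minimal sketch of Python-based data passing with the KFP v1 SDK (create_component_from_func with InputPath/OutputPath), where the system moves the file between steps and no volume is involved:

from kfp import dsl
from kfp.components import create_component_from_func, InputPath, OutputPath

def produce_numbers(numbers_path: OutputPath(str)):
    # Write the output to the path provided by the KFP system
    with open(numbers_path, "w") as f:
        f.write("[1,2,3]")

def append_numbers(numbers_path: InputPath(str), result_path: OutputPath(str)):
    # Read the upstream output and write a new output file
    with open(numbers_path) as f:
        data = f.read()
    with open(result_path, "w") as f:
        f.write(data + "\n[4,5,6]")

produce_op = create_component_from_func(produce_numbers, base_image="python:3.8")
append_op = create_component_from_func(append_numbers, base_image="python:3.8")

@dsl.pipeline(name="data-passing", description="File-based data passing, no volumes")
def pipeline_func():
    produce_task = produce_op()
    # The output file is passed to the next step by the system
    append_op(numbers=produce_task.outputs["numbers"])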
