The Kubernetes Communicate Between Containers in the Same Pod tutorial defines the following pod YAML:
apiVersion: v1
kind: Pod
metadata:
  name: two-containers
spec:
  restartPolicy: Never
  volumes:                 # <--- This is what I need
  - name: shared-data
    emptyDir: {}
  containers:
  - name: nginx-container
    image: nginx
    volumeMounts:
    - name: shared-data
      mountPath: /usr/share/nginx/html
  - name: debian-container
    image: debian
    volumeMounts:
    - name: shared-data
      mountPath: /pod-data
    command: ["/bin/sh"]
    args: ["-c", "echo Hello from the debian container > /pod-data/index.html"]
Note that the volumes key is defined under spec, so the volume is available to all of the pod's containers. I want to achieve the same behavior using kfp, the Python SDK for Kubeflow Pipelines.
However, I can only add volumes to individual containers, not to the whole workflow spec: kfp.dsl.ContainerOp.container.add_volume_mount can point to a previously created volume (kfp.dsl.PipelineVolume), but the volume then seems to be defined only within that one container. Here is what I have tried; the volume is always defined in the first container rather than at the "global" level. How do I give op2 access to the volume? I would have expected this to live in kfp.dsl.PipelineConf, but volumes cannot be added to it. Is it just not implemented?
import kubernetes as k8s
from kfp import compiler, dsl
from kubernetes.client import V1VolumeMount
import pprint


@dsl.pipeline(name="debug", description="Debug only pipeline")
def pipeline_func():
    op = dsl.ContainerOp(
        name='echo',
        image='library/bash:4.4.23',
        command=['sh', '-c'],
        arguments=['echo "[1,2,3]" > /tmp/output1.txt'],
        file_outputs={'output': '/tmp/output1.txt'})
    op2 = dsl.ContainerOp(
        name='echo2',
        image='library/bash:4.4.23',
        command=['sh', '-c'],
        arguments=['echo "[4,5,6]" >> /tmp/output1.txt'],
        file_outputs={'output': '/tmp/output1.txt'})

    mount_folder = "/tmp"
    volume = dsl.PipelineVolume(volume=k8s.client.V1Volume(
        name="test-storage",
        empty_dir=k8s.client.V1EmptyDirVolumeSource()))
    op.add_pvolumes({mount_folder: volume})
    op2.container.add_volume_mount(
        volume_mount=V1VolumeMount(mount_path=mount_folder, name=volume.name))
    op2.after(op)


workflow = compiler.Compiler().create_workflow(pipeline_func=pipeline_func)
pprint.pprint(workflow["spec"])
You might want to check the difference between Kubernetes pods and containers. The Kubernetes example you've posted shows a two-container pod. You can recreate that example in KFP by adding a sidecar container to an instantiated ContainerOp; a sketch follows below. What your second example does is create two single-container pods, which by design do not see each other.
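For illustration, here is a minimal sketch of the sidecar approach, assuming the kfp v1 SDK (dsl.Sidecar and ContainerOp.add_sidecar); the container names and the sleep-based synchronization are just for the demo:

import kubernetes.client as k8s
from kfp import dsl


@dsl.pipeline(name="two-container-pod", description="Recreate the k8s tutorial pod")
def pipeline_func():
    # One pod, two containers, one shared emptyDir volume, as in the
    # Kubernetes tutorial, but expressed with the kfp v1 DSL.
    volume = k8s.V1Volume(name="shared-data",
                          empty_dir=k8s.V1EmptyDirVolumeSource())

    # Main container: waits, then reads what the sidecar wrote.
    op = dsl.ContainerOp(
        name="reader",
        image="library/bash:4.4.23",
        command=["sh", "-c"],
        arguments=["sleep 5; cat /usr/share/nginx/html/index.html"])
    op.add_volume(volume)
    op.container.add_volume_mount(
        k8s.V1VolumeMount(name="shared-data",
                          mount_path="/usr/share/nginx/html"))

    # Sidecar: runs in the same pod and writes into the shared volume.
    sidecar = dsl.Sidecar(
        name="debian",
        image="debian",
        command=["/bin/sh"],
        args=["-c", "echo Hello from the debian container > /pod-data/index.html"])
    sidecar.add_volume_mount(
        k8s.V1VolumeMount(name="shared-data", mount_path="/pod-data"))
    op.add_sidecar(sidecar)

Keep in mind that Argo terminates sidecars once the main container exits, so the main container has to outlive the sidecar's write.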
To exchange data between pods you'd need a real volume, not emptyDir, which only works for containers within a single pod.
volume = dsl.PipelineVolume(volume=k8s.client.V1Volume(
    name="test-storage",
    empty_dir=k8s.client.V1EmptyDirVolumeSource()))
op.add_pvolumes({mount_folder: volume})
Please do not use dsl.PipelineVolume or op.add_pvolumes unless you know what they do and why you want them. Just use the normal op.add_volume and op.container.add_volume_mount, as in the sketch below.
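A minimal sketch of that advice, assuming the kfp v1 SDK and a pre-existing PersistentVolumeClaim (the claim name my-pvc and the mount path are hypothetical, and the claim needs a ReadWriteMany access mode if the two pods can land on different nodes):

import kubernetes.client as k8s
from kfp import dsl


@dsl.pipeline(name="shared-pvc", description="Share a PVC between two pods")
def pipeline_func():
    # Hypothetical pre-existing PVC; it could also be created inside the
    # pipeline with dsl.VolumeOp.
    volume = k8s.V1Volume(
        name="shared-storage",
        persistent_volume_claim=k8s.V1PersistentVolumeClaimVolumeSource(
            claim_name="my-pvc"))
    mount = k8s.V1VolumeMount(name="shared-storage", mount_path="/tmp/shared")

    op = dsl.ContainerOp(
        name="writer",
        image="library/bash:4.4.23",
        command=["sh", "-c"],
        arguments=['echo "[1,2,3]" > /tmp/shared/output1.txt'])
    op.add_volume(volume)
    op.container.add_volume_mount(mount)

    # The second pod mounts the same claim, so it sees the first pod's file.
    op2 = dsl.ContainerOp(
        name="appender",
        image="library/bash:4.4.23",
        command=["sh", "-c"],
        arguments=['echo "[4,5,6]" >> /tmp/shared/output1.txt'])
    op2.add_volume(volume)
    op2.container.add_volume_mount(mount)
    op2.after(op)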
Nevertheless, is there a particular reason you need to use volumes? Volumes make pipelines and components non-portable; no first-party components use them. The KFP team encourages users to use the normal data-passing methods instead (see the non-python and python data-passing tutorials), as sketched below.
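As an illustration of that data-passing style, a minimal sketch assuming the kfp v1 SDK: the producer declares file_outputs, and the consumer references op.outputs, which also creates the dependency between the two steps (no volumes and no explicit .after needed):

from kfp import dsl


@dsl.pipeline(name="data-passing", description="Pass data between steps via outputs")
def pipeline_func():
    op = dsl.ContainerOp(
        name="producer",
        image="library/bash:4.4.23",
        command=["sh", "-c"],
        arguments=['echo "[1,2,3]" > /tmp/output1.txt'],
        file_outputs={"output": "/tmp/output1.txt"})

    # Consuming op.outputs["output"] makes KFP ship the produced value to
    # this step and implicitly orders it after the producer.
    op2 = dsl.ContainerOp(
        name="consumer",
        image="library/bash:4.4.23",
        command=["sh", "-c"],
        arguments=["echo received: %s" % op.outputs["output"]])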