I have a DAG in Airflow that uses the KubernetesPodOperator, and I am trying to get files generated by the container running in the pod back to the Airflow host. For development, my host is a Docker container running an Airflow image against a docker-desktop Kubernetes cluster; for production I am using an AWS EC2 box with EKS.
volume_mount = VolumeMount(
    'dbt-home',
    mount_path=<CONTAINER_DIR>,
    sub_path=None,
    read_only=False,
)
volume_config = {
    'hostPath': {'path': <HOST_DIR>, 'type': 'DirectoryOrCreate'}
}
volume = Volume(name="dbt-home", configs=volume_config)
dbt_run = KubernetesPodOperator(
    namespace='default',
    image=<MY_IMAGE>,
    cmds=["bash", "-cx"],
    arguments=[command],
    env_vars=MY_ENVIRONMENT,
    volumes=[volume],
    volume_mounts=[volume_mount],
    name="test-run",
    task_id="test-run-task",
    config_file=config_file,
    get_logs=True,
    reattach_on_restart=True,
    dag=dag,
)
I tried using the hostPath type for the volume, but I think that path refers to the node hosting the pod, not to the Airflow host. I also looked at the Kubernetes documentation on volumes and found emptyDir, which didn't work out either, since it only lives as long as the pod itself.
Based on your comment, you are asking how one task running in a pod can complete and write logs to a location that another task, running in a later pod, can read when it starts. It seems like you could do a few things.
kubectl logs
(i.e., bake kubectl into your task image and grant its service account permission to read pod logs in that namespace), or use the Kubernetes Python API to fetch the logs. I am not sure whether you are looking for a more Airflow-native way of doing this, but those are the ideas that come to mind that would solve your problem.