
How to mount a volume from the triggering Airflow host into a Kubernetes Pod running a Docker container, using the KubernetesPodOperator

I have a DAG in Airflow that uses the KubernetesPodOperator, and I am trying to get some files generated by the container running in the pod back to the Airflow host. For development, my host is a Docker container running an Airflow image against a docker-desktop K8s cluster; for production, I am using an AWS EC2 box with EKS.

# Imports for these classes (assuming Airflow 1.10's contrib modules):
from airflow.contrib.kubernetes.volume import Volume
from airflow.contrib.kubernetes.volume_mount import VolumeMount
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

volume_mount = VolumeMount('dbt-home',
                           mount_path=<CONTAINER_DIR>,
                           sub_path=None,
                           read_only=False)

volume_config = {
    'hostPath':
      {'path': <HOST_DIR>, 'type': 'DirectoryOrCreate'}
    }

volume = Volume(name="dbt-home", configs=volume_config)


dbt_run = KubernetesPodOperator(
                          namespace='default',
                          image=<MY_IMAGE>,
                          cmds=["bash", "-cx"],
                          arguments=[command],
                          env_vars=MY_ENVIRONMENT,
                          volumes=[volume],
                          volume_mounts=[volume_mount],
                          name="test-run",
                          task_id="test-run-task",
                          config_file=config_file,
                          get_logs=True,
                          reattach_on_restart=True,
                          dag=dag
                          )

I tried using the hostPath type for the volume, but I think that refers to the host (node) running the pod rather than the Airflow host. I also looked in the Kubernetes documentation on volumes and found emptyDir, which didn't work out either.

Based on your comment, you are asking how one task run in a pod can complete and write logs to a location that another task run in a pod can read when it starts. It seems like you could do a few things.

  1. You could have the task that starts next get the logs of the previous pod that completed, either via kubectl logs (i.e., put kubectl into your task image and give its service account permission to read pod logs in that namespace) or via the Kubernetes Python API (a sketch follows this list).
  2. You could mount a PVC into your initial task at a certain location and write the logs there; then, when it completes, you can mount the same PVC into your next task so it can read the logs from that location (see the PVC sketch after this list). You could use EBS if the volume will only be mounted into one pod at a time, or NFS if it will be mounted into many pods at once. NFS probably makes more sense, so that you can share the logs across many tasks' pods at the same time.
  3. You could ship your logs to CloudWatch via Fluentd, and your task could then query CloudWatch for the previous task's logs (see the CloudWatch sketch after this list). Shipping your logs to CloudWatch is a good practice anyway, so you may as well do that.
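
For option 1, here is a minimal sketch of fetching a completed pod's logs with the official Kubernetes Python client; the pod name, the namespace, and the in-cluster config call are assumptions you would adapt to your setup.

from kubernetes import client, config

# Inside the cluster the task pod can use its service account credentials;
# use config.load_kube_config() instead when running outside the cluster.
config.load_incluster_config()
v1 = client.CoreV1Api()

# "previous-task-pod" and "default" are hypothetical placeholders.
pod_logs = v1.read_namespaced_pod_log(name="previous-task-pod", namespace="default")
print(pod_logs)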
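
For option 2, a minimal sketch reusing the same contrib classes as the snippet in your question, but backed by a PersistentVolumeClaim instead of hostPath; the claim name and mount path are hypothetical, and the PVC (EBS- or NFS-backed) would need to exist beforehand.

from airflow.contrib.kubernetes.volume import Volume
from airflow.contrib.kubernetes.volume_mount import VolumeMount

# 'dbt-home-pvc' is a hypothetical PersistentVolumeClaim created ahead of time.
volume_config = {
    'persistentVolumeClaim': {'claimName': 'dbt-home-pvc'}
}
volume = Volume(name='dbt-home', configs=volume_config)

volume_mount = VolumeMount('dbt-home',
                           mount_path='/shared/logs',  # hypothetical path inside the container
                           sub_path=None,
                           read_only=False)

# Pass volumes=[volume] and volume_mounts=[volume_mount] to the KubernetesPodOperator
# of the writing task, and again to the operator of the task that reads the files.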
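
For option 3, a minimal sketch of querying CloudWatch Logs with boto3 once Fluentd is shipping the pod logs there; the region, log group, and stream prefix are hypothetical.

import boto3

logs_client = boto3.client('logs', region_name='us-east-1')  # hypothetical region

response = logs_client.filter_log_events(
    logGroupName='/eks/airflow-tasks',        # hypothetical log group written by Fluentd
    logStreamNamePrefix='previous-task-pod',  # hypothetical stream prefix, e.g. the pod name
    limit=100,
)
for event in response['events']:
    print(event['message'])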

I am not sure whether you are looking for a more Airflow-native way of doing this, but those are ideas that come to mind that would solve your problem.
