
How to mount a volume to a Kubernetes Pod running a docker container from the Airflow host that triggers it using the KubernetesPodOperator

I have a DAG in Airflow that uses the KubernetesPodOperator, and I am trying to get some files that are generated by the container running in the pod back to the Airflow host. For development, my host is a Docker container running an Airflow image with a docker-desktop K8s cluster, and for production I am using an AWS EC2 box with EKS.

# contrib-era imports (Airflow 1.10.x)
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator
from airflow.contrib.kubernetes.volume import Volume
from airflow.contrib.kubernetes.volume_mount import VolumeMount

volume_mount = VolumeMount('dbt-home',
                           mount_path=<CONTAINER_DIR>,
                           sub_path=None,
                           read_only=False)

volume_config = {
    'hostPath':
        {'path': <HOST_DIR>, 'type': 'DirectoryOrCreate'}
}

volume = Volume(name='dbt-home', configs=volume_config)

dbt_run = KubernetesPodOperator(
    namespace='default',
    image=MY_IMAGE,
    cmds=["bash", "-cx"],
    arguments=[command],
    env_vars=MY_ENVIRONMENT,
    volumes=[volume],
    volume_mounts=[volume_mount],
    name="test-run",
    task_id="test-run-task",
    config_file=config_file,
    get_logs=True,
    reattach_on_restart=True,
    dag=dag
)

I tried using the hostPath type for the volume, but I think that it refers to the host of the pod (the Kubernetes node), not the Airflow host. I looked in the Kubernetes documentation around volumes, where I found the emptyDir type, which didn't work out either.

Based on your comment, you are asking how one task run in a pod can complete and write logs to a location that another task run in a pod can read when it starts. It seems like you could do a few things.

  1. You could just have the task that starts get the logs of the previous pod that completed, either via `kubectl logs` (i.e. put kubectl into your task image and permission its service account to get the logs of pods in that namespace) or via the Kubernetes Python API (see the first sketch after this list).
  2. You could mount a PVC into your initial task at a certain location and write the logs there; when it completes, you can mount that same PVC into your next task, which can read the logs from that location. You could use EBS if it will only be mounted into one pod at a time, or NFS if it will be mounted into many pods at a time. NFS probably makes sense here, so that you could share your logs across many tasks in pods at once (see the PVC sketch after this list).
  3. You could ship your logs to CloudWatch via fluentd. Your task could then query CloudWatch for the previous task's logs (see the boto3 sketch after this list). I think that shipping your logs to CloudWatch is a good practice anyway, so you may as well do that.
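
As a minimal sketch of option 1, assuming the reading task runs in-cluster with a service account permitted to read pod logs, and reusing the pod name `test-run` and namespace `default` from the question (both placeholders here):

from kubernetes import client, config

# Load credentials from the service account mounted into the pod;
# use config.load_kube_config() instead when running outside the cluster.
config.load_incluster_config()

v1 = client.CoreV1Api()

# Read the full log output of the completed pod. The pod name and
# namespace below are assumptions based on the question's operator args.
previous_logs = v1.read_namespaced_pod_log(
    name="test-run",
    namespace="default",
)
print(previous_logs)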
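
For option 2, here is a sketch using the same contrib-era Volume API as the question's code, assuming a pre-created PVC with the hypothetical name `dbt-home-pvc`; both tasks would reference the same claim:

volume_config = {
    'persistentVolumeClaim': {'claimName': 'dbt-home-pvc'}  # hypothetical claim name
}
volume = Volume(name='dbt-home', configs=volume_config)
volume_mount = VolumeMount('dbt-home',
                           mount_path='/home/dbt/logs',  # assumed path
                           sub_path=None,
                           read_only=False)

# Pass volumes=[volume] and volume_mounts=[volume_mount] to both the writing
# and the reading KubernetesPodOperator tasks; the second task will see the
# files the first one wrote. If the pods could run concurrently, the PVC's
# access mode must allow it (ReadWriteMany, e.g. NFS-backed).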
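
And for option 3, a sketch of querying CloudWatch Logs with boto3, assuming fluentd ships the container logs to a log group named `/airflow/task-logs` (the group name and stream prefix are hypothetical and depend on your fluentd configuration):

import boto3

logs = boto3.client("logs")

# Fetch events for the previous task from the assumed log group,
# filtering streams by the previous pod's name.
response = logs.filter_log_events(
    logGroupName="/airflow/task-logs",
    logStreamNamePrefix="test-run",
)
for event in response["events"]:
    print(event["message"])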

I am not sure if you are looking for a more Airflow-native way of doing this, but those are ideas that come to mind that would solve your problem.
