
Airflow BashOperator - Use a different role than its pod role

I've tried to run the following commands as part of a bash script run in a BashOperator:

aws s3 ls s3://bucket
aws s3 cp ... ...

The script runs successfully; however, the aws cli commands return an error, showing that aws cli doesn't run with the needed permissions (as defined in the airflow-worker-node role).

Investigating the error:

  1. I've upgraded awscli in the Docker image running the pod to version 2.4.9 (I understood that old versions of awscli don't support access to S3 based on permissions granted by an AWS role).

  2. I've investigated the pod running my bash_script using the BashOperator:

  • Using k9s and the d (describe) command:

    • I saw that ARN_ROLE is defined correctly.
  • Using k9s and the s (shell) command:

    • I saw that the pod's environment variables are correct.
    • aws cli worked with the needed permissions and could access S3 as needed.
    • aws sts get-caller-identity reported the right role (airflow-worker-node).
  3. Running the same commands as part of the bash script executed in the BashOperator gave different results:

    • Running env showed a limited set of environment variables.
    • aws cli returned a permission-related error.
    • aws sts get-caller-identity reported the EKS node role (eks-worker-node).

How can I grant aws cli in my BashOperator bash script the needed permissions?

Reviewing the BashOperator source code, I noticed the following code:

https://github.com/apache/airflow/blob/main/airflow/operators/bash.py

def get_env(self, context):
    """Builds the set of environment variables to be exposed for the bash command"""
    system_env = os.environ.copy()
    env = self.env
    if env is None:
        env = system_env
    else:
        if self.append_env:
            system_env.update(env)
            env = system_env
    # (excerpt; the method goes on to add Airflow context variables and return env)

And the following documentation:

:param env: If env is not None, it must be a dict that defines the
    environment variables for the new process; these are used instead
    of inheriting the current process environment, which is the default
    behavior. (templated)
:type env: dict
:param append_env: If False (default), uses only the environment variables passed in the env param
    and does not inherit the current process environment. If True, inherits the environment variables
    from the current process, and the environment variables passed by the user either update the
    existing inherited variables or are appended to them
:type append_env: bool
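Putting the code and the docstring together, the merge behavior can be sketched as a standalone function (my own paraphrase of get_env's merge logic, not Airflow's actual code):

```python
import os

def build_env(user_env, append_env=False):
    """Paraphrase of BashOperator.get_env: env handed to the bash subprocess."""
    system_env = os.environ.copy()
    if user_env is None:
        return system_env            # no env given: inherit everything
    if append_env:
        system_env.update(user_env)  # inherit, user entries win on collision
        return system_env
    return dict(user_env)            # default: ONLY the user-supplied vars
```

The last branch is the trap: passing any env dict with the default append_env=False discards the entire inherited environment, including the AWS credential variables.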

If the BashOperator's env input is None, it copies the environment variables of the parent process. In my case, I provided some env variables, so it didn't copy the parent process's environment into the child process, which caused the child process (the BashOperator's bash subprocess) to use the default ARN role of eks-worker-node.

The simple solution is to set the following flag in BashOperator(): append_env=True, which merges my manually added env variables on top of all the existing (inherited) environment variables.

I've figured out that in the version I'm running (2.0.1) this flag isn't supported (it is supported in later versions). As a temporary solution, I added **os.environ to the BashOperator env parameter:

return BashOperator(
    task_id="copy_data_from_mcd_s3",
    env={
        **os.environ,  # inherit first, so the explicit entries below win on collisions
        "dag_input": "{{ dag_run.conf }}",
        ......
    },
    # append_env=True,  # supported from Airflow 2.2.0
    bash_command="utils/my_script.sh",
    dag=dag,
    retries=1,
)

Which solved the problem.
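One subtlety with the **os.environ workaround: in a dict literal, later entries override earlier ones on duplicate keys, so the position of **os.environ decides whether inherited values or hand-written values win. A quick illustration (DEMO_VAR is a made-up variable name):

```python
import os

os.environ["DEMO_VAR"] = "from_system"

# **os.environ last: an inherited value silently overrides the explicit one.
env_system_wins = {"DEMO_VAR": "mine", **os.environ}

# **os.environ first: explicit values win, matching append_env=True semantics.
env_user_wins = {**os.environ, "DEMO_VAR": "mine"}
```

Since `system_env.update(env)` in get_env lets user-supplied entries win, placing **os.environ first reproduces the append_env=True behavior most faithfully.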
