Airflow BashOperator - Use a different role than its pod role
I've tried to run the following commands as part of a bash script that runs in a BashOperator:

```shell
aws s3 ls s3://bucket
aws s3 cp ... ...
```
The script runs successfully; however, the aws cli commands return errors, showing that the aws cli doesn't run with the needed permissions (as defined in the airflow-worker-node role).
Investigating the error:

I've upgraded awscli in the docker image running the pod to version 2.4.9 (I've understood that old versions of awscli don't support access to s3 based on permissions granted by an aws role).
I've investigated the pod running my bash_script using the BashOperator:

Using k9s and the D (describe) command:

Using k9s and the s (shell) command:

- aws cli worked with the needed permissions and could access s3 as needed.
- `aws sts get-caller-identity` reported the right role (airflow-worker-node).

Running the above commands as part of the bash-script executed in the BashOperator gave me different results:

- env showed a limited set of environment variables.
- aws cli returned a permission-related error.
- `aws sts get-caller-identity` reported the eks role (eks-worker-node).

How can I grant the aws cli in my BashOperator bash-script the needed permissions?
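A quick way to pin down the difference between the two contexts (a diagnostic sketch of my own, assuming the cluster uses IAM Roles for Service Accounts to inject credentials) is to compare the AWS-related variables in each:

```shell
# Run this both in the k9s shell and inside the BashOperator script; with
# IRSA, the pod normally injects AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE,
# which the aws cli needs in order to assume the service-account role.
env | grep -E 'AWS_ROLE_ARN|AWS_WEB_IDENTITY_TOKEN_FILE' || echo "no IRSA variables found"
```

If the variables show up in the k9s shell but not in the BashOperator script, the script's environment is the problem, not the IAM setup.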
Reviewing the BashOperator source code, I've noticed the following code:

https://github.com/apache/airflow/blob/main/airflow/operators/bash.py
```python
def get_env(self, context):
    """Builds the set of environment variables to be exposed for the bash command"""
    system_env = os.environ.copy()
    env = self.env
    if env is None:
        env = system_env
    else:
        if self.append_env:
            system_env.update(env)
            env = system_env
```
And the following documentation:
```
:param env: If env is not None, it must be a dict that defines the
    environment variables for the new process; these are used instead
    of inheriting the current process environment, which is the default
    behavior. (templated)
:type env: dict
:param append_env: If False(default) uses the environment variables passed in env params
    and does not inherit the current process environment. If True, inherits the environment variables
    from current passes and then environment variable passed by the user will either update the existing
    inherited environment variables or the new variables gets appended to it
:type append_env: bool
```
If the BashOperator's env argument is None, it copies the environment variables of the parent process. In my case, I provided some env variables, so it didn't copy the parent process's environment into the child process - which caused the child process (the BashOperator process) to use the default arn_role of eks-worker-node.
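The same replacement behavior can be reproduced with plain subprocess (my own stdlib-only sketch, not Airflow code; the AWS_ROLE_ARN value is made up for illustration): passing an explicit env dict replaces the inherited environment, which is why the credential variables disappear, while merging os.environ back in preserves them.

```python
import os
import subprocess

# Stand-in for a variable the pod's service account would normally inject
# (hypothetical value, for illustration only).
os.environ["AWS_ROLE_ARN"] = "arn:aws:iam::123456789012:role/airflow-worker-node"

# Passing an explicit env dict REPLACES the inherited environment,
# so AWS_ROLE_ARN is gone in the child process:
out_replaced = subprocess.run(
    ["/bin/sh", "-c", 'echo "role=${AWS_ROLE_ARN:-<missing>}"'],
    env={"dag_input": "some-conf"},
    capture_output=True, text=True,
).stdout.strip()

# Merging os.environ into the dict keeps the inherited variables:
out_merged = subprocess.run(
    ["/bin/sh", "-c", 'echo "role=${AWS_ROLE_ARN:-<missing>}"'],
    env={"dag_input": "some-conf", **os.environ},
    capture_output=True, text=True,
).stdout.strip()

print(out_replaced)  # role=<missing>
print(out_merged)    # role=arn:aws:iam::123456789012:role/airflow-worker-node
```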
The simple solution is to set the following flag in BashOperator(): append_env=True, which will append all existing env variables to the env variables I added manually.
I've figured out that the version I'm running (2.0.1) doesn't support it (it is supported in later versions). As a temporary solution, I've added **os.environ to the BashOperator env parameter:
```python
return BashOperator(
    task_id="copy_data_from_mcd_s3",
    env={
        "dag_input": "{{ dag_run.conf }}",
        # ......
        **os.environ,
    },
    # append_env=True,  # should be supported in 2.2.0
    bash_command="utils/my_script.sh",
    dag=dag,
    retries=1,
)
```
This solved the problem.
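One subtlety worth noting about this workaround (a stdlib-only sketch of my own, with made-up variable names): in a dict literal, later entries win, so spreading **os.environ last means an inherited variable overwrites a manually set one with the same name, whereas append_env=True (per the get_env code above, which runs system_env.update(env)) lets the user-supplied value win.

```python
import os

# Stand-in for a variable inherited from the parent process.
os.environ["SHARED_KEY"] = "from-parent-env"

user_env = {"SHARED_KEY": "from-user", "dag_input": "conf"}

# Workaround from this answer: **os.environ spread last -> inherited value wins.
workaround = {**user_env, **os.environ}

# append_env=True semantics (system_env.update(env)) -> user value wins.
system_env = os.environ.copy()
system_env.update(user_env)

print(workaround["SHARED_KEY"])  # from-parent-env
print(system_env["SHARED_KEY"])  # from-user
```

In this case the difference doesn't matter (the dag_input key doesn't collide with anything inherited), but it is worth keeping in mind if a manually set variable shares a name with one injected by the pod.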