I've tried to run the following commands as part of a bash script executed by a BashOperator:

aws s3 ls s3://bucket
aws s3 cp ... ...

The script runs successfully, but the aws cli commands return an error, showing that the aws cli doesn't run with the needed permissions (as defined in the airflow-worker-node role).
Investigating the error:

I upgraded awscli in the docker image running the pod to version 2.4.9 (I've understood that old versions of awscli don't support access to s3 based on permissions granted by an aws role).

I investigated the pod running my bash script via the BashOperator, using k9s:

- With the d (describe) and s (shell) commands: inside the pod shell, the aws cli worked with the needed permissions and could access s3 as needed. aws sts get-caller-identity reported the right role (airflow-worker-node).

Running the same commands as part of the bash script executed by the BashOperator gave different results:

- env showed a limited set of environment variables
- the aws cli commands returned a permission-related error
- aws sts get-caller-identity reported the eks role (eks-worker-node)

How can I grant the aws cli in my BashOperator bash script the needed permissions?
Reviewing the BashOperator source code (https://github.com/apache/airflow/blob/main/airflow/operators/bash.py), I noticed the following excerpt:

def get_env(self, context):
    """Builds the set of environment variables to be exposed for the bash command"""
    system_env = os.environ.copy()
    env = self.env
    if env is None:
        env = system_env
    else:
        if self.append_env:
            system_env.update(env)
            env = system_env
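The effect of this logic can be reproduced with plain subprocess, which is what BashOperator uses under the hood. A minimal sketch (the variable names are made up for illustration): passing env=... to the child replaces its environment entirely, while passing nothing inherits the parent's.

```python
import os
import subprocess

# A variable set in the parent process (stand-in for the credential
# variables that were present in the pod but missing in the script).
os.environ["PARENT_ONLY_VAR"] = "from-parent"

# Passing env=... replaces the child's environment entirely:
# PARENT_ONLY_VAR is gone, just like the AWS-related variables were
# gone inside the BashOperator script.
replaced = subprocess.run(
    ["sh", "-c", "echo ${PARENT_ONLY_VAR:-missing}"],
    env={"USER_VAR": "x"},
    capture_output=True, text=True,
).stdout.strip()
print(replaced)  # -> missing

# Passing env=None (the default) inherits the parent environment.
inherited = subprocess.run(
    ["sh", "-c", "echo ${PARENT_ONLY_VAR:-missing}"],
    capture_output=True, text=True,
).stdout.strip()
print(inherited)  # -> from-parent
```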
And the following documentation:
:param env: If env is not None, it must be a dict that defines the
environment variables for the new process; these are used instead
of inheriting the current process environment, which is the default
behavior. (templated)
:type env: dict
:param append_env: If False(default) uses the environment variables passed in env params
and does not inherit the current process environment. If True, inherits the environment variables
from current passes and then environment variable passed by the user will either update the existing
inherited environment variables or the new variables gets appended to it
:type append_env: bool
If the BashOperator's env parameter is None, it copies the environment variables of the parent process. In my case, I provided some env variables, so it didn't copy the parent process's env variables into the child process - which caused the child process (the BashOperator process) to use the default arn_role of eks-worker-node.
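This also explains the fallback to eks-worker-node, assuming the worker role is granted via IAM Roles for Service Accounts (IRSA): IRSA delivers credentials through injected environment variables (AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE), and once those are dropped from the child environment, the aws cli falls back to the node's instance profile. A quick diagnostic that can be run both in the k9s shell and inside the BashOperator script:

```shell
# Compare the credential-related environment in both contexts.
# If the IRSA variables are missing, the CLI falls back to the
# node instance profile (the eks-worker-node role).
env | grep -E '^AWS_(ROLE_ARN|WEB_IDENTITY_TOKEN_FILE)=' \
  || echo "IRSA variables missing - node instance profile will be used"
```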
The simple solution is to set the following flag in BashOperator(): append_env=True, which inherits all existing env variables and applies the ones I added manually on top.
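The merge order matters here: with append_env=True, get_env copies os.environ first and then applies the user-supplied env on top, so user values win on key conflicts. In plain Python terms (variable names are illustrative):

```python
import os

# Stand-ins for variables already present in the worker process.
os.environ["SHARED_VAR"] = "system-value"
os.environ["SYSTEM_ONLY"] = "kept"

# Stand-in for the env dict passed to BashOperator.
user_env = {"SHARED_VAR": "user-value", "dag_input": "{...}"}

# What BashOperator's get_env does when append_env=True:
merged = os.environ.copy()
merged.update(user_env)

print(merged["SYSTEM_ONLY"])  # system variables are inherited
print(merged["SHARED_VAR"])   # user variables override on conflict
```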
However, I've figured out that in the Airflow version I'm running (2.0.1) append_env isn't supported (it is supported in later versions). As a temporary solution I've added **os.environ to the BashOperator env parameter:
return BashOperator(
    task_id="copy_data_from_mcd_s3",
    env={
        "dag_input": "{{ dag_run.conf }}",
        ......
        **os.environ,
    },
    # append_env=True,  # supported from Airflow 2.2.0
    bash_command="utils/my_script.sh",
    dag=dag,
    retries=1,
)
This solved the problem.