ModuleNotFoundError in Airflow using BashOperator

I want to create a pipeline using Airflow.

  1. I have two folders: preprocessing_data, which contains mycode.py, and helpers, which contains Python scripts that mycode.py uses. mycode.py imports them like this:
from helpers.tags import *
from helpers.useful_functions import formatting_date
  2. I used a BashOperator with the bash_command below to run mycode.py:
 bash_command=f"cd /opt/airflow && python preprocessing_data/mycode.py " f"--query-name mycode " f"--execution-date {{{{ ds }}}}"
  3. In my Dockerfile, I copied my folders as below:
FROM apache/airflow:2.2.4-python3.9

COPY --chown=airflow:root . .
COPY --chown=airflow:root helpers/ .
COPY --chown=airflow:root preprocessing_data/ .

When I run my pipeline, I get this error in my logs:

ModuleNotFoundError: No module named 'helpers'

I have tried many things without success. Any ideas, please?

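The Python interpreter cannot find your helpers package. When you run python preprocessing_data/mycode.py, Python puts the script's own directory (here /opt/airflow/preprocessing_data) at the front of sys.path, not the current working directory, so cd /opt/airflow alone does not make helpers importable. If you add the folder that contains helpers to the Python path, you will be able to import from it.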
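The behaviour can be reproduced outside Airflow. The sketch below is a minimal, hypothetical reproduction: it rebuilds the folder layout from the question in a temporary directory (the file contents and the function names run_mycode and build_layout are invented for illustration), runs the script once without and once with the project root on PYTHONPATH, and checks that only the second run can import helpers.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def build_layout(root: Path) -> None:
    """Create a tiny stand-in for the helpers/ and preprocessing_data/ folders."""
    (root / "helpers").mkdir()
    (root / "helpers" / "__init__.py").write_text("")
    (root / "helpers" / "useful_functions.py").write_text(
        "def formatting_date(value):\n    return str(value)\n"
    )
    (root / "preprocessing_data").mkdir()
    (root / "preprocessing_data" / "mycode.py").write_text(
        "from helpers.useful_functions import formatting_date\n"
        "print(formatting_date('ok'))\n"
    )


def run_mycode(root: Path, extra_pythonpath: str = "") -> subprocess.CompletedProcess:
    """Run preprocessing_data/mycode.py from `root`, as the bash_command does."""
    env = dict(os.environ)
    env.pop("PYTHONPATH", None)  # ignore any PYTHONPATH inherited from the shell
    if extra_pythonpath:
        env["PYTHONPATH"] = extra_pythonpath
    return subprocess.run(
        [sys.executable, "preprocessing_data/mycode.py"],
        cwd=root,  # plays the role of `cd /opt/airflow`
        env=env,
        capture_output=True,
        text=True,
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_layout(root)
    without_path = run_mycode(root)                            # only `cd` into the root
    with_path = run_mycode(root, extra_pythonpath=str(root))   # root added to PYTHONPATH

print("without PYTHONPATH, exit code:", without_path.returncode)
print("with PYTHONPATH, exit code:", with_path.returncode)
```

The first run fails even though the working directory is the project root, because only the script's own folder is added to sys.path automatically.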

Assuming the helpers package is in the folder /opt/airflow:

 bash_command=f"export PYTHONPATH=$PYTHONPATH:/opt/airflow && cd /opt/airflow && python preprocessing_data/mycode.py " f"--query-name mycode " f"--execution-date {{{{ ds }}}}"
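Because the command is assembled from adjacent f-strings before Airflow ever sees it, it can help to check the resulting string in plain Python. The sketch below is only an illustration (the function name build_bash_command and its parameters are hypothetical, not part of Airflow). It shows that the quadruple braces collapse to the Jinja placeholder {{ ds }} and that each fragment needs its own trailing space, otherwise neighbouring arguments fuse together.

```python
def build_bash_command(project_root: str = "/opt/airflow", query_name: str = "mycode") -> str:
    """Assemble the string that would be passed as bash_command (illustrative helper)."""
    # In an f-string, '{{{{' renders as '{{' and '}}}}' as '}}', which leaves a
    # Jinja template that Airflow fills in with the execution date at run time.
    return (
        f"export PYTHONPATH=$PYTHONPATH:{project_root} && "
        f"cd {project_root} && "
        f"python preprocessing_data/mycode.py "
        f"--query-name {query_name} "
        f"--execution-date {{{{ ds }}}}"
    )


command = build_bash_command()
print(command)
```

The returned string can then be passed as the bash_command argument of the BashOperator.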
