
GCP Apache Airflow: how to install a Python dependency from a private repository

For my data-extraction project I have gone with Apache Airflow, using GCP Composer and bucket storage.

I have several modules in a package in my GitHub repo that my DAG file needs to access. For now I'm using a BashOperator to check that it works:

#dag.py
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    dag_id='my_example_DAG',
    start_date=datetime(2019, 10, 17, 8, 25),
    schedule_interval=timedelta(minutes=15),
    default_args=default_args,
)

t1 = BashOperator(
    task_id='example_task',
    bash_command='python /home/airflow/gcs/data/my_example_maindir/main.py ',
    dag=dag)
#main.py

def run_main(path_name):
    yml_info = ...  # Reads the YML file at path_name
    extractor_pool(yml_info)

def extractor_pool(yml_info):
    # do work
    ...

if __name__ == "__main__":
    test_path = "Example/path/for/test.yml"
    run_main(test_path)


And it works: main.py runs with test_path. But I want to call the function run_main directly and pass it the correct path to the correct YML file for the task.

I have tried to sys.path.insert the directory inside my storage bucket where my modules are, but I get an import error:

directory for my DAG files (cloned from my git repo) = Buckets/europe-west1-eep-envxxxxxxx-bucket/dags

directory for my scripts/packages = Buckets/europe-west1-eep-envxxxxxxx-bucket/data

#dag.py

import sys
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

sys.path.insert(0, "/home/airflow/gcs/data/Example/")
from Example import main

dag = DAG(
    dag_id='task_1_dag',
    start_date=datetime(2019, 10, 13),
    schedule_interval=timedelta(minutes=10),
    default_args=default_args,
)

t1 = PythonOperator(
    task_id='task_1',
    provide_context=True,
    python_callable=main.run_main,
    op_kwargs={'path_name': "project_output_0184_Storgaten_33"},
    dag=dag
)



This results in a "module not found" error and does not work.
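As a side note, the "module not found" error in the snippet above may simply come from inserting the package directory itself rather than its parent: `from Example import main` requires the directory *containing* `Example/` to be on `sys.path`. A minimal local simulation of that layout (the temp directory here just stands in for `/home/airflow/gcs/data`; this is a sketch, not the Composer environment itself):

```python
import os
import sys
import tempfile

# Simulate the bucket layout locally: <root>/Example/main.py,
# where <root> stands in for /home/airflow/gcs/data.
root = tempfile.mkdtemp()
pkg = os.path.join(root, "Example")
os.makedirs(pkg)
open(os.path.join(pkg, "__init__.py"), "w").close()  # mark Example/ as a package
with open(os.path.join(pkg, "main.py"), "w") as f:
    f.write("def run_main(path_name):\n    return path_name\n")

# Insert the PARENT of the package, not the package directory itself.
sys.path.insert(0, root)

from Example import main

print(main.run_main("some/path.yml"))  # prints some/path.yml
```

With `sys.path.insert(0, "/home/airflow/gcs/data/Example/")`, Python looks for an `Example` package *inside* `Example/`, which does not exist, hence the import error.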

I have done some reading in the GCP docs and found this:

Installing a Python dependency from private repository: https://cloud.google.com/composer/docs/how-to/using/installing-python-dependencies

That says I need to place it at the path /config/pip/, for example: gs://us-central1-b1-6efannnn-bucket/config/pip/pip.conf
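For reference, a pip.conf pointing pip at a private index typically looks like the following (the URL is a placeholder, not taken from the question):

```ini
[global]
extra-index-url = https://username:password@example.com/simple/
```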

But in my GCP storage bucket I have no directory named config. I have tried to retrace my steps from when I created the bucket and environment, but I can't figure out what I have done wrong.

GCS has no true notion of folders or directories, what you actually have is a series of blobs that have names which may contain slashes and give the appearance of a directory.

The instructions are a bit unclear in asking you to put it in a directory; what you actually want to do is create an object whose name has the prefix config/pip/, i.e. config/pip/pip.conf.

With gsutil you'd do something like:

gsutil cp my-local-pip.conf gs://[DESTINATION_BUCKET_NAME]/config/pip/pip.conf
