For my project's data extraction I have gone with Apache Airflow, using GCP Composer and bucket storage.
I have several modules in a package in my GitHub repo that my DAG file needs to access. For now I'm using BashOperator to check that it works:
#dag.py
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

# default_args is defined elsewhere in the file
dag = DAG(
    dag_id='my_example_DAG',
    start_date=datetime(2019, 10, 17, 8, 25),
    schedule_interval=timedelta(minutes=15),
    default_args=default_args,
)

t1 = BashOperator(
    task_id='example_task',
    bash_command='python /home/airflow/gcs/data/my_example_maindir/main.py ',
    dag=dag)
t1
#main.py
def run_main(path_name):
    # Reads YML file
    extractor_pool(yml_info)

def extractor_pool(yml_info):
    # do work
    ...

if __name__ == "__main__":
    test_path = "Example/path/for/test.yml"
    run_main(test_path)
And it works: it starts main.py with test_path. But I want to call the function run_main with the correct path to the correct YML file for each task.
I have tried to sys.path.insert the directory inside my storage bucket where my modules are, but I get an import error:
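One common way to pass a per-task YML path through BashOperator is to let main.py accept it as a command-line argument. The sketch below is an assumption, not the original code: argparse, the injectable argv parameter, and the stub run_main body are all illustrative.

```python
import argparse

def run_main(path_name):
    # placeholder for the real YML-reading / extractor_pool work
    return "ran with " + path_name

def parse_args(argv=None):
    # argv is injectable for testing; defaults to sys.argv[1:]
    parser = argparse.ArgumentParser(description="Run extraction for one YML config")
    parser.add_argument("path_name", help="path to the YML file for this task")
    return parser.parse_args(argv)

# Each BashOperator task would then invoke something like:
#   python main.py /home/airflow/gcs/data/my_example_maindir/configs/task_1.yml
args = parse_args(["/home/airflow/gcs/data/my_example_maindir/configs/task_1.yml"])
print(run_main(args.path_name))
```

With this shape, each task's bash_command just appends its own YML path after main.py.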
dir for my DAG files (cloned from my Git repo) = Buckets/europe-west1-eep-envxxxxxxx-bucket/dags
dir for my scripts/packages = Buckets/europe-west1-eep-envxxxxxxx-bucket/data
#dag.py
import sys
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

sys.path.insert(0, "/home/airflow/gcs/data/Example/")
from Example import main

# default_args is defined elsewhere in the file
dag = DAG(
    dag_id='task_1_dag',
    start_date=datetime(2019, 10, 13),
    schedule_interval=timedelta(minutes=10),
    default_args=default_args,
)

t1 = PythonOperator(
    task_id='task_1',
    provide_context=True,
    python_callable=main.run_main,
    op_kwargs={'path_name': "project_output_0184_Storgaten_33"},
    dag=dag
)
t1
This results in a ''module not found'' error and does not work.
I have done some reading on GCP and found this:
Installing a Python dependency from a private repository: https://cloud.google.com/composer/docs/how-to/using/installing-python-dependencies
It says I need to place a pip.conf file under the directory path /config/pip/, for example: gs://us-central1-b1-6efannnn-bucket/config/pip/pip.conf
But in my GCP storage bucket I have no directory named config. I have tried to retrace my steps from when I created the bucket and environment, but I can't figure out what I have done wrong.
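For reference, a pip.conf pointing pip at a private package index typically looks like the fragment below; the repository URL is a placeholder, not something from the linked doc.

```ini
[global]
extra-index-url = https://example.com/my-private-repo/simple/
```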
GCS has no true notion of folders or directories, what you actually have is a series of blobs that have names which may contain slashes and give the appearance of a directory.
The instructions are a bit unclear in asking you to put it in a directory, but what you actually want to do is create a file and give it the name prefix config/pip/pip.conf.
With gsutil you'd do something like:
gsutil cp my-local-pip.conf gs://[DESTINATION_BUCKET_NAME]/config/pip/pip.conf
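As for the ''module not found'' error in the DAG: inserting /home/airflow/gcs/data/Example/ on sys.path and then doing `from Example import main` makes Python look for an Example package *inside* that directory; it is the parent directory that needs to be on the path. A self-contained sketch of that import rule, using a throwaway package in a temp dir since the Composer paths don't exist locally:

```python
import os
import sys
import tempfile

# Build a throwaway layout mirroring the bucket: <tmp>/Example/{__init__,main}.py
tmp = tempfile.mkdtemp()
pkg_dir = os.path.join(tmp, "Example")
os.makedirs(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "main.py"), "w") as f:
    f.write("def run_main(path_name):\n    return path_name\n")

# Inserting the package dir itself (as in the failing DAG) would make Python
# search for <tmp>/Example/Example, which does not exist. Insert the PARENT:
sys.path.insert(0, tmp)
from Example import main  # now resolves

result = main.run_main("some.yml")
```

Applied to the bucket layout above, that suggests inserting /home/airflow/gcs/data rather than /home/airflow/gcs/data/Example/.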