
GCP Apache Airflow - How to install Python package from a private repository and import on DAG?

I have a private repository that contains common functions used by my DAGs (for example, datetime validators and a response encoder function). I want to import these functions in my DAG file, and I followed this link to do it.

I created a pip.conf file at my-bucket-name/config/pip/pip.conf and added my private GitHub repository to it like this:

[global]
extra-index-url=https://<token>@github.com/my-private-github-repo.git

After this, I tried to import the repository's functions in my DAG file (for example: from common-repo import *), but I got a 'module not found' error in my DAG. Unfortunately, I couldn't find anything in the Cloud Composer logs showing that the private GitHub repo had been installed.

I've searched a lot but can't find how to do this.

You can add the private repo to the requirements of a PythonVirtualenvOperator (used here through the @task.virtualenv decorator) like this:

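from airflow import DAG
from airflow.decorators import task


@task.virtualenv(
    task_id="virtualenv_python",
    # pip needs the "git+" prefix to install straight from a Git repository
    requirements=["git+https://<token>@github.com/my-private-github-repo.git"],
    system_site_packages=False,
)
def callable_from_virtualenv():
    # Import inside the function: it runs in the freshly created virtualenv
    import your_private_module

    # ..etc...


# Call this inside a DAG definition to create the task
virtualenv_task = callable_from_virtualenv()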

(Example adapted from the Airflow Python operator example.)
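One detail that can produce the same 'module not found' error even after a successful install: the name used in the import statement is the package's module name, not the repository name, and a hyphenated name such as common-repo is not a valid Python identifier. The following is only a minimal sketch of a sanity check using the standard library; check_module_name is a hypothetical helper, not part of Airflow.

```python
import importlib.util


def check_module_name(name):
    """Classify a dotted module name as 'invalid-name', 'not-installed' or 'ok'."""
    parts = name.split(".")
    # Every dotted component must be a valid identifier to appear in `import`
    if not all(part.isidentifier() for part in parts):
        return "invalid-name"
    # Only look up the top-level package, so no parent package gets imported
    if importlib.util.find_spec(parts[0]) is None:
        return "not-installed"
    return "ok"


if __name__ == "__main__":
    # "common-repo" cannot be imported under that name because of the hyphen
    print(check_module_name("common-repo"))
```

Running a check like this inside the virtualenv task, with the module name your package actually exposes, helps separate a naming problem from an installation problem.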

To avoid hardcoding the token or credentials in the source code, you can store the requirement in an Airflow Variable:

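from airflow.decorators import task
from airflow.models import Variable


@task.virtualenv(
    task_id="virtualenv_python",
    # The Variable holds the full requirement string, token included
    requirements=[Variable.get("private_github_repo")],
    system_site_packages=False,
)
def callable_from_virtualenv():
    import your_private_module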
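A possible variation, shown only as a sketch: keep just the token in the Variable and assemble the pip requirement string in code. build_git_requirement, the Variable name github_token and the repository path my-org/common-repo are all hypothetical names chosen for illustration; the git+https form is pip's syntax for installing from a Git repository.

```python
from urllib.parse import quote


def build_git_requirement(token, repo_path, ref=None):
    """Build a pip VCS requirement for a private GitHub repository.

    token     -- access token (hypothetical; read it from a secret store)
    repo_path -- "owner/repo" without the ".git" suffix (hypothetical example)
    ref       -- optional branch, tag or commit to pin
    """
    # Percent-encode the token so special characters cannot break the URL
    safe_token = quote(token, safe="")
    requirement = f"git+https://{safe_token}@github.com/{repo_path.strip('/')}.git"
    if ref:
        # pip reads "@<ref>" after the repository URL as the revision to check out
        requirement += f"@{ref}"
    return requirement


if __name__ == "__main__":
    print(build_git_requirement("<token>", "my-org/common-repo", ref="v1.0"))
```

In the DAG this could be used as requirements=[build_git_requirement(Variable.get("github_token"), "my-org/common-repo")]. Pinning ref to a tag or commit keeps the virtualenv reproducible between runs.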

