
How to connect a database (Postgres) to Airflow (Cloud Composer) on Google Cloud Platform?

I have Airflow set up on my local machine. The DAGs are written in a way that they need to access a database (Postgres). I am trying to set up a similar thing on Google Cloud Platform, but I am not able to connect the database to Airflow in Cloud Composer. I keep getting the error "no host postgres". Any suggestions for setting up Airflow on GCP or connecting a database to Airflow in Composer?

Here is the link to my complete Airflow folder (this setup works fine on my local machine with Docker):

https://github.com/digvijay13873/airflow-docker.git

I am using GCP Composer, and the Postgres database is in a Cloud SQL instance. My table-creation DAG is here: https://github.com/digvijay13873/airflow-docker/blob/main/dags/tablecreation.py

What changes should I make in my existing DAG to connect it with Postgres in the Cloud SQL instance? I tried giving the public IP address of the Postgres instance in the host parameter.

Answering your main question: connecting to a Cloud SQL instance from a Cloud Composer environment can be done in two ways:

  • Using the public IP
  • Using the Cloud SQL Auth Proxy (recommended): secure access without the need for Authorized Networks or SSL configuration

Connecting using the public IP (Postgres connects directly via TCP, non-SSL):

import os

# The connection URI is exposed to Airflow through an environment variable;
# postgres_kwargs holds the instance settings (defined further below).
os.environ['AIRFLOW_CONN_PUBLIC_POSTGRES_TCP'] = (
    "gcpcloudsql://{user}:{password}@{public_ip}:{public_port}/{database}?"
    "database_type=postgres&"
    "project_id={project_id}&"
    "location={location}&"
    "instance={instance}&"
    "use_proxy=False&"
    "use_ssl=False".format(**postgres_kwargs)
)
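
As a minimal sketch (assuming the environment variable above is set before the DAG file is parsed), a task can then reference the connection by its lowercase connection id, public_postgres_tcp. The task below is illustrative, not part of the original answer:

from airflow.contrib.operators.gcp_sql_operator import CloudSqlQueryOperator

# Hypothetical task: the conn id is the lowercase suffix of the
# AIRFLOW_CONN_* variable set above. Assumes a `dag` object as in
# the full example further below.
query_task = CloudSqlQueryOperator(
    gcp_cloudsql_conn_id="public_postgres_tcp",
    task_id="example_public_ip_query",
    sql="SELECT 1",
    dag=dag,
)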

For more information, refer to the example DAG in the Airflow GitHub repository.

For connecting using the Cloud SQL Auth Proxy: you can run the Auth Proxy from GKE as per this documentation.

After setting up the SQL proxy, you can connect Composer to your SQL instance through the proxy, as sketched below.
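
For reference, this is roughly how the (legacy) Cloud SQL proxy is started manually; the project, region, and instance names here are placeholders. Note that when use_proxy=True is set in the connection URI, as in the example below, the operator starts the proxy for you:

# my-project, us-central1 and my-instance are placeholder names
./cloud_sql_proxy -instances=my-project:us-central1:my-instance=tcp:5432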

Example code:

import os
from datetime import datetime, timedelta
from urllib.parse import quote_plus

from airflow import DAG
from airflow.contrib.operators.gcp_sql_operator import CloudSqlQueryOperator
from airflow.operators.python_operator import PythonOperator
from google.cloud import storage

SQL = [
    'CREATE TABLE IF NOT EXISTS TABLE_TEST (I INTEGER)',
    'CREATE TABLE IF NOT EXISTS TABLE_TEST (I INTEGER)',  # duplicate on purpose: logs a warning
    'INSERT INTO TABLE_TEST VALUES (0)',
    'CREATE TABLE IF NOT EXISTS TABLE_TEST2 (I INTEGER)',
    'DROP TABLE TABLE_TEST',
    'DROP TABLE TABLE_TEST2',
]
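
# The GCSQL_* settings used below are not defined in the original answer;
# reading them from environment variables with placeholder defaults is an
# assumption, mirroring the upstream Airflow example.
GCSQL_POSTGRES_USER = os.environ.get('GCSQL_POSTGRES_USER', 'postgres')
GCSQL_POSTGRES_PASSWORD = os.environ.get('GCSQL_POSTGRES_PASSWORD', 'password')
GCSQL_POSTGRES_PUBLIC_IP = os.environ.get('GCSQL_POSTGRES_PUBLIC_IP', '0.0.0.0')
GCSQL_POSTGRES_PUBLIC_PORT = os.environ.get('GCSQL_POSTGRES_PUBLIC_PORT', '5432')
GCP_PROJECT_ID = os.environ.get('GCP_PROJECT_ID', 'example-project')
GCP_REGION = os.environ.get('GCP_REGION', 'europe-west1')
GCSQL_POSTGRES_INSTANCE_NAME_QUERY = os.environ.get('GCSQL_POSTGRES_INSTANCE_NAME_QUERY', 'testpostgres')
GCSQL_POSTGRES_DATABASE_NAME = os.environ.get('GCSQL_POSTGRES_DATABASE_NAME', 'postgresdb')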

HOME_DIR = os.path.expanduser("~")

def get_absolute_path(path):
    # Helper from the upstream example: resolve a relative path against HOME.
    if path.startswith("/"):
        return path
    return os.path.join(HOME_DIR, path)

postgres_kwargs = dict(
    user=quote_plus(GCSQL_POSTGRES_USER),
    password=quote_plus(GCSQL_POSTGRES_PASSWORD),
    public_port=GCSQL_POSTGRES_PUBLIC_PORT,
    public_ip=quote_plus(GCSQL_POSTGRES_PUBLIC_IP),
    project_id=quote_plus(GCP_PROJECT_ID),
    location=quote_plus(GCP_REGION),
    instance=quote_plus(GCSQL_POSTGRES_INSTANCE_NAME_QUERY),
    database=quote_plus(GCSQL_POSTGRES_DATABASE_NAME),
)

os.environ['AIRFLOW_CONN_PROXY_POSTGRES_TCP'] = (
    "gcpcloudsql://{user}:{password}@{public_ip}:{public_port}/{database}?"
    "database_type=postgres&"
    "project_id={project_id}&"
    "location={location}&"
    "instance={instance}&"
    "use_proxy=True&"
    "sql_proxy_use_tcp=True".format(**postgres_kwargs)
)

connection_names = [
    "proxy_postgres_tcp",
]

# default_args was not shown in the original snippet; a minimal
# definition is assumed here.
default_args = {
    'start_date': datetime(2021, 1, 1),
}

dag = DAG(
    'con_SQL',
    default_args=default_args,
    description='A DAG that connects to the SQL server.',
    schedule_interval=timedelta(days=1),
)
def print_client(ds, **kwargs):
    # Instantiate a GCS client just to verify that GCP credentials work.
    client = storage.Client()
    print(client)

print_task = PythonOperator(
    task_id='print_the_client',
    provide_context=True,
    python_callable=print_client,
    dag=dag,
)
for connection_name in connection_names:
    task = CloudSqlQueryOperator(
        gcp_cloudsql_conn_id=connection_name,
        task_id="example_gcp_sql_task_" + connection_name,
        sql=SQL,
        dag=dag,
    )

print_task >> task
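
Note: the snippet above uses the Airflow 1.10 contrib operator. On newer Composer images running Airflow 2.x with the Google provider package, the equivalent operator is CloudSQLExecuteQueryOperator:

# Airflow 2.x / Google provider equivalent of CloudSqlQueryOperator
from airflow.providers.google.cloud.operators.cloud_sql import (
    CloudSQLExecuteQueryOperator,
)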
