I have Airflow set up on my local machine. My DAGs are written so that they need to access a Postgres database. I am trying to set up something similar on Google Cloud Platform, but I am not able to connect the database to Airflow in Cloud Composer. I keep getting the error "no host postgres". Any suggestions for setting up Airflow on GCP or for connecting a database to Cloud Composer?
Here is a link to my complete Airflow folder (this setup works fine on my local machine with Docker):
https://github.com/digvijay13873/airflow-docker.git
I am using GCP Composer. The Postgres database is in a Cloud SQL instance. My table creation DAG is here: https://github.com/digvijay13873/airflow-docker/blob/main/dags/tablecreation.py
What changes should I make to my existing DAG to connect it to Postgres in the SQL instance? I tried giving the public IP address of the Postgres instance in the host parameter.
To answer your main question: connecting to a Cloud SQL instance from a Cloud Composer environment can be done in two ways.
Connecting using the public IP (Postgres, direct TCP connection, non-SSL):
os.environ['AIRFLOW_CONN_PUBLIC_POSTGRES_TCP'] = (
    "gcpcloudsql://{user}:{password}@{public_ip}:{public_port}/{database}?"
    "database_type=postgres&"
    "project_id={project_id}&"
    "location={location}&"
    "instance={instance}&"
    "use_proxy=False&"
    "use_ssl=False".format(**postgres_kwargs)
)
For more information, refer to the example on GitHub.
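As a rough illustration of how such a connection URI is assembled, the sketch below wraps the string formatting in a small helper. The function name `build_cloudsql_uri` and all of the sample values are hypothetical and are not part of Airflow; only the URI layout is taken from the snippet in this answer.

```python
from urllib.parse import quote_plus


def build_cloudsql_uri(user, password, host, port, database,
                       project_id, location, instance,
                       use_proxy=False, use_ssl=False):
    """Assemble a gcpcloudsql:// connection URI for a Postgres instance."""
    # quote_plus escapes characters such as '@' or '/' that would otherwise
    # be parsed as URI delimiters.
    return (
        "gcpcloudsql://{user}:{password}@{host}:{port}/{database}?"
        "database_type=postgres&"
        "project_id={project_id}&"
        "location={location}&"
        "instance={instance}&"
        "use_proxy={use_proxy}&"
        "use_ssl={use_ssl}"
    ).format(
        user=quote_plus(user),
        password=quote_plus(password),
        host=quote_plus(host),
        port=int(port),
        database=quote_plus(database),
        project_id=quote_plus(project_id),
        location=quote_plus(location),
        instance=quote_plus(instance),
        use_proxy=use_proxy,
        use_ssl=use_ssl,
    )


# Placeholder values only; substitute the details of your own instance.
uri = build_cloudsql_uri(
    user="airflow", password="p@ss/word", host="203.0.113.10", port=5432,
    database="mydb", project_id="my-project", location="europe-west1",
    instance="my-instance",
)
print(uri)
```

The resulting string can then be assigned to an environment variable named `AIRFLOW_CONN_<CONN_ID>`, which is how Airflow picks up connections defined outside its metadata database.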
Connecting using the Cloud SQL Auth proxy: you can connect through the Auth proxy from GKE, as described in this documentation.
After setting up the proxy, you can connect Composer to your SQL instance through it.
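If the connection still fails, it can help to first confirm that the database or proxy endpoint is reachable at all from where the DAG runs. The following is a minimal sketch using only the Python standard library; the helper name `can_reach` and the host name are placeholders, not something Composer or Airflow provides.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures (socket.gaierror), refused connections and timeouts.
        return False


# Hypothetical endpoint; replace with your own host (or proxy service) and port.
PROXY_HOST = "postgres.invalid"
PROXY_PORT = 5432
print(can_reach(PROXY_HOST, PROXY_PORT))
```

A `False` result for a host name that should exist usually points at name resolution or firewall configuration rather than at the DAG code itself.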
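Example code:
# Imports assume the Airflow 1.10-era module paths that match the operator names used below.
import os
from datetime import timedelta
from os.path import expanduser
from urllib.parse import quote_plus
from airflow import DAG
from airflow.contrib.operators.gcp_sql_operator import CloudSqlQueryOperator
from airflow.operators.python_operator import PythonOperator
from google.cloud import storage

SQL = [
    'CREATE TABLE IF NOT EXISTS TABLE_TEST (I INTEGER)',
    'CREATE TABLE IF NOT EXISTS TABLE_TEST (I INTEGER)',
    'INSERT INTO TABLE_TEST VALUES (0)',
    'CREATE TABLE IF NOT EXISTS TABLE_TEST2 (I INTEGER)',
    'DROP TABLE TABLE_TEST',
    'DROP TABLE TABLE_TEST2',
]

HOME_DIR = expanduser("~")

def get_absolute_path(path):
    if path.startswith("/"):
        return path
    else:
        return os.path.join(HOME_DIR, path)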
# URL-encode the values so that special characters do not break the connection URI.
postgres_kwargs = dict(
    user=quote_plus(GCSQL_POSTGRES_USER),
    password=quote_plus(GCSQL_POSTGRES_PASSWORD),
    public_port=GCSQL_POSTGRES_PUBLIC_PORT,
    public_ip=quote_plus(GCSQL_POSTGRES_PUBLIC_IP),
    project_id=quote_plus(GCP_PROJECT_ID),
    location=quote_plus(GCP_REGION),
    instance=quote_plus(GCSQL_POSTGRES_INSTANCE_NAME_QUERY),
    database=quote_plus(GCSQL_POSTGRES_DATABASE_NAME),
)
os.environ['AIRFLOW_CONN_PROXY_POSTGRES_TCP'] = \
    "gcpcloudsql://{user}:{password}@{public_ip}:{public_port}/{database}?" \
    "database_type=postgres&" \
    "project_id={project_id}&" \
    "location={location}&" \
    "instance={instance}&" \
    "use_proxy=True&" \
    "sql_proxy_use_tcp=True".format(**postgres_kwargs)
connection_names = [
    "proxy_postgres_tcp",
]

dag = DAG(
    'con_SQL',
    default_args=default_args,
    description='A DAG that connects to the SQL server.',
    schedule_interval=timedelta(days=1),
)
def print_client(ds, **kwargs):
    client = storage.Client()
    print(client)

print_task = PythonOperator(
    task_id='print_the_client',
    provide_context=True,
    python_callable=print_client,
    dag=dag,
)
for connection_name in connection_names:
    task = CloudSqlQueryOperator(
        gcp_cloudsql_conn_id=connection_name,
        task_id="example_gcp_sql_task_" + connection_name,
        sql=SQL,
        dag=dag
    )
    print_task >> task
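The example code uses names such as `GCSQL_POSTGRES_USER` and `default_args` without defining them. One possible way to define them is sketched below, assuming the values come from environment variables; every default shown is a placeholder and should be replaced with the settings of your own project and instance.

```python
import os
from datetime import datetime, timedelta

# Assumed placeholders: override these through environment variables
# or replace them with your own project and instance settings.
GCP_PROJECT_ID = os.environ.get("GCP_PROJECT_ID", "example-project")
GCP_REGION = os.environ.get("GCP_REGION", "europe-west1")
GCSQL_POSTGRES_INSTANCE_NAME_QUERY = os.environ.get(
    "GCSQL_POSTGRES_INSTANCE_NAME_QUERY", "example-postgres-instance")
GCSQL_POSTGRES_DATABASE_NAME = os.environ.get(
    "GCSQL_POSTGRES_DATABASE_NAME", "postgresdb")
GCSQL_POSTGRES_USER = os.environ.get("GCSQL_POSTGRES_USER", "postgres_user")
GCSQL_POSTGRES_PASSWORD = os.environ.get("GCSQL_POSTGRES_PASSWORD", "password")
GCSQL_POSTGRES_PUBLIC_IP = os.environ.get("GCSQL_POSTGRES_PUBLIC_IP", "0.0.0.0")
GCSQL_POSTGRES_PUBLIC_PORT = os.environ.get("GCSQL_POSTGRES_PUBLIC_PORT", "5432")

# Arguments applied to every task in the DAG unless a task overrides them.
default_args = {
    "owner": "airflow",
    "start_date": datetime(2021, 1, 1),
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}
```

These definitions would need to appear before `postgres_kwargs` and the `DAG(...)` call in the file. Keeping the password in an environment variable or a secret store, instead of in the DAG file, avoids committing credentials to the repository.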