简体   繁体   中英

Using airflow's postgres database for application data

I'm starting a new project which requires a data warehouse, for which we'll be using postgres. (Shameless plug: swarm64 makes postgres a great DW option for datasets up to terabytes large)

I'm using apache airflow to orchestrate the workloads, but as I'm new to airflow, I'm not sure what the best practice is for the application's DB needs.

For some more context, I'm using airflow's docker-compose.yml , and I'm also an airflow newbie.

Noticing that the docker-compose already defines a postgres db:

...
  postgres:
    image: postgres:13
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow
    volumes:
      - postgres-db-volume:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "airflow"]
      interval: 5s
      retries: 5
    restart: always
...

I'm immediately wondering if it would be a better idea to add another postgres service or if to configure the existing one to have 2 users, 2 databases...

Eventually, I'll move this project to a cloud, and will probably use a AWS postgres RDS or similar.

My question then is:

  1. What is the best practice here?
  2. If there isn't one, what are the tradeoffs between the different approaches?

Airflow doesn't care what is your DWH you will be able to interact with it using Hooks and Operators. Many of them are available as Providers to Airflow and you can always write custom ones if needed. You need to separate between the Airflow backend metadata db (which can be PostgreSQL, MySQL) and you analytical storage where you store your processed data which can be anything you want (PostgreSQL, MySQL, S3, BigQuery, and many others).

Do NOT make Airflow backend database also your analytical database even if they both are PostgreSQL!

As for your questions the answer is:

Use plain regular PostgreSQL/MySQL for your Airflow installation.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM