I created a custom package with a CLI (built with Click). This package can do two things: run the preprocessing and run the machine learning model. I built a Docker image of this custom package and pushed it to a private registry on AWS (ECR).
Now I want to run this container with Airflow, which I run on an EC2 instance with docker-compose.
For this example I will focus on only one task: running the container for the preprocessing step.
However, when I run the DAG I get 'upstream failed' for t2.
from datetime import timedelta

import airflow
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.docker_operator import DockerOperator

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': airflow.utils.dates.days_ago(1),
    'email': ['airflow@example.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5)
}

dag = DAG(
    'example_pipeline',
    default_args=default_args,
    description='example data pipeline.',
    schedule_interval=timedelta(minutes=3)
)

t1 = BashOperator(
    task_id='starting_airflow',
    bash_command='echo "Starting Airflow DAG..."',
    dag=dag,
)

t2 = DockerOperator(
    task_id='data_pipeline',
    image='XXXXXXXX.dkr.ecr.eu-central-1.amazonaws.com/rwg:latest',
    container_name='task__export_data',
    command="run-preprocessing",
    network_mode="bridge",
    api_version="auto",
    docker_url="unix://var/run/docker.sock",
    docker_conn_id='aws_con',
    dag=dag
)

t1 >> t2
I created the 'aws_con' connection via the UI, but it does not seem to work.
Furthermore, this is my docker-compose.yml file:
version: '3'
services:
  postgres:
    image: postgres:9.6
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow
    ports:
      - "5432:5432"
    volumes:
      - ./pgdata:/var/lib/postgresql/data

  webserver:
    build: .
    restart: always
    depends_on:
      - postgres
    environment:
      - LOAD_EX=n
      - EXECUTOR=Local
      - FERNET_KEY=jsDPRErfv8Z_eVTnGfF8ywd19j4pyqE3NpdUBA_oRTo=
    volumes:
      - ./dags:/usr/local/airflow/dags
    ports:
      - "8080:8080"
    command: webserver
    healthcheck:
      test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 3
What am I doing wrong regarding the DockerOperator?
And a second question: how can I create this "aws_con" connection via code or the CLI?
You should specify a Docker connection.
This is the standard way of passing credentials in Airflow: there is a dedicated Docker connection type for this, which you can define in the Airflow DB or in a Secrets backend and pass to DockerOperator via the docker_conn_id parameter (the connection can also hold the registry host, and docker_url defaults to the local Docker socket, so you should not need to pass it explicitly in your operator).
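For example, here is a minimal sketch of the operator relying only on the connection (the image and connection id are taken from the question; exact defaults depend on your Airflow / Docker provider version):

t2 = DockerOperator(
    task_id='data_pipeline',
    image='XXXXXXXX.dkr.ecr.eu-central-1.amazonaws.com/rwg:latest',
    command='run-preprocessing',
    network_mode='bridge',
    api_version='auto',
    # credentials for the private registry are resolved from the Airflow connection
    docker_conn_id='aws_con',
    dag=dag,
)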
See the DockerOperator Python API reference, and the separate documentation page about the Docker connection (the docker_conn_id parameter description links to that page).
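As for the second question, you do not have to use the UI: the connection can be created with the Airflow CLI (airflow connections add in Airflow 2.x, airflow connections --add in 1.10.x) or programmatically. Below is a minimal programmatic sketch; the host, login and password values are assumptions based on the question, and for ECR the login is always AWS while the password is a short-lived token such as the one printed by aws ecr get-login-password:

from airflow import settings
from airflow.models import Connection

# Hypothetical values: replace the host with your registry and the password with a fresh ECR token.
conn = Connection(
    conn_id='aws_con',
    conn_type='docker',
    host='XXXXXXXX.dkr.ecr.eu-central-1.amazonaws.com',
    login='AWS',
    password='<token from aws ecr get-login-password>',
)

# Persist the connection in the Airflow metadata database.
session = settings.Session()
session.add(conn)
session.commit()

Keep in mind that ECR tokens expire, so the stored password has to be refreshed periodically.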