
How to use a private image from AWS ECR with Airflow's DockerOperator?

I created a custom package with a CLI (built with Click). This package can do two things: run preprocessing and run the machine learning model. I built a Docker image of this custom package and pushed it to a private registry on AWS (ECR).

Now I want to run this container with Airflow, which runs on an EC2 instance and is started with docker-compose.

For this example I will only focus on one task: running the container for the preprocessing step.

However, I now get 'upstream failed' for t2.

from datetime import timedelta
import airflow
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.docker_operator import DockerOperator


default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': airflow.utils.dates.days_ago(1),
    'email': ['airflow@example.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5)
}

dag = DAG(
    'example_pipeline',
    default_args=default_args,
    description='example data pipeline.',
    schedule_interval=timedelta(minutes=3)
)


t1 = BashOperator(
    task_id='starting_airflow',
    bash_command='echo "Starting Airflow DAG..."',
    dag=dag,
)


t2 = DockerOperator(
    task_id='data_pipeline',
    image='XXXXXXXX.dkr.ecr.eu-central-1.amazonaws.com/rwg:latest',
    container_name='task__export_data',
    command="run-preprocessing",
    network_mode="bridge",
    api_version="auto",
    docker_url="unix://var/run/docker.sock",
    docker_conn_id='aws_con',
    dag=dag
)

t1 >> t2

I created the 'aws_con' connection via the UI, but it does not seem to work.

Furthermore, this is my docker-compose.yml file.

version: '3'
services:
  postgres:
    image: postgres:9.6
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow
    ports:
      - "5432:5432"
    volumes:
      - ./pgdata:/var/lib/postgresql/data

  webserver:
    build: .
    restart: always
    depends_on:
      - postgres
    environment:
      - LOAD_EX=n
      - EXECUTOR=Local
      - FERNET_KEY=jsDPRErfv8Z_eVTnGfF8ywd19j4pyqE3NpdUBA_oRTo=
    volumes:
      - ./dags:/usr/local/airflow/dags
    ports:
      - "8080:8080"
    command: webserver
    healthcheck:
      test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 3

What am I doing wrong regarding the Docker Operator?

And a second question: how can I create this "aws_con" via code or the CLI?

You should specify a Docker connection.

This is the standard way of passing credentials in Airflow. The Docker provider has a dedicated Docker connection type for this, which you can define in the Airflow DB or in a Secrets backend and pass to the DockerOperator via the docker_conn_id parameter (you can also specify the URL there, so you should not need to pass docker_url in your operator).

See Python API here:

https://airflow.apache.org/docs/apache-airflow-providers-docker/stable/_api/airflow/providers/docker/operators/docker/index.html

And the separate page about the Docker connection here (the docker_conn_id description links to this page):

https://airflow.apache.org/docs/apache-airflow-providers-docker/stable/connections/docker.html#howto-connection-docker
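
As for the second question: the connection does not have to be created through the UI. Below is a minimal sketch, assuming Airflow with the Docker provider and boto3 available, of creating the same 'aws_con' Docker connection in code. The ECR registry host and region are taken from the question; the boto3 call and variable names are illustrative, not something from the question.

import base64

import boto3
from airflow import settings
from airflow.models import Connection

# Fetch a temporary ECR login token (valid for roughly 12 hours).
ecr = boto3.client("ecr", region_name="eu-central-1")
token = ecr.get_authorization_token()["authorizationData"][0]["authorizationToken"]

# The token is "AWS:<password>" base64-encoded.
username, password = base64.b64decode(token).decode().split(":", 1)

# Store it as a Docker connection that DockerOperator can pick up via docker_conn_id.
conn = Connection(
    conn_id="aws_con",
    conn_type="docker",
    host="XXXXXXXX.dkr.ecr.eu-central-1.amazonaws.com",
    login=username,   # always "AWS" for ECR
    password=password,
)

session = settings.Session()
session.add(conn)
session.commit()

Alternatively, something like airflow connections add aws_con --conn-type docker --conn-host XXXXXXXX.dkr.ecr.eu-central-1.amazonaws.com --conn-login AWS --conn-password "$(aws ecr get-login-password --region eu-central-1)" should achieve the same from the CLI (flag names depend on your Airflow version). Keep in mind that the ECR token expires, so the connection has to be refreshed periodically.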
