
Run a custom task asynchronously in Airflow using the existing Celery app

I have Airflow running with Celery and Redis. By default this sends each DAG task to a Celery worker. I want to run a custom Celery task, from Python code, inside one of the DAG's tasks.

In tasks.py I have the following code:

from airflow.configuration import conf
from airflow.config_templates.default_celery import DEFAULT_CELERY_CONFIG
from celery import Celery
from celery import shared_task



if conf.has_option('celery', 'celery_config_options'):
    celery_configuration = conf.getimport('celery', 'celery_config_options')
else:
    celery_configuration = DEFAULT_CELERY_CONFIG

app = Celery(conf.get('celery', 'CELERY_APP_NAME'), config_source=celery_configuration, include=["dags.tasks"])
app.autodiscover_tasks(force=True)
print("here")
print(conf.get('celery', 'CELERY_APP_NAME'))
print(celery_configuration)
print(app)
@app.task(name='maximum')
def maximum(x=10, y=11):
    #print("here")
    print(x)
    if x > y:
        return x
    else:
        return y

tasks = app.tasks.keys()
print(tasks)

I am calling this from one of the DAG's tasks:

    max=maximum.apply_async(kwargs={'x':5, 'y':4})
    print(max)
    print(max.get(timeout=5))
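
For context, the apply_async call above lives inside a task callable in the DAG, roughly like this (a minimal sketch of the presumed call site; the DAG id and task id here are hypothetical):

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

from dags.tasks import maximum  # the Celery task defined in tasks.py above (path assumed)


def call_maximum_async(**_):
    # Submit the custom Celery task to the broker and wait briefly for the result.
    result = maximum.apply_async(kwargs={'x': 5, 'y': 4})
    print(result)
    print(result.get(timeout=5))


with DAG(
    'custom_celery_dag',  # hypothetical DAG id
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    PythonOperator(
        task_id='call_maximum',  # hypothetical task id
        python_callable=call_maximum_async,
    )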

When it runs, I am getting:

  File "/home/airflow/.local/lib/python3.7/site-packages/celery/result.py", line 336, in maybe_throw
    self.throw(value, self._to_remote_traceback(tb))
  File "/home/airflow/.local/lib/python3.7/site-packages/celery/result.py", line 329, in throw
    self.on_ready.throw(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.7/site-packages/vine/promises.py", line 234, in throw
    reraise(type(exc), exc, tb)
  File "/home/airflow/.local/lib/python3.7/site-packages/vine/utils.py", line 30, in reraise
    raise value
celery.exceptions.NotRegistered: 'maximum'

Listing the registered tasks as above:

tasks = app.tasks.keys()
print(tasks)

Output:

dict_keys(['celery.chunks', 'airflow.executors.celery_executor.execute_command', 'maximum', 'celery.backend_cleanup', 'celery.chord_unlock', 'celery.group', 'celery.map', 'celery.accumulate', 'celery.chain', 'celery.starmap', 'celery.chord'])

maximum is there in the registered tasks.
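
Note that app.tasks.keys() reflects the registry of the local Celery app in the process that defines the task; whether the workers that actually receive the message have it registered can be checked with Celery's inspect API. A minimal sketch, assuming the same app object from tasks.py is importable:

from dags.tasks import app  # path assumed from the layout above

# Ask the running workers which tasks they have registered.
registered = app.control.inspect().registered()
print(registered)
# e.g. {'celery@<worker-host>': ['airflow.executors.celery_executor.execute_command', ...]}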

The Airflow worker is run from Docker as follows (snippet from docker-compose.yaml):

airflow-worker:
    <<: *airflow-common
    command: celery worker
    healthcheck:
      test:
        - "CMD-SHELL"
        - 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

Full docker-compose.yaml:

version: '3'
x-airflow-common:
  &airflow-common
  image: ${AIRFLOW_IMAGE_NAME:-tanesca-airflow:2.1.0}
  environment:
    &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow  
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
    AIRFLOW__CORE__FERNET_KEY: ''
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
    AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
    AIRFLOW__API__AUTH_BACKENDS: 'airflow.api.auth.backend.basic_auth'
#    _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-pandas kiteconnect}
#    _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-pandas}
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
  user: "${AIRFLOW_UID:-50000}:${AIRFLOW_GID:-50000}"
#  user: "${AIRFLOW_UID:-50000}:0"
  depends_on:
    redis:
      condition: service_healthy
    postgres:
      condition: service_healthy

services:
  postgres:
    image: postgres:13
    environment:
      POSTGRES_USER: ****
      POSTGRES_PASSWORD: ***
      POSTGRES_DB: ***
    volumes:
      - postgres-db-volume:/var/lib/postgresql/data
    ports:
      - 5432:5432
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "airflow"]
      interval: 5s
      retries: 5
    restart: always

  redis:
    image: redis:latest
    ports:
      - 6379:6379
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 30s
      retries: 50
    restart: always

  airflow-webserver:
    <<: *airflow-common
    command: webserver
    ports:
      - 8080:8080
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

  airflow-scheduler:
    <<: *airflow-common
    command: scheduler
    healthcheck:
      test: ["CMD-SHELL", 'airflow jobs check --job-type SchedulerJob --hostname "$${HOSTNAME}"']
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

  airflow-worker:
    <<: *airflow-common
    command: celery worker
    healthcheck:
      test:
        - "CMD-SHELL"
        - 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

  airflow-init:
    <<: *airflow-common
    command: version
    environment:
      <<: *airflow-common-env
      _AIRFLOW_DB_UPGRADE: 'true'
      _AIRFLOW_WWW_USER_CREATE: 'true'
      _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
      _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}

  flower:
    <<: *airflow-common
    command: celery flower
    ports:
      - 5555:5555
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:5555/"]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

volumes:
  postgres-db-volume:

Airflow worker logs:

 -------------- celery@eecdca8a08ff v5.2.7 (dawn-chorus)
--- ***** ----- 
-- ******* ---- Linux-5.15.0-1019-aws-x86_64-with-debian-11.4 2022-09-02 12:35:42
- *** --- * --- 
- ** ---------- [config]
- ** ---------- .> app:         airflow.executors.celery_executor:0x7fa27b38b0d0
- ** ---------- .> transport:   redis://redis:6379/0
- ** ---------- .> results:     postgresql://airflow:**@postgres/airflow
- *** --- * --- .> concurrency: 16 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** ----- 
 -------------- [queues]
                .> default          exchange=default(direct) key=default
                

[tasks]
  . airflow.executors.celery_executor.execute_command

[2022-09-02 12:35:50,295: INFO/MainProcess] Connected to redis://redis:6379/0
[2022-09-02 12:35:50,310: INFO/MainProcess] mingle: searching for neighbors

I assume that you simply want to run custom Python code within your task. I'm not sure why you are using the Celery decorator directly; maybe I missed something.

Anyway, I would recommend using a PythonOperator for that. You implement your own logic in a Python callable, and it will run on a Celery worker.

Based on your code above, I've created a short example:

import logging
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def maximum(**kwargs):
    logging.warning(f"got this args: {kwargs}")
    x = kwargs.get("x")
    y = kwargs.get("y")
    if x > y:
        return x
    else:
        return y


with DAG(
    'tutorial',
    default_args={},
    description='A simple tutorial DAG',
    schedule_interval=timedelta(days=1),
    start_date=datetime(2021, 1, 1),
    catchup=False,
    tags=['example'],
) as dag:
    op_kwargs = {
        "x": 10,
        "y": 11,
    }

    t1 = PythonOperator(
        task_id="my_python_task",
        python_callable=maximum,
        dag=dag,
        op_kwargs=op_kwargs
    )

    t1

You can see that it ran: [screenshot]

And it returns the result (if you have a downstream task to consume it): [screenshot]
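
The value returned by the python_callable is pushed to XCom, so a downstream task can read it. A minimal sketch of such a consumer, meant to sit inside the same with DAG(...) block as the example above (the consume_max task and its id are illustrative, not part of the original DAG):

    def consume_max(ti, **kwargs):
        # Pull the value returned by the upstream PythonOperator from XCom.
        result = ti.xcom_pull(task_ids="my_python_task")
        logging.warning(f"maximum returned: {result}")

    t2 = PythonOperator(
        task_id="consume_max",
        python_callable=consume_max,
    )

    t1 >> t2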
