I have Airflow running with Celery and Redis. By default this sends a DAG's tasks to the Celery worker. I want to run a custom Celery task from Python code inside one of the DAG's tasks.
In tasks.py I have the following code:
from airflow.configuration import conf
from airflow.config_templates.default_celery import DEFAULT_CELERY_CONFIG
from celery import Celery
from celery import shared_task

if conf.has_option('celery', 'celery_config_options'):
    celery_configuration = conf.getimport('celery', 'celery_config_options')
else:
    celery_configuration = DEFAULT_CELERY_CONFIG

app = Celery(conf.get('celery', 'CELERY_APP_NAME'),
             config_source=celery_configuration,
             include=["dags.tasks"])
app.autodiscover_tasks(force=True)

print("here")
print(conf.get('celery', 'CELERY_APP_NAME'))
print(celery_configuration)
print(app)
@app.task(name='maximum')
def maximum(x=10, y=11):
    # print("here")
    print(x)
    if x > y:
        return x
    else:
        return y

tasks = app.tasks.keys()
print(tasks)
I am calling this from one of the DAG's tasks:
max = maximum.apply_async(kwargs={'x': 5, 'y': 4})
print(max)
print(max.get(timeout=5))
I am getting:
File "/home/airflow/.local/lib/python3.7/site-packages/celery/result.py", line 336, in maybe_throw
self.throw(value, self._to_remote_traceback(tb))
File "/home/airflow/.local/lib/python3.7/site-packages/celery/result.py", line 329, in throw
self.on_ready.throw(*args, **kwargs)
File "/home/airflow/.local/lib/python3.7/site-packages/vine/promises.py", line 234, in throw
reraise(type(exc), exc, tb)
File "/home/airflow/.local/lib/python3.7/site-packages/vine/utils.py", line 30, in reraise
raise value
celery.exceptions.NotRegistered: 'maximum'
When I print the registered tasks from the code above:
tasks = app.tasks.keys()
print(tasks)
the output is:
dict_keys(['celery.chunks', 'airflow.executors.celery_executor.execute_command', 'maximum', 'celery.backend_cleanup', 'celery.chord_unlock', 'celery.group', 'celery.map', 'celery.accumulate', 'celery.chain', 'celery.starmap', 'celery.chord'])
So maximum is there in the registered tasks.
The Airflow worker is run from Docker as follows (snippet from docker-compose.yaml):
airflow-worker:
  <<: *airflow-common
  command: celery worker
  healthcheck:
    test:
      - "CMD-SHELL"
      - 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
    interval: 10s
    timeout: 10s
    retries: 5
  restart: always
Full docker-compose.yaml:
version: '3'
x-airflow-common:
  &airflow-common
  image: ${AIRFLOW_IMAGE_NAME:-tanesca-airflow:2.1.0}
  environment:
    &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
    AIRFLOW__CORE__FERNET_KEY: ''
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
    AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
    AIRFLOW__API__AUTH_BACKENDS: 'airflow.api.auth.backend.basic_auth'
    # _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-pandas kiteconnect}
    # _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-pandas}
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
  user: "${AIRFLOW_UID:-50000}:${AIRFLOW_GID:-50000}"
  # user: "${AIRFLOW_UID:-50000}:0"
  depends_on:
    redis:
      condition: service_healthy
    postgres:
      condition: service_healthy

services:
  postgres:
    image: postgres:13
    environment:
      POSTGRES_USER: ****
      POSTGRES_PASSWORD: ***
      POSTGRES_DB: ***
    volumes:
      - postgres-db-volume:/var/lib/postgresql/data
    ports:
      - 5432:5432
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "airflow"]
      interval: 5s
      retries: 5
    restart: always

  redis:
    image: redis:latest
    ports:
      - 6379:6379
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 30s
      retries: 50
    restart: always

  airflow-webserver:
    <<: *airflow-common
    command: webserver
    ports:
      - 8080:8080
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

  airflow-scheduler:
    <<: *airflow-common
    command: scheduler
    healthcheck:
      test: ["CMD-SHELL", 'airflow jobs check --job-type SchedulerJob --hostname "$${HOSTNAME}"']
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

  airflow-worker:
    <<: *airflow-common
    command: celery worker
    healthcheck:
      test:
        - "CMD-SHELL"
        - 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

  airflow-init:
    <<: *airflow-common
    command: version
    environment:
      <<: *airflow-common-env
      _AIRFLOW_DB_UPGRADE: 'true'
      _AIRFLOW_WWW_USER_CREATE: 'true'
      _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
      _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}

  flower:
    <<: *airflow-common
    command: celery flower
    ports:
      - 5555:5555
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:5555/"]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

volumes:
  postgres-db-volume:
Airflow worker logs:
-------------- celery@eecdca8a08ff v5.2.7 (dawn-chorus)
--- ***** -----
-- ******* ---- Linux-5.15.0-1019-aws-x86_64-with-debian-11.4 2022-09-02 12:35:42
- *** --- * ---
- ** ---------- [config]
- ** ---------- .> app: airflow.executors.celery_executor:0x7fa27b38b0d0
- ** ---------- .> transport: redis://redis:6379/0
- ** ---------- .> results: postgresql://airflow:**@postgres/airflow
- *** --- * --- .> concurrency: 16 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** -----
-------------- [queues]
.> default exchange=default(direct) key=default
[tasks]
. airflow.executors.celery_executor.execute_command
[2022-09-02 12:35:50,295: INFO/MainProcess] Connected to redis://redis:6379/0
[2022-09-02 12:35:50,310: INFO/MainProcess] mingle: searching for neighbors
I assume that you simply want to run custom Python code within your task; I'm not sure why you are using the Celery decorator, maybe I missed something. Note that your worker log lists only airflow.executors.celery_executor.execute_command under [tasks], so the worker process never registers your maximum task, which is why apply_async fails with NotRegistered: the task is only registered in the app your DAG process builds, not in the worker's.
Anyway, I would recommend using the PythonOperator for that. You implement your own logic in a plain callable, and it will run on the Celery worker.
Based on your code above, I've created a short example:
import logging
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def maximum(**kwargs):
    logging.warning(f"got these args: {kwargs}")
    x = kwargs.get("x")
    y = kwargs.get("y")
    if x > y:
        return x
    else:
        return y


with DAG(
    'tutorial',
    default_args={},
    description='A simple tutorial DAG',
    schedule_interval=timedelta(days=1),
    start_date=datetime(2021, 1, 1),
    catchup=False,
    tags=['example'],
) as dag:
    op_kwargs = {
        "x": 10,
        "y": 11,
    }
    t1 = PythonOperator(
        task_id="my_python_task",
        python_callable=maximum,
        dag=dag,
        op_kwargs=op_kwargs,
    )
    t1
The callable's return value is pushed to XCom automatically, so a downstream task can consume the result (if you have one):
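For completeness, here is a minimal sketch of such a downstream consumer, added inside the same with DAG(...) block as the example above. consume_result and its task_id are hypothetical names I'm introducing for illustration; "my_python_task" matches the task_id of t1 above.

    def consume_result(ti, **kwargs):
        # PythonOperator pushes the callable's return value to XCom;
        # pull it here using the upstream task's task_id.
        result = ti.xcom_pull(task_ids="my_python_task")
        logging.warning(f"maximum returned: {result}")

    t2 = PythonOperator(
        task_id="consume_result",
        python_callable=consume_result,
        dag=dag,
    )

    t1 >> t2

With this wiring, t2 runs after t1 on the Celery worker and logs the maximum that t1 returned, without any direct use of the Celery API.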