
Airflow Kubernetes Executor logs

I've deployed an Airflow instance on Kubernetes using the stable/airflow helm chart. I slightly modified the puckel/docker-airflow image to be able to install the Kubernetes executor. All tasks are now being executed successfully on our Kubernetes cluster, but the logs of these tasks are nowhere to be found.

I would like to upload the logs to our Azure Blob Storage account. I've configured my environment variables like this:

AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER="wasb-airflow"
AIRFLOW__CORE__REMOTE_LOG_CONN_ID="wasb_default"
AIRFLOW__CORE__REMOTE_LOGGING="True"
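With the Kubernetes executor, the worker pods also need these settings, otherwise the task pods never upload their logs. A minimal sketch, assuming you pass them as plain pod environment variables (same names and values as above):

env:
  - name: AIRFLOW__CORE__REMOTE_LOGGING
    value: "True"
  - name: AIRFLOW__CORE__REMOTE_LOG_CONN_ID
    value: "wasb_default"
  - name: AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER
    value: "wasb-airflow"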

The wasb_default connection includes a login and password for the Azure Blob Storage account. I've tested this connection using a WasbHook and was able to delete a dummy file with success.

When I try to view the logs, this message is displayed:

*** Log file does not exist: /usr/local/airflow/logs/example_python_operator/print_the_context/2019-11-29T15:42:25+00:00/1.log
*** Fetching from: http://examplepythonoperatorprintthecontext-4a6e6a1f11fd431f8c2a1dc081:8793/log/example_python_operator/print_the_context/2019-11-29T15:42:25+00:00/1.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='examplepythonoperatorprintthecontext-4a6e6a1f11fd431f8c2a1dc081', port=8793): Max retries exceeded with url: /log/example_python_operator/print_the_context/2019-11-29T15:42:25+00:00/1.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f34ecdbe990>: Failed to establish a new connection: [Errno -2] Name or service not known'))

Any ideas on how to solve this problem?

Found the solution. Increase the AIRFLOW__WEBSERVER__LOG_FETCH_TIMEOUT_SEC environment variable to something like 15.
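For example, set it the same way as the other Airflow environment variables (the value is in seconds; the exact number may need tuning for your cluster):

AIRFLOW__WEBSERVER__LOG_FETCH_TIMEOUT_SEC="15"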

Sorry for the late reply; I recently faced this issue and was able to solve it with this answer.

I have a working example in my repo here; you can check it out if you want. This setup uses a PV to store the logs, and you can add the connection in airflow.yaml to send logs to a remote folder.

First, your ENV VARS need to follow this structure

AIRFLOW__{SECTION}__{KEY}

for instance

AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER="wasb-airflow"

If you are working with the helm chart, you can either modify values.yaml or pass --set flags to your helm upgrade command.

Modify the values.yaml

helm show values apache-airflow/airflow > values.yaml

and then, in values.yaml, modify the logs section (it's near the end of the file); it looks something like this:

logs:
  persistence:
    enabled: true
    # Volume size for logs
    size: 100Gi
    # If using a custom storageClass, pass the name here
    storageClassName: YOUR_STORAGE_CLASS
    ## the name of an existing PVC to use
    existingClaim: YOUR_EXISTING_CLAIM

and then apply the upgrade command, for instance

helm upgrade --install airflow apache-airflow/airflow --namespace $YOUR_NAMESPACE -f values.yaml

Or with --set flags:

helm upgrade --install airflow apache-airflow/airflow --namespace $YOUR_NAMESPACE --set logs.persistence.enabled=true --set logs.persistence.size=10Gi --set logs.persistence.storageClassName=azurefile --set logs.persistence.existingClaim=YOUR_CLAIM_NAME

If you want to see the storage classes you have available in Azure, run the following:

kubectl get sc

and you will get the list of available storage classes (the default one is marked).

If you need to create a PVC, follow this documentation page:

https://docs.microsoft.com/en-us/azure/aks/azure-files-dynamic-pv
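For reference, a minimal PVC sketch using the azurefile storage class (the claim name airflow-logs is just a placeholder; it would be the value passed to logs.persistence.existingClaim):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: airflow-logs
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: azurefile
  resources:
    requests:
      storage: 10Gi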
