I've deployed an Airflow instance on Kubernetes using the stable/airflow Helm chart. I slightly modified the puckel/docker-airflow image so that the Kubernetes executor could be installed. All tasks now execute successfully on our Kubernetes cluster, but the logs of these tasks are nowhere to be found.
I would like to upload the logs to our Azure Blob Storage account. I've configured my environment variables like this:
AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER="wasb-airflow"
AIRFLOW__CORE__REMOTE_LOG_CONN_ID="wasb_default"
AIRFLOW__CORE__REMOTE_LOGGING="True"
The wasb_default connection includes a login and password for the Azure Blob Storage account. I've tested this connection using a WasbHook and was able to delete a dummy file successfully.
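For reference, a minimal sketch of that connection test, assuming Airflow 1.10's contrib import path (newer versions ship WasbHook in airflow.providers.microsoft.azure.hooks.wasb) and a hypothetical container/blob name:

from airflow.contrib.hooks.wasb_hook import WasbHook

hook = WasbHook(wasb_conn_id="wasb_default")
# Upload a dummy blob, then delete it, to confirm write and delete access.
hook.load_string("test", container_name="wasb-airflow", blob_name="dummy.txt")
hook.delete_file(container_name="wasb-airflow", blob_name="dummy.txt")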
When I try to view the logs, this message is displayed:
*** Log file does not exist: /usr/local/airflow/logs/example_python_operator/print_the_context/2019-11-29T15:42:25+00:00/1.log
*** Fetching from: http://examplepythonoperatorprintthecontext-4a6e6a1f11fd431f8c2a1dc081:8793/log/example_python_operator/print_the_context/2019-11-29T15:42:25+00:00/1.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='examplepythonoperatorprintthecontext-4a6e6a1f11fd431f8c2a1dc081', port=8793): Max retries exceeded with url: /log/example_python_operator/print_the_context/2019-11-29T15:42:25+00:00/1.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f34ecdbe990>: Failed to establish a new connection: [Errno -2] Name or service not known'))
Any ideas on how to solve this problem?
Found the solution: increase the AIRFLOW__WEBSERVER__LOG_FETCH_TIMEOUT_SEC environment variable to something like 15. The default of 5 seconds is too short for the webserver to fetch the logs from remote storage.
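For example, set on the webserver process (how you inject it depends on your deployment; the value 15 is the one from the fix above):

AIRFLOW__WEBSERVER__LOG_FETCH_TIMEOUT_SEC="15"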
First, your env vars need to follow the structure AIRFLOW__{SECTION}__{KEY}, where SECTION and KEY match a section and key in airflow.cfg, as shown below.
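For instance, the remote-logging settings from the question live in the [core] section of airflow.cfg, so they become (note that in Airflow 2 these keys moved to the [logging] section, making the prefix AIRFLOW__LOGGING__):

AIRFLOW__CORE__REMOTE_LOGGING="True"
AIRFLOW__CORE__REMOTE_LOG_CONN_ID="wasb_default"
AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER="wasb-airflow"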
If you are working with the Helm chart, you can modify the values.yaml or pass --set flags to your helm upgrade command.
Modify the values.yaml
helm show values apache-airflow/airflow > values.yaml
and then, in values.yaml, modify the logs section (it's near the end of the file); it looks something like this:
logs:
  persistence:
    enabled: true
    # Volume size for logs
    size: 100Gi
    # If using a custom storageClass, pass the name here
    storageClassName: YOUR_STORAGE_CLASS
    ## the name of an existing PVC to use
    existingClaim: YOUR_EXISTING_CLAIM
and then run the upgrade command, for instance
helm upgrade --install airflow apache-airflow/airflow --namespace $YOUR_NAMESPACE -f values.yaml
With --set flags (typically you set either storageClassName for dynamic provisioning or existingClaim for a pre-created PVC, not both):

helm upgrade --install airflow apache-airflow/airflow \
  --namespace $YOUR_NAMESPACE \
  --set logs.persistence.enabled=true \
  --set logs.persistence.size=10Gi \
  --set logs.persistence.storageClassName=azurefile \
  --set logs.persistence.existingClaim=YOUR_CLAIM_NAME
If you want to see the storage classes available in your Azure cluster, run the following:

kubectl get sc

and you will get the list of storage classes, including the defaults.
If you need to create a PVC, follow this documentation page:
https://docs.microsoft.com/en-us/azure/aks/azure-files-dynamic-pv
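For reference, a minimal sketch of such a claim, assuming the built-in azurefile storage class; the claim name airflow-logs is hypothetical and must match logs.persistence.existingClaim:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: airflow-logs          # hypothetical name; must match logs.persistence.existingClaim
spec:
  accessModes:
    - ReadWriteMany           # Azure Files supports RWX, needed when several pods share the logs volume
  storageClassName: azurefile
  resources:
    requests:
      storage: 10Gi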