Coming from a Java background, I am missing a global logging framework/configuration for Python notebooks, something like log4j.
With log4j I would write a log4j configuration file that sends logs directly to Azure Log Analytics.
How do I do this in Databricks for Python notebooks? I would like to call something like log.warn("please take care...")
and have it sent to Azure Log Analytics, along with some metadata (e.g. notebook name, start time of the job, etc.).
You can configure the Spark cluster to send its logs to the Azure Log Analytics workspace using Microsoft's spark-monitoring library. The steps below walk through the setup.
Step 1: Clone the monitoring library repository:
git clone https://github.com/mspnp/spark-monitoring.git
Step 2: Set up your Azure Databricks workspace.
Step 3: Install the Azure Databricks CLI and set up authentication.
Run the following command:
pip install databricks-cli
To configure the CLI to use a personal access token, run:
databricks configure --token
The command begins by issuing the prompt:
Databricks Host (should begin with https://):
Enter your per-workspace URL, in the format https://adb-<workspace-id>.<random-number>.azuredatabricks.net.
The command then prompts for your personal access token:
Token:
To generate a token, click your email address in the top-right corner of the menu bar -> User Settings -> Generate new token -> Generate.
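If you want to double-check that the token works before continuing, you can call the Databricks REST API directly from Python. A minimal sketch, assuming the requests package and placeholder values for the host and token:

import requests

# Placeholders: substitute your per-workspace URL and personal access token
host = "https://adb-<workspace-id>.<random-number>.azuredatabricks.net"
token = "<personal-access-token>"

# List clusters via the REST API; HTTP 200 means the token is accepted
resp = requests.get(f"{host}/api/2.0/clusters/list",
                    headers={"Authorization": f"Bearer {token}"})
print(resp.status_code, resp.json())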
Step 4: Install the build prerequisites:
Java Development Kit (JDK) version 1.8
Scala language SDK 2.12
Apache Maven (the commands below assume version 3.8.6)
Then open a terminal in the directory where you extracted Maven and run these commands:
M2_HOME='apache-maven-3.8.6'
PATH="$M2_HOME/bin:$PATH"
export PATH
mvn -version
Step 5: Build the Azure Databricks monitoring library using Docker. The command below is for a Windows shell; on Linux/macOS, replace %cd% with $(pwd) and %USERPROFILE% with $HOME:
docker run -it --rm -v %cd%:/spark-monitoring -v "%USERPROFILE%/.m2":/root/.m2 mcr.microsoft.com/java/maven:8-zulu-debian10 /spark-monitoring/build.sh
Step 6: Configure the Databricks workspace.
Create a DBFS directory for the monitoring files:
dbfs mkdirs dbfs:/databricks/spark-monitoring
Open src/spark-listeners/scripts/spark-monitoring.sh and fill in the details of your Log Analytics workspace and Databricks workspace. For security, the Log Analytics workspace ID and key come from Azure Key Vault-backed secrets rather than being hard-coded:
LOG_ANALYTICS_WORKSPACE_ID=$LOG_ANALYTICS_WORKSPACE_ID
LOG_ANALYTICS_WORKSPACE_KEY=$LOG_ANALYTICS_WORKSPACE_KEY
export AZ_SUBSCRIPTION_ID=<your-subscription-id> # your Azure subscription ID
export AZ_RSRC_GRP_NAME=PipelineML # your resource group name
export AZ_RSRC_PROV_NAMESPACE=Microsoft.Databricks
export AZ_RSRC_TYPE=workspaces
export AZ_RSRC_NAME=pipelinemachinelearning # your Databricks workspace name
To create the Key Vault-backed secret scope, open your workspace's Create Secret Scope page (https://<databricks-instance>#secrets/createScope), then copy the Key Vault's Vault URI into the DNS Name field and its Resource ID into the Resource ID field.
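To sanity-check the new scope from a notebook, you can read a secret with dbutils; Databricks redacts the value in the output. A quick sketch, with placeholder scope and secret names:

# Read a secret from the Key Vault-backed scope (placeholder names);
# notebook output shows the value as [REDACTED]
workspace_id = dbutils.secrets.get(scope="<your-scope-name>",
                                   key="<your-workspace-id-secret>")
print(workspace_id)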
Step 7: Copy src/spark-listeners/scripts/spark-monitoring.sh to the directory created in Step 6, along with all the JAR files from src/target:
dbfs cp src/spark-listeners/scripts/spark-monitoring.sh dbfs:/databricks/spark-monitoring/spark-monitoring.sh
dbfs cp --overwrite --recursive src/target/ dbfs:/databricks/spark-monitoring/
Step 8: Create a cluster in the Databricks workspace and open Advanced Options. Add dbfs:/databricks/spark-monitoring/spark-monitoring.sh as an init script, and set the LOG_ANALYTICS_WORKSPACE_ID and LOG_ANALYTICS_WORKSPACE_KEY environment variables as references to the secrets in your scope, in this format:
{{secrets/<your-scope-name>/<your-secret-name-in-key-vault>}}
Then click Create Cluster.
Testing by sending logs to Azure Log Analytics
Attach a notebook with two cells to the cluster and run both (a sketch of the two cells follows below).
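The original answer does not show the test notebook itself, so here is a minimal sketch of what its two cells might contain. It assumes a Python notebook where sc (the SparkContext) is predefined and a runtime that still exposes the log4j 1.x API through py4j; the logger name sampleLogger is arbitrary:

# Cell 1: obtain a log4j logger through the py4j gateway on the driver
log4j = sc._jvm.org.apache.log4j
logger = log4j.LogManager.getLogger("sampleLogger")

# Cell 2: emit a test message; the spark-monitoring init script forwards
# log4j events to the SparkLoggingEvent_CL table in Log Analytics
logger.warn("Testing for log")

Getting a logger this way also gives you the log.warn(...)-style call the question asked for, with the output landing in Log Analytics.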
In the Log Analytics workspace, open Custom Logs; if you don't see the custom logs, restart the cluster.
Write a query and click Run; the logs are now visible:
SparkLoggingEvent_CL
| where Message contains "Testing for log"
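Finally, for the metadata mentioned in the question (notebook name, job start time, etc.), one option is to put it into the message yourself. A hedged sketch reusing the logger from above; the notebook-context call is a Databricks-internal, unofficial API and may change between runtime versions:

import datetime

# Internal, unofficial API: fetch the current notebook's path
notebook_path = (dbutils.notebook.entry_point.getDbutils()
                 .notebook().getContext().notebookPath().get())
start_time = datetime.datetime.utcnow().isoformat()

# The metadata ends up in the Message column of SparkLoggingEvent_CL
logger.warn(f"[notebook={notebook_path} start={start_time}] please take care...")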