Coming from a Java background, I am missing a global logging framework/configuration for Python notebooks, something like log4j.
With log4j I would write a log4j configuration file that sends logs directly to Azure Log Analytics.
How do I do this in Databricks for Python notebooks? I would like to call something like log.warn("please take care...")
and have it sent to Azure Log Analytics, along with some metadata (e.g. notebook name, start time of the job, etc.).
You can configure the Spark cluster to send its logs to the Azure Log Analytics workspace using Microsoft's spark-monitoring library. The steps below walk through the setup.
Step 1: Clone the monitoring library repository:
git clone https://github.com/mspnp/spark-monitoring.git
Step 2: Set up your Azure Databricks workspace.
Step 3: Install the Azure Databricks CLI and set up authentication.
Run the following command:
pip install databricks-cli
To configure the CLI to use a personal access token, run:
databricks configure --token
The command begins by issuing the prompt:
Databricks Host (should begin with https://):
Enter your per-workspace URL, in the format https://adb-<workspace-id>.<random-number>.azuredatabricks.net.
The command then prompts for your personal access token:
Token:
To generate a token, click your email address in the top-right corner of the menu bar -> User Settings -> Generate new token -> Generate.
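If you want to double-check that the token works before continuing, you can call the Databricks REST API directly from Python. A minimal sketch, assuming the requests package and placeholder values for the host and token:

import requests

# Placeholders: substitute your per-workspace URL and personal access token
host = "https://adb-<workspace-id>.<random-number>.azuredatabricks.net"
token = "<personal-access-token>"

# List clusters via the REST API; HTTP 200 means the token is accepted
resp = requests.get(f"{host}/api/2.0/clusters/list",
                    headers={"Authorization": f"Bearer {token}"})
print(resp.status_code, resp.json())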
Step 4: Install the build prerequisites:
Java Development Kit (JDK) version 1.8
Scala language SDK 2.12
Apache Maven (the commands below assume version 3.8.6)
Then open a terminal in the directory where you extracted Maven and run these commands:
M2_HOME='apache-maven-3.8.6'
PATH="$M2_HOME/bin:$PATH"
export PATH
mvn -version
Step 5: Build the Azure Databricks monitoring library using Docker. The command below is for a Windows shell; on Linux/macOS, replace %cd% with $(pwd) and %USERPROFILE% with $HOME:
docker run -it --rm -v %cd%:/spark-monitoring -v "%USERPROFILE%/.m2":/root/.m2 mcr.microsoft.com/java/maven:8-zulu-debian10 /spark-monitoring/build.sh
Step 6: Configure the Databricks workspace.
Create a DBFS directory for the monitoring files:
dbfs mkdirs dbfs:/databricks/spark-monitoring
Open src/spark-listeners/scripts/spark-monitoring.sh and fill in the details of your Log Analytics workspace and Databricks workspace. For security, the Log Analytics workspace ID and key come from Azure Key Vault-backed secrets rather than being hard-coded:
LOG_ANALYTICS_WORKSPACE_ID=$LOG_ANALYTICS_WORKSPACE_ID
LOG_ANALYTICS_WORKSPACE_KEY=$LOG_ANALYTICS_WORKSPACE_KEY
export AZ_SUBSCRIPTION_ID=<your-subscription-id> # your Azure subscription ID
export AZ_RSRC_GRP_NAME=PipelineML # your resource group name
export AZ_RSRC_PROV_NAMESPACE=Microsoft.Databricks
export AZ_RSRC_TYPE=workspaces
export AZ_RSRC_NAME=pipelinemachinelearning # your Databricks workspace name
To create the Key Vault-backed secret scope, open your workspace's Create Secret Scope page (https://<databricks-instance>#secrets/createScope), then copy the Key Vault's Vault URI into the DNS Name field and its Resource ID into the Resource ID field.
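To sanity-check the new scope from a notebook, you can read a secret with dbutils; Databricks redacts the value in the output. A quick sketch, with placeholder scope and secret names:

# Read a secret from the Key Vault-backed scope (placeholder names);
# notebook output shows the value as [REDACTED]
workspace_id = dbutils.secrets.get(scope="<your-scope-name>",
                                   key="<your-workspace-id-secret>")
print(workspace_id)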
Step 7: Copy src/spark-listeners/scripts/spark-monitoring.sh to the directory created in Step 6, along with all the JAR files from src/target:
dbfs cp src/spark-listeners/scripts/spark-monitoring.sh dbfs:/databricks/spark-monitoring/spark-monitoring.sh
dbfs cp --overwrite --recursive src/target/ dbfs:/databricks/spark-monitoring/
Step 8: Create a cluster in the Databricks workspace and open Advanced Options. Add dbfs:/databricks/spark-monitoring/spark-monitoring.sh as an init script, and set the LOG_ANALYTICS_WORKSPACE_ID and LOG_ANALYTICS_WORKSPACE_KEY environment variables as references to the secrets in your scope, in this format:
{{secrets/<your-scope-name>/<your-secret-name-in-key-vault>}}
Then click Create Cluster.
Testing by sending logs to Azure Log Analytics
Attach a notebook with two cells to the cluster and run both (a sketch of the two cells follows below).
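The original answer does not show the test notebook itself, so here is a minimal sketch of what its two cells might contain. It assumes a Python notebook where sc (the SparkContext) is predefined and a runtime that still exposes the log4j 1.x API through py4j; the logger name sampleLogger is arbitrary:

# Cell 1: obtain a log4j logger through the py4j gateway on the driver
log4j = sc._jvm.org.apache.log4j
logger = log4j.LogManager.getLogger("sampleLogger")

# Cell 2: emit a test message; the spark-monitoring init script forwards
# log4j events to the SparkLoggingEvent_CL table in Log Analytics
logger.warn("Testing for log")

Getting a logger this way also gives you the log.warn(...)-style call the question asked for, with the output landing in Log Analytics.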
In the Log Analytics workspace, open Custom Logs; if you don't see the custom logs, restart the cluster.
Write a query and click Run; the logs are now visible:
SparkLoggingEvent_CL
| where Message contains "Testing for log"
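Finally, for the metadata mentioned in the question (notebook name, job start time, etc.), one option is to put it into the message yourself. A hedged sketch reusing the logger from above; the notebook-context call is a Databricks-internal, unofficial API and may change between runtime versions:

import datetime

# Internal, unofficial API: fetch the current notebook's path
notebook_path = (dbutils.notebook.entry_point.getDbutils()
                 .notebook().getContext().notebookPath().get())
start_time = datetime.datetime.utcnow().isoformat()

# The metadata ends up in the Message column of SparkLoggingEvent_CL
logger.warn(f"[notebook={notebook_path} start={start_time}] please take care...")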