
Logging in Databricks Python Notebooks

Coming from a Java background, I'm missing a global logging framework/configuration for Python Notebooks, like log4j.

With log4j I would set up a log4j configuration file that sends logs directly to Azure Log Analytics.

How do I do this in Databricks for Python notebooks? I would like to call something like: log.warn("please take care...")
and have it sent to Azure Log Analytics along with some metadata (e.g. notebook name, job start time, etc.).

Configure Databricks to send logs to Azure Log Analytics

I configured the Spark cluster to send logs to the Azure Log Analytics workspace using the Azure Databricks monitoring library, as follows.

Steps to set up the library:

Step 1: Clone the spark-monitoring library repository

enter image description here

Step 2: Set up an Azure Databricks workspace

Step 3: Install Azure Databricks CLI and set up authentication.

Run the following command:

pip install databricks-cli  

To configure the CLI to use a personal access token, run the following command:

databricks configure --token

The command begins by issuing the prompt:

Databricks Host (should begin with https://):

Enter your per-workspace URL, in the format https://adb-<workspace-id>.<random-number>.azuredatabricks.net.

The command then prompts you to enter your personal access token:

Token:

To generate a token, open the menu by clicking your email ID in the top-right corner of the menu bar -> User Settings -> click the Generate new token button -> click Generate.

enter image description here

enter image description here

enter image description here
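
Once both prompts are answered, the CLI stores the connection details in a ~/.databrickscfg file, which should look roughly like this (the values below are placeholders):

[DEFAULT]
host = https://adb-<workspace-id>.<random-number>.azuredatabricks.net
token = <your-personal-access-token>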

Step 4: Install

  • Java Development Kit (JDK) version 1.8

  • Scala language SDK 2.12

  • Apache Maven 3.6.3 or later (the commands below use 3.8.6)

  • Now open a terminal in the directory where you extracted Maven and run these commands:

    M2_HOME='apache-maven-3.8.6'   # adjust to the Maven directory you extracted
    PATH="$M2_HOME/bin:$PATH"
    export PATH
    mvn -version

Step 5: Build the Azure Databricks monitoring library

Using Docker

  • Open a terminal, change directory to the cloned /spark-monitoring folder, and run the following command (Windows syntax; %cd% and %USERPROFILE% expand to the current directory and your user profile):
docker run -it --rm -v %cd%:/spark-monitoring -v "%USERPROFILE%/.m2":/root/.m2 mcr.microsoft.com/java/maven:8-zulu-debian10 /spark-monitoring/build.sh
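
On Linux or macOS, the same build can be run by swapping the Windows path variables for their shell equivalents (same image and entry point):

docker run -it --rm -v "$(pwd)":/spark-monitoring -v "$HOME/.m2":/root/.m2 mcr.microsoft.com/java/maven:8-zulu-debian10 /spark-monitoring/build.sh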

enter image description here

Step 6: Configure the Databricks workspace

  • Create the target directory in DBFS:
dbfs mkdirs dbfs:/databricks/spark-monitoring
  • Open the src/spark-listeners/scripts/spark-monitoring.sh file and add your Log Analytics workspace ID and key.

enter image description here

enter image description here

Azure Key Vault is also used here for security reasons.

  • Now enter your AZ_SUBSCRIPTION_ID, AZ_RSRC_GRP_NAME, AZ_RSRC_PROV_NAMESPACE, AZ_RSRC_TYPE, and AZ_RSRC_NAME in the same file:

LOG_ANALYTICS_WORKSPACE_ID=$LOG_ANALYTICS_WORKSPACE_ID
LOG_ANALYTICS_WORKSPACE_KEY=$LOG_ANALYTICS_WORKSPACE_KEY
export AZ_SUBSCRIPTION_ID=d2738032-56cs-45cc-sdfs-2a906b237yhsg # your Azure subscription ID
export AZ_RSRC_GRP_NAME=PipelineML # your resource group name
export AZ_RSRC_PROV_NAMESPACE=Microsoft.Databricks
export AZ_RSRC_TYPE=workspaces
export AZ_RSRC_NAME=pipelinemachinelearning # your Databricks workspace name

Set up the Log Analytics workspace

  • Open Log Analytics workspaces in the Azure portal

enter image description here

  • Click on Create

enter image description here

  • Enter the instance name and resource group, then click Review + create

enter image description here

  • Define the Log Analytics workspace secrets (the workspace ID and key) in the key vault.

enter image description here
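
The same secrets can also be defined from the Azure CLI (a sketch; the vault and secret names here are placeholders of my choosing):

az keyvault secret set --vault-name <your-key-vault> --name <workspace-id-secret-name> --value <log-analytics-workspace-id>
az keyvault secret set --vault-name <your-key-vault> --name <workspace-key-secret-name> --value <log-analytics-workspace-key>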

  • Create a secret scope in the Databricks workspace
  • Append #secrets/createScope to the end of your Databricks workspace URL

  • Enter a scope name, and get the Azure Key Vault DNS name and resource ID from the key vault

Open the key vault that holds the Log Analytics secrets, copy the Vault URI into the DNS Name field, and copy the Resource ID into the Resource ID field.

enter image description here

  • Click on Create

enter image description here
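
Alternatively, a Key Vault-backed scope can be created with the databricks-cli instead of the UI (a sketch; substitute your own scope name, vault resource ID, and vault URI):

databricks secrets create-scope --scope <scope-name> --scope-backend-type AZURE_KEYVAULT --resource-id <key-vault-resource-id> --dns-name <key-vault-vault-uri>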

Step 7: Copy src/spark-listeners/scripts/spark-monitoring.sh to the directory created in Step 6, and copy all the JAR files from src/target as well:

dbfs cp src/spark-listeners/scripts/spark-monitoring.sh dbfs:/databricks/spark-monitoring/spark-monitoring.sh

dbfs cp --overwrite --recursive src/target/ dbfs:/databricks/spark-monitoring/
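
To verify that everything landed in DBFS, list the directory; it should show spark-monitoring.sh plus the listener JARs:

dbfs ls dbfs:/databricks/spark-monitoring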

Step 8: Create a cluster in the Databricks workspace and open the advanced options

  • First, select your preferred cluster configuration

enter image description here

  • Open the advanced options and add two more environment variables, in this format:

Format: {{secrets/{your scope name}/{your secret name in key vault}}}

enter image description here
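
For example, assuming the scope and secret names created earlier (placeholders below), the two variables would be:

LOG_ANALYTICS_WORKSPACE_ID={{secrets/<your-scope-name>/<workspace-id-secret-name>}}
LOG_ANALYTICS_WORKSPACE_KEY={{secrets/<your-scope-name>/<workspace-key-secret-name>}}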

  • Add the spark-monitoring.sh script (dbfs:/databricks/spark-monitoring/spark-monitoring.sh) to the cluster's init scripts

enter image description here

Click on Create Cluster.

Testing by sending logs to Azure Log Analytics

  • Create a new notebook in Azure Databricks

enter image description here
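
The two cells boil down to something like the following sketch, which grabs the driver's log4j logger through the py4j JVM gateway (the logger name testLogs is an arbitrary choice; the message matches the query used below):

# Cell 1: get a log4j logger from the driver JVM via the py4j gateway
log4j = spark.sparkContext._jvm.org.apache.log4j
logger = log4j.LogManager.getLogger("testLogs")

# Cell 2: emit a test message; it should show up in SparkLoggingEvent_CL
logger.warn("Testing for log")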

Run both cells

  • Go to your Azure Log Analytics workspace and open the Logs section

enter image description here

  • Open Custom Logs; if you don't see any custom logs, restart the cluster

  • Write a query and click Run; the logs are now visible

SparkLoggingEvent_CL
| where Message contains "Testing for log"

enter image description here
