
How can I execute and schedule a Databricks notebook from an Azure DevOps pipeline using YAML?

I wanted to do CI/CD of my Azure Databricks notebooks using a YAML file. I have followed the flow below:

  1. Pushed my code from the Databricks notebook to Azure Repos.
  2. Created a build using the YAML script below:
stages:
- stage: Build
  displayName: Build stage

  jobs:
  - job: Build
    displayName: Build
    steps:
    - task: CopyFiles@2
      displayName: 'Copy Files to:  $(build.artifactstagingdirectory)'
      inputs:
        SourceFolder: '$(System.DefaultWorkingDirectory)'
        TargetFolder: '$(build.artifactstagingdirectory)'

    - task: PublishBuildArtifacts@1
      displayName: 'Publish Artifact: notebooks'
      inputs:
        ArtifactName: dev_release
    - task: PublishBuildArtifacts@1
      inputs:
        PathtoPublish: '$(Build.ArtifactStagingDirectory)'
        ArtifactName: 'publish build'
        publishLocation: 'Container'

By doing the above I was able to create an artifact.

Now I have added another stage to deploy that artifact to my Databricks workspace, using the YAML script below.

- stage: Deploy
  displayName: Deploy stage

  jobs:
  - job: Deploy
    displayName: Deploy
    pool:
      vmImage: 'vs2017-win2016'
    steps:
    - task: DownloadBuildArtifacts@0
      inputs:
        buildType: 'current'
        downloadType: 'single'
        artifactName: 'dev_release'
        downloadPath: '$(System.ArtifactsDirectory)'
    - task: databricksDeployScripts@0
      inputs:
        authMethod: 'bearer'
        bearerToken: 'dapj0ee865674cd9tfb583dbad61b78ce9b1-4'
        region: 'Central US'
        localPath: '$(System.DefaultWorkingDirectory)'
        databricksPath: '/Shared'

Now I want to run the deployed notebook from here as well, so I added the "Configure Databricks CLI" task and the "Execute Databricks Notebook" task to execute the notebook.

I got the errors below:

##[error]Error: Unable to locate executable file: 'databricks'. Please verify either the file path exists or the file can be found within a directory specified by the PATH environment variable. Also verify the file has a valid extension for an executable file.
##[error]The given notebook does not exist.
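The first error means the databricks executable is not on the agent's PATH. A minimal sketch of a workaround, assuming a Microsoft-hosted agent with Python and pip available (the displayName is illustrative):

```yaml
# Hypothetical step: install the (legacy) databricks CLI so later tasks
# that shell out to `databricks` can find it on PATH.
- script: python -m pip install --upgrade databricks-cli
  displayName: 'Install Databricks CLI'
```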

How can I execute the notebook from Azure DevOps? My notebooks are in Scala.
Is there any other way to do this on production servers?

As you have deployed the Databricks notebook using Azure DevOps and are asking for another way to run it, I would suggest the Azure Data Factory service.

In Azure Data Factory, you can create a pipeline that executes a Databricks notebook against a Databricks jobs cluster. You can also pass Azure Data Factory parameters to the Databricks notebook during execution.
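Inside the notebook, such a parameter is read as a widget. A minimal Scala sketch, assuming a baseParameter named inputDate was configured on the ADF Notebook activity (the parameter name is an assumption):

```scala
// Runs inside the Databricks notebook; dbutils is provided by the
// Databricks runtime. Reads the "inputDate" value passed from the
// ADF Notebook activity's baseParameters (the name is illustrative).
val inputDate = dbutils.widgets.get("inputDate")
println(s"Processing data for $inputDate")
```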

Follow the official tutorial, Run a Databricks Notebook with the Databricks Notebook Activity in Azure Data Factory, to deploy and run a Databricks notebook.

Additionally, you can schedule a pipeline trigger for a particular time or event to make the process completely automatic. Refer to https://learn.microsoft.com/en-us/azure/data-factory/concepts-pipeline-execution-triggers
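As a sketch, a daily schedule trigger definition could look like the JSON below; the trigger name and the referenced pipeline name are assumptions (the pipeline would contain the Databricks Notebook activity):

```json
{
  "name": "DailyNotebookTrigger",
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Day",
        "interval": 1,
        "startTime": "2024-01-01T06:00:00Z",
        "timeZone": "UTC"
      }
    },
    "pipelines": [
      {
        "pipelineReference": {
          "referenceName": "RunDatabricksNotebook",
          "type": "PipelineReference"
        }
      }
    ]
  }
}
```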

Try this:

- job: job_name          
  displayName: test job
  pool:
    name: agent_name   # self-hosted agent pool
  workspace:
    clean: all    
  steps:
  - checkout: none
  - task: DownloadBuildArtifacts@0
    displayName: 'Download Build Artifacts'
    inputs:
      artifactName: app
      downloadPath: $(System.DefaultWorkingDirectory)
  - task: riserrad.azdo-databricks.azdo-databricks-configuredatabricks.configuredatabricks@0
    displayName: 'Configure Databricks CLI'
    inputs:
      url: '$(Databricks_URL)'
      token: '$(Databricks_PAT)'
  - task: riserrad.azdo-databricks.azdo-databricks-deploynotebooks.deploynotebooks@0
    displayName: 'Deploy Notebooks to Workspace'
    inputs:
      notebooksFolderPath: '$(System.DefaultWorkingDirectory)/app/path/to/notebook'
      workspaceFolder: /Shared
  - task: riserrad.azdo-databricks.azdo-databricks-executenotebook.executenotebook@0
    displayName: 'Execute /Shared/path/to/notebook'
    inputs:
      notebookPath: '/Shared/path/to/notebook'
      existingClusterId: '$(cluster_id)'
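Once the "Configure Databricks CLI" task has run, an alternative to the marketplace execute task is to submit the run directly from a script step with the legacy databricks CLI. A sketch, assuming the same notebook path and cluster as above (both values are placeholders):

```shell
# Submit a one-time notebook run via the Jobs runs-submit API
# (legacy databricks-cli syntax; cluster id and path are placeholders).
databricks runs submit --json '{
  "run_name": "ci-notebook-run",
  "existing_cluster_id": "'"$CLUSTER_ID"'",
  "notebook_task": { "notebook_path": "/Shared/path/to/notebook" }
}'
```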
