I want to set up CI/CD for my Azure Databricks notebooks using a YAML pipeline. I followed the flow below:
stages:
- stage: Build
  displayName: Build stage
  jobs:
  - job: Build
    displayName: Build
    steps:
    - task: CopyFiles@2
      displayName: 'Copy Files to: $(build.artifactstagingdirectory)'
      inputs:
        SourceFolder: '$(System.DefaultWorkingDirectory)'
        TargetFolder: '$(build.artifactstagingdirectory)'
    - task: PublishBuildArtifacts@1
      displayName: 'Publish Artifact: notebooks'
      inputs:
        ArtifactName: dev_release
    - task: PublishBuildArtifacts@1
      inputs:
        PathtoPublish: '$(Build.ArtifactStagingDirectory)'
        ArtifactName: 'publish build'
        publishLocation: 'Container'
With the above, I was able to create an artifact.
Next, I added another stage to deploy that artifact to my Databricks workspace, using the YAML below:
- stage: Deploy
  displayName: Deploy stage
  jobs:
  - job: Deploy
    displayName: Deploy
    pool:
      vmImage: 'vs2017-win2016'
    steps:
    - task: DownloadBuildArtifacts@0
      inputs:
        buildType: 'current'
        downloadType: 'single'
        artifactName: 'dev_release'
        downloadPath: '$(System.ArtifactsDirectory)'
    - task: databricksDeployScripts@0
      inputs:
        authMethod: 'bearer'
        bearerToken: 'dapj0ee865674cd9tfb583dbad61b78ce9b1-4'
        region: 'Central US'
        localPath: '$(System.DefaultWorkingDirectory)'
        databricksPath: '/Shared'
Now I want to run the deployed notebook from the same pipeline, so I added the "Configure Databricks CLI" task and the "Execute Databricks" task to execute the notebook.
I got the following errors:
##[error]Error: Unable to locate executable file: 'databricks'. Please verify either the file path exists or the file can be found within a directory specified by the PATH environment variable. Also verify the file has a valid extension for an executable file.
##[error]The given notebook does not exist.
How can I execute the notebook from Azure DevOps? My notebooks are written in Scala.
Is there any other way to run them on production servers?
Since you have already deployed the Databricks notebook with Azure DevOps and are asking for another way to run it, I suggest Azure Data Factory.
In Azure Data Factory, you can create a pipeline that executes a Databricks notebook against a Databricks jobs cluster. You can also pass Azure Data Factory parameters to the Databricks notebook during execution.
Follow the official tutorial, Run a Databricks notebook with the Databricks Notebook Activity in Azure Data Factory, to deploy and run the notebook.
Additionally, you can attach a trigger to the pipeline so that it runs at a scheduled time or in response to an event, which makes the process fully automatic. See https://learn.microsoft.com/en-us/azure/data-factory/concepts-pipeline-execution-triggers
Try this:
- job: job_name
  displayName: test job
  pool:
    name: agent_name  # self-hosted agent pool
  #pool:
  workspace:
    clean: all
  steps:
  - checkout: none
  - task: DownloadBuildArtifacts@0
    displayName: 'Download Build Artifacts'
    inputs:
      artifactName: app
      downloadPath: $(System.DefaultWorkingDirectory)
  - task: riserrad.azdo-databricks.azdo-databricks-configuredatabricks.configuredatabricks@0
    displayName: 'Configure Databricks CLI'
    inputs:
      url: '$(Databricks_URL)'
      token: '$(Databricks_PAT)'
  - task: riserrad.azdo-databricks.azdo-databricks-deploynotebooks.deploynotebooks@0
    displayName: 'Deploy Notebooks to Workspace'
    inputs:
      notebooksFolderPath: '$(System.DefaultWorkingDirectory)/app/path/to/notebook'
      workspaceFolder: /Shared
  - task: riserrad.azdo-databricks.azdo-databricks-executenotebook.executenotebook@0
    displayName: 'Execute /Shared/path/to/notebook'
    inputs:
      notebookPath: '/Shared/path/to/notebook'
      existingClusterId: '$(cluster_id)'
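The "Unable to locate executable file: 'databricks'" error in the question usually means that no Databricks CLI is on the agent's PATH when the execute task runs. A minimal sketch of one possible fix is to select a Python version before the "Configure Databricks CLI" task and, as a fallback, install the legacy databricks-cli package explicitly. The versionSpec value and the explicit pip install step are assumptions and are not part of the answer above; the Databricks_URL and Databricks_PAT variables are the ones already used there.

```yaml
steps:
# Assumption: a Python 3 interpreter is available to the agent
# (hosted image, or the tool cache of a self-hosted agent).
- task: UsePythonVersion@0
  displayName: 'Use Python 3.x'
  inputs:
    versionSpec: '3.x'
# Assumption: explicit fallback install of the legacy CLI from PyPI,
# in case the configure task does not put 'databricks' on the PATH.
- script: python -m pip install databricks-cli
  displayName: 'Install Databricks CLI'
- task: riserrad.azdo-databricks.azdo-databricks-configuredatabricks.configuredatabricks@0
  displayName: 'Configure Databricks CLI'
  inputs:
    url: '$(Databricks_URL)'
    token: '$(Databricks_PAT)'
```

The second error, "The given notebook does not exist", suggests that notebookPath in the execute task should match the workspace path the deploy step actually wrote to, so it is worth checking the two against each other.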