
Attach Databricks Pool to a Databricks Job cluster in Azure

Is there a way to attach a Databricks pool to a Databricks job cluster? I'm asking because I tested a Databricks job cluster configured as a new linked service, and whenever an ADF pipeline triggers the job, a new job cluster is spun up for each activity in the pipeline. Each new job cluster takes an additional 2-3 minutes to start, install the required libraries, and download the DBR version.

I have almost 30 ADF pipelines to trigger daily, and each pipeline has 3 activities on average. At an average of 2.5 minutes per cluster start, that is 30 × 3 × 2.5 = 225 minutes (3.75 hours) spent every day just spinning up job clusters. Can we avoid this cluster spin-up time?
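For reference, here is the same estimate as a quick calculation. The variable names are only illustrative; the numbers are the ones quoted above.

```python
# Rough daily overhead from job cluster start-up, using the figures from the question.
pipelines_per_day = 30
activities_per_pipeline = 3      # average; each activity starts its own job cluster
startup_minutes = 2.5            # average start-up time per cluster

total_minutes = pipelines_per_day * activities_per_pipeline * startup_minutes
total_hours = total_minutes / 60

print(f"{total_minutes:.0f} minutes ({total_hours:.2f} hours) of start-up overhead per day")
```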

With a high concurrency cluster this is not an issue at all: only the very first pipeline takes time to start, and subsequent pipelines run faster by reusing the nodes already running in the high concurrency cluster.

Any pointers would help!

Yes, you can attach a job cluster to a pool. You just need to specify the pool via the instancePoolId property, as follows:

  • Configure Databricks linked service to use the instance pool:
{
    "name": "DBName",
    "type": "Microsoft.DataFactory/factories/linkedservices",
    "properties": {
        "annotations": [],
        "type": "AzureDatabricks",
        "typeProperties": {
            "domain": "https://some-url.azuredatabricks.net",
            "newClusterNodeType": "Standard_DS3_v2",
            "newClusterNumOfWorker": "5",
            "instancePoolId": "<your-pool-id>",
            "newClusterSparkEnvVars": {
                "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
            },
            "newClusterVersion": "8.2.x-scala2.12",
            "newClusterInitScripts": [],
            "encryptedCredential": "some-base-64"
        }
    }
}
  • Configure the ADF pipeline with the job to execute, just as usual
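
If you do not have a pool yet, the pool ID has to come from somewhere first. Below is a minimal sketch of how the create request could be prepared against the Databricks Instance Pools REST API (`/api/2.0/instance-pools/create`). The pool name, sizing values, and the helper `build_pool_request` are assumptions made up for illustration, and the token is a placeholder; adjust all of them to your workspace. Preloading the same runtime version that the linked service uses should help with the DBR download part of the start-up time.

```python
import json
import urllib.request


def build_pool_request(workspace_url, token, spark_version="8.2.x-scala2.12"):
    """Prepare (but do not send) a request that creates an instance pool.

    Hypothetical helper: the name and sizing values below are illustrative only.
    """
    payload = {
        "instance_pool_name": "adf-job-pool",            # assumed name
        "node_type_id": "Standard_DS3_v2",               # same node type as in the linked service
        "min_idle_instances": 2,                         # assumed: warm nodes kept ready
        "max_capacity": 20,                              # assumed upper bound
        "idle_instance_autotermination_minutes": 30,     # assumed idle timeout
        "preloaded_spark_versions": [spark_version],     # runtime cached on idle nodes
    }
    return urllib.request.Request(
        url=f"{workspace_url}/api/2.0/instance-pools/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_pool_request("https://some-url.azuredatabricks.net", "<your-token>")

# To actually create the pool, send the request and read the returned ID:
#   with urllib.request.urlopen(request) as resp:
#       pool_id = json.load(resp)["instance_pool_id"]
```

The returned `instance_pool_id` is the value to put into `instancePoolId` in the linked service above. Keep in mind that idle instances held in the pool are still billed by Azure as VMs, so `min_idle_instances` is a trade-off between start-up time and cost.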
