I have an ADF pipeline with a Databricks activity.
The activity creates a new job cluster every time and I have added all the required Spark configurations to a corresponding linked service.
Now that Databricks offers Spot Instances, I'd like to create my new clusters with Spot configurations within Databricks.
I looked for help in the Linked Service docs, but no luck!
How can I do this using ADF?
Cheers!!!
I have found another workaround that lets the ADF Databricks Linked Service create job clusters with spot instances. As Alex Ott mentioned, the azure_attributes cluster property isn't supported by the Databricks Linked Service interface.
Instead, I ended up creating a cluster policy that enforces spot instances:
{
  "azure_attributes.availability": {
    "type": "fixed",
    "value": "SPOT_WITH_FALLBACK_AZURE",
    "hidden": true
  }
}
You can add to that policy if you want to augment the other properties of the azure_attributes object. Also, make sure you set the policy permissions for the appropriate groups/users.
After creating the policy you will need to retrieve the policy id. I used a REST call to the 2.0/policies/clusters/list endpoint to get that value.
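For reference, the policy id lookup can be sketched in Python roughly as follows. This is a minimal sketch, not the exact call I ran: the helper names, the workspace URL, the token and the policy name are placeholders, and it assumes the list endpoint returns a "policies" array whose entries carry "policy_id" and "name" fields, so check that against the response you actually get.

```python
import json
import urllib.request


def find_policy_id(list_response: dict, policy_name: str) -> str:
    """Return the policy_id of the policy with the given name.

    Assumes the response body looks like
    {"policies": [{"policy_id": "...", "name": "..."}, ...]}.
    """
    for policy in list_response.get("policies", []):
        if policy.get("name") == policy_name:
            return policy["policy_id"]
    raise LookupError(f"no cluster policy named {policy_name!r}")


def list_cluster_policies(workspace_url: str, token: str) -> dict:
    """Call GET 2.0/policies/clusters/list and return the parsed JSON body.

    workspace_url and token are placeholders for your own workspace URL
    and a personal access token.
    """
    request = urllib.request.Request(
        f"{workspace_url.rstrip('/')}/api/2.0/policies/clusters/list",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Usage would be something like find_policy_id(list_cluster_policies("https://some-url.azuredatabricks.net", token), "my-spot-policy"), where "my-spot-policy" is whatever name you gave the policy.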
From there you can do what Alex Ott suggested and create the linked service using the dynamic json option and add the policyId property with the appropriate policy id to the typeProperties object:
"typeProperties": {
  "domain": "Your Domain",
  "newClusterNodeType": "@linkedService().ClusterNodeType",
  "newClusterNumOfWorker": "@linkedService().NumWorkers",
  "newClusterVersion": "7.3.x-scala2.12",
  "newClusterInitScripts": [],
  "newClusterDriverNodeType": "@linkedService().DriverNodeType",
  "policyId": "Your policy id"
}
Now when you invoke your ADF pipeline it will create a job cluster using the cluster policy to restrict the availability property of azure_attributes to whatever you specified.
I'm not sure that it's possible right now, as it requires specifying the azure_attributes parameters when creating the cluster. But there should be a workaround: create an instance pool of spot instances and specify that pool via the instancePoolId property.
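As a rough illustration, a spot instance pool could be created through the Instance Pools REST API with a request body like the one built below. This is only a sketch under assumptions: the function name, pool name and sizing values are made up, and the field names (instance_pool_name, node_type_id, azure_attributes.availability, spot_bid_max_price) reflect my understanding of the 2.0/instance-pools/create endpoint, so verify them against the API reference. The pool can also be created in the Databricks UI.

```python
import json


def build_spot_pool_request(pool_name: str, node_type_id: str,
                            max_capacity: int = 10) -> dict:
    """Build a request body for POST 2.0/instance-pools/create.

    All values here are illustrative. availability="SPOT_AZURE" asks for
    Azure spot VMs; spot_bid_max_price=-1 is assumed to mean "pay up to
    the on-demand price".
    """
    return {
        "instance_pool_name": pool_name,
        "node_type_id": node_type_id,
        "max_capacity": max_capacity,
        "idle_instance_autotermination_minutes": 15,
        "azure_attributes": {
            "availability": "SPOT_AZURE",
            "spot_bid_max_price": -1,
        },
    }


# Serialise the body; send it with any HTTP client and a bearer token.
body = json.dumps(build_spot_pool_request("adf-spot-pool", "Standard_DS3_v2"))
```

The id of the created pool is what goes into the instancePoolId property of the Linked Service.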
Update: it really works. The only drawback is that you need to use JSON to configure the Linked Service (although you can configure everything visually, save, grab the JSON from the Git repository, and update it with the required parameters). The basic steps are as follows, starting with the Linked Service JSON that includes instancePoolId:
{
  "name": "DBName",
  "type": "Microsoft.DataFactory/factories/linkedservices",
  "properties": {
    "annotations": [],
    "type": "AzureDatabricks",
    "typeProperties": {
      "domain": "https://some-url.azuredatabricks.net",
      "newClusterNodeType": "Standard_DS3_v2",
      "newClusterNumOfWorker": "5",
      "instancePoolId": "<your-pool-id>",
      "newClusterSparkEnvVars": {
        "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
      },
      "newClusterVersion": "8.2.x-scala2.12",
      "newClusterInitScripts": [],
      "encryptedCredential": "some-base-64"
    }
  }
}
Configure an ADF pipeline with the job to execute, just as usual.
Trigger the ADF pipeline and, after several minutes, check that the instance pool is used.
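If you would rather script the "grab the JSON from Git and update it" part than edit the file by hand, something like this should do. It is a minimal sketch: the function name is hypothetical, and it only assumes the Linked Service JSON has the properties.typeProperties layout shown in this answer.

```python
import copy


def with_instance_pool(linked_service: dict, pool_id: str) -> dict:
    """Return a copy of the Linked Service definition with instancePoolId set.

    The input dict is left untouched so the original JSON can still be
    compared against the patched version.
    """
    patched = copy.deepcopy(linked_service)
    patched["properties"]["typeProperties"]["instancePoolId"] = pool_id
    return patched


# Trimmed-down definition as it might look after configuring it visually.
exported = {
    "name": "DBName",
    "properties": {
        "type": "AzureDatabricks",
        "typeProperties": {"newClusterVersion": "8.2.x-scala2.12"},
    },
}
patched = with_instance_pool(exported, "<your-pool-id>")
```

Load the file with json.load, pass the dict through with_instance_pool, and write it back with json.dump before committing.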