
How to create a Spot instance job cluster using an Azure Data Factory (ADF) linked service

I have an ADF pipeline with a Databricks activity.

The activity creates a new job cluster every time it runs, and I have added all the required Spark configurations to the corresponding linked service.

Now with Databricks offering Spot Instances, I'd like to create my new clusters with Spot configurations within Databricks.

I tried to find help in the linked service docs, but no luck!

How can I do this using ADF?

Cheers!!!

I have found another workaround to make the ADF Databricks linked service create job clusters with spot instances. As Alex Ott mentioned, the azure_attributes cluster property isn't supported by the Databricks linked service interface.

Instead, I ended up creating a cluster policy that enforces spot instances:

{
  "azure_attributes.availability": {
    "type": "fixed",
    "value": "SPOT_WITH_FALLBACK_AZURE",
    "hidden": true
  }
}

You can extend that policy if you want to control other properties of the azure_attributes object (see the sketch below). Also, make sure you set the policy permissions for the appropriate groups/users.
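If you prefer to script this instead of clicking through the UI, here is a minimal Python sketch (not from the original answer) that registers the policy through the Cluster Policies REST API. The workspace URL, token, policy name, and the extra first_on_demand / spot_bid_max_price rules are illustrative assumptions:

import json
import requests

# Hypothetical placeholders - replace with your workspace URL and a personal access token.
DATABRICKS_HOST = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "<personal-access-token>"

# The policy from above, extended with two example rules: keep the first node
# (the driver) on demand and cap the spot bid at the on-demand price.
policy_definition = {
    "azure_attributes.availability": {
        "type": "fixed",
        "value": "SPOT_WITH_FALLBACK_AZURE",
        "hidden": True,
    },
    "azure_attributes.first_on_demand": {
        "type": "fixed",
        "value": 1,
        "hidden": True,
    },
    "azure_attributes.spot_bid_max_price": {
        "type": "fixed",
        "value": -1,  # -1 means "up to the current on-demand price"
        "hidden": True,
    },
}

# Create the policy; the definition must be passed as a JSON string.
resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/policies/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"name": "adf-spot-jobs", "definition": json.dumps(policy_definition)},
)
resp.raise_for_status()
print(resp.json()["policy_id"])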

After creating the policy, you will need to retrieve its policy ID. I used a REST call to the 2.0/policies/clusters/list endpoint to get that value, for example:
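For reference, a minimal Python sketch of that list call (the workspace URL and token are placeholders you would substitute):

import requests

DATABRICKS_HOST = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
TOKEN = "<personal-access-token>"  # placeholder

# List all cluster policies and print name -> policy_id so the id of the
# spot-instance policy can be copied into the linked service JSON.
resp = requests.get(
    f"{DATABRICKS_HOST}/api/2.0/policies/clusters/list",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()
for policy in resp.json().get("policies", []):
    print(policy["name"], policy["policy_id"])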

From there, you can do what Alex Ott suggested: create the linked service using the dynamic JSON option and add a policyId property with the appropriate policy ID to the typeProperties object:

"typeProperties": {
  "domain": "Your Domain",
  "newClusterNodeType": "@linkedService().ClusterNodeType",
  "newClusterNumOfWorker": "@linkedService().NumWorkers",
  "newClusterVersion": "7.3.x-scala2.12",
  "newClusterInitScripts": [],
  "newClusterDriverNodeType": "@linkedService().DriverNodeType",
  "policyId": "Your policy id",
}

Now when you invoke your ADF pipeline it will create a job cluster using the cluster policy to restrict the availability property of azure_attributes to whatever you specified.

I'm not sure that it's possible right now, as it requires specifying the azure_attributes parameters when creating the cluster. But there should be a workaround: create an instance pool of spot instances and specify that pool via the instancePoolId property (a sketch of creating such a pool through the REST API follows).
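If you'd rather create that pool from code than from the UI (the UI route is shown in the update below), here is a minimal sketch using the Instance Pools REST API; the pool name, node type, sizes, and bid price are illustrative assumptions:

import requests

DATABRICKS_HOST = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
TOKEN = "<personal-access-token>"  # placeholder

# Create an instance pool backed by Azure spot VMs.
pool_spec = {
    "instance_pool_name": "adf-spot-pool",
    "node_type_id": "Standard_DS3_v2",
    "min_idle_instances": 0,
    "max_capacity": 10,
    "idle_instance_autotermination_minutes": 15,
    "azure_attributes": {
        "availability": "SPOT_AZURE",
        "spot_bid_max_price": -1,  # -1 means "up to the current on-demand price"
    },
}

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/instance-pools/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=pool_spec,
)
resp.raise_for_status()
# This id is what goes into the linked service's instancePoolId property.
print(resp.json()["instance_pool_id"])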

Update: it really works. The only drawback is that you need to use JSON to configure the linked service (but it's possible to configure everything visually, save it, grab the JSON from the Git repository, and update it with the required parameters). The basic steps are as follows:

  • Configure instance pool to use spot instances:

[screenshot: instance pool configured to use spot instances]

  • Configure Databricks linked service to use the instance pool:
{
    "name": "DBName",
    "type": "Microsoft.DataFactory/factories/linkedservices",
    "properties": {
        "annotations": [],
        "type": "AzureDatabricks",
        "typeProperties": {
            "domain": "https://some-url.azuredatabricks.net",
            "newClusterNodeType": "Standard_DS3_v2",
            "newClusterNumOfWorker": "5",
            "instancePoolId": "<your-pool-id>",
            "newClusterSparkEnvVars": {
                "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
            },
            "newClusterVersion": "8.2.x-scala2.12",
            "newClusterInitScripts": [],
            "encryptedCredential": "some-base-64"
        }
    }
}
  • Configure an ADF pipeline with the job to execute, just as usual.

  • Trigger the ADF pipeline and, after several minutes, see that the instance pool is used (a programmatic check is sketched after the screenshot):

[screenshot: instance pool usage shown in the Databricks UI]
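If you want to verify this without the UI, here is a small sketch (assuming the stats object returned by the Instance Pools get endpoint; workspace URL, token, and pool id are placeholders) that prints the pool's usage counters:

import requests

DATABRICKS_HOST = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
TOKEN = "<personal-access-token>"  # placeholder
POOL_ID = "<your-pool-id>"  # placeholder

# Fetch the pool; a non-zero used_count while the ADF-triggered job is running
# indicates the job cluster is drawing nodes from the spot pool.
resp = requests.get(
    f"{DATABRICKS_HOST}/api/2.0/instance-pools/get",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"instance_pool_id": POOL_ID},
)
resp.raise_for_status()
print(resp.json().get("stats", {}))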

Please use the ADF linked service option shown below to create a Spot instance:

[screenshot: ADF linked service configuration option]
