How To Scale Azure Kubernetes Cluster via Azure CLI

Question

When I try to scale my Azure Kubernetes cluster per the documentation like:

az aks scale --resource-group my-resource-group --name my-cluster --node-count 5 --nodepool-name default

I get

cli.azure.cli.core.util : request failed: Error occurred in request., RetryError: HTTPSConnectionPool(host='management.azure.com', port=443): Max retries exceeded with url: /subscriptions/[subscriptionguid]/resourceGroups/my-resource-group/providers/Microsoft.ContainerService/managedClusters/my-cluster?api-version=2020-03-01 (Caused by ResponseError('too many 500 error responses',))
request failed: Error occurred in request., RetryError: HTTPSConnectionPool(host='management.azure.com', port=443): Max retries exceeded with url: /subscriptions/[subscriptionguid]/resourceGroups/my-resource-group/providers/Microsoft.ContainerService/managedClusters/my-cluster?api-version=2020-03-01 (Caused by ResponseError('too many 500 error responses',))

I'm on Azure CLI 2.3.1 on Windows. I've tried 2.2 in WSL too. I am able to scale through the UI just fine. Autoscaling is disabled. There is only one node pool (called default). This cluster was created through Terraform. Other az commands work fine. I've tried logging in as a user and as a service principal. I have no proxy. Adding --debug doesn't show anything obviously useful.

If I watch the http requests in Fiddler, the response bodies of the 500 results look like this:

message=The credentials in ServicePrincipalProfile were invalid. Please see https://aka.ms/aks-sp-help for more details. (Details: adal: Refresh request failed. Status Code = '401'. Response body: {"error":"invalid_client","error_description":"AADSTS7000215: Invalid client secret is provided.\r\nTrace ID: 4d0fe224-1e60-4a91-91f1-399f697c0600\r\nCorrelation ID: 95b7e354-a63d-450e-8a7c-1851605a5b25\r\nTimestamp: 2020-04-07 13:51:07Z","error_codes":[7000215],"timestamp":"2020-04-07 13:51:07Z","trace_id":"4d0fe224-1e60-4a91-91f1-399f697c0600","correlation_id":"95b7e354-a63d-450e-8a7c-1851605a5b25","error_uri":"https://login.microsoftonline.com/error?code=7000215"})

If I do:

az aks show --resource-group my-resource-group --name my-cluster --query agentPoolProfiles

it results in:

[
  {
    "availabilityZones": null,
    "count": 3,
    "enableAutoScaling": false,
    "enableNodePublicIp": null,
    "maxCount": null,
    "maxPods": 110,
    "minCount": null,
    "mode": "User",
    "name": "default",
    "nodeLabels": null,
    "nodeTaints": null,
    "orchestratorVersion": "1.15.7",
    "osDiskSizeGb": 30,
    "osType": "Linux",
    "provisioningState": "Succeeded",
    "scaleSetEvictionPolicy": null,
    "scaleSetPriority": null,
    "spotMaxPrice": null,
    "tags": null,
    "type": "AvailabilitySet",
    "vmSize": "Standard_D2_v3"
  }
]

What am I doing wrong? How do I get AKS to scale through the CLI? Or, failing that, how do I debug this?

Answer 1

I ended up solving this by upgrading to the latest Terraform version and Terraform Azure provider (azurerm from 1.32.1 to 2.0, and Terraform from 0.12.17 to 0.12.24). Then I deleted the cluster and had Terraform recreate it. Now it scales from the command line just fine. I suspect the relevant change is that the node pool type went from "AvailabilitySet" to "VirtualMachineScaleSets".
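
To see which type an existing node pool uses before deciding whether to recreate the cluster, something like the following may help. It is only a sketch: node_pool_types is a hypothetical helper name, and it assumes az aks show prints pretty-printed JSON with one field per line, as in the output shown in the question.

```shell
# Hypothetical helper: print the "type" of each node pool in an AKS cluster.
# Assumes `az aks show --query agentPoolProfiles` emits one JSON field per line,
# as in the output shown in the question.
node_pool_types() {
  resource_group=$1
  cluster_name=$2
  az aks show --resource-group "$resource_group" --name "$cluster_name" \
      --query agentPoolProfiles |
    sed -n 's/^[[:space:]]*"type":[[:space:]]*"\([^"]*\)".*$/\1/p'
}

# Example call, using the placeholder names from the question:
# node_pool_types my-resource-group my-cluster
```

For the output shown in the question this would print AvailabilitySet.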

Accepted answer, 2020-04-17 14:28:27