How To Scale Azure Kubernetes Cluster via Azure CLI

When I try to scale my Azure Kubernetes Service (AKS) cluster as the documentation describes:

az aks scale --resource-group my-resource-group --name my-cluster --node-count 5 --nodepool-name default

I get

cli.azure.cli.core.util : request failed: Error occurred in request., RetryError: HTTPSConnectionPool(host='management.azure.com', port=443): Max retries exceeded with url: /subscriptions/[subscriptionguid]/resourceGroups/my-resource-group/providers/Microsoft.ContainerService/managedClusters/my-cluster?api-version=2020-03-01 (Caused by ResponseError('too many 500 error responses',))
request failed: Error occurred in request., RetryError: HTTPSConnectionPool(host='management.azure.com', port=443): Max retries exceeded with url: /subscriptions/[subscriptionguid]/resourceGroups/my-resource-group/providers/Microsoft.ContainerService/managedClusters/my-cluster?api-version=2020-03-01 (Caused by ResponseError('too many 500 error responses',))

I'm on Azure CLI 2.3.1 on Windows, and I've also tried 2.2 in WSL. Scaling through the UI works fine. Autoscaling is disabled, and there is only one node pool (called default). The cluster was created through Terraform. Other az commands work fine. I've tried logging in both as a user and as a service principal, and I have no proxy. Adding --debug shows nothing of immediate value.

If I watch the HTTP requests in Fiddler, the bodies of the 500 responses look like this:

message=The credentials in ServicePrincipalProfile were invalid. Please see https://aka.ms/aks-sp-help for more details. (Details: adal: Refresh request failed. Status Code = '401'. Response body: {"error":"invalid_client","error_description":"AADSTS7000215: Invalid client secret is provided.\r\nTrace ID: 4d0fe224-1e60-4a91-91f1-399f697c0600\r\nCorrelation ID: 95b7e354-a63d-450e-8a7c-1851605a5b25\r\nTimestamp: 2020-04-07 13:51:07Z","error_codes":[7000215],"timestamp":"2020-04-07 13:51:07Z","trace_id":"4d0fe224-1e60-4a91-91f1-399f697c0600","correlation_id":"95b7e354-a63d-450e-8a7c-1851605a5b25","error_uri":"https://login.microsoftonline.com/error?code=7000215"})
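The body above is more specific than the CLI's retry error: it names the cluster's ServicePrincipalProfile and carries the AAD error code AADSTS7000215. When comparing several captured responses, it can help to pull that code out mechanically. Here is a minimal sketch, assuming the response body has been copied out of Fiddler into a shell variable or file; the helper name `extract_aadsts_code` is hypothetical and not part of the Azure CLI:

```shell
# Hypothetical helper: print the first AADSTS error code found on stdin.
extract_aadsts_code() {
  grep -o 'AADSTS[0-9]*' | head -n 1
}

# Shortened sample taken from the response body shown above.
body='{"error":"invalid_client","error_description":"AADSTS7000215: Invalid client secret is provided."}'

# Prints the AAD error code embedded in the body.
printf '%s\n' "$body" | extract_aadsts_code
```

The code in this case says the client secret stored for the cluster's service principal was rejected. Whether resetting that secret (for example with `az aks update-credentials`) would have been enough here is an assumption; the original post does not confirm it.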

If I do:

az aks show --resource-group my-resource-group --name my-cluster --query agentPoolProfiles

it results in:

[
  {
    "availabilityZones": null,
    "count": 3,
    "enableAutoScaling": false,
    "enableNodePublicIp": null,
    "maxCount": null,
    "maxPods": 110,
    "minCount": null,
    "mode": "User",
    "name": "default",
    "nodeLabels": null,
    "nodeTaints": null,
    "orchestratorVersion": "1.15.7",
    "osDiskSizeGb": 30,
    "osType": "Linux",
    "provisioningState": "Succeeded",
    "scaleSetEvictionPolicy": null,
    "scaleSetPriority": null,
    "spotMaxPrice": null,
    "tags": null,
    "type": "AvailabilitySet",
    "vmSize": "Standard_D2_v3"
  }
]

What am I doing wrong? How do I get AKS to scale through the CLI? Or, failing that, how do I debug this?

I ended up solving this by upgrading Terraform and the Terraform Azure provider to their latest versions (azurerm from 1.32.1 to 2.0, and Terraform from 0.12.17 to 0.12.24). Then I deleted the cluster and had Terraform recreate it. Now it scales from the command line just fine. I suspect the relevant change was the node pool type switching from "AvailabilitySet" to "VirtualMachineScaleSets".
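To see which node pool type a cluster currently has, the `type` field can be read from the same `agentPoolProfiles` output shown in the question. Here is a rough sketch, assuming the pretty-printed JSON layout that `az aks show` produces (one field per line); the helper name `pool_types` is hypothetical, and the sample below is a trimmed copy of the question's output rather than live data:

```shell
# Hypothetical helper: print each node pool's "type" value from JSON on stdin.
# Relies on one field per line; the leading quote keeps "osType" from matching.
pool_types() {
  sed -n 's/.*"type": *"\([^"]*\)".*/\1/p'
}

# Trimmed sample in the same shape as the agentPoolProfiles output in the question.
sample='[
  {
    "name": "default",
    "osType": "Linux",
    "type": "AvailabilitySet"
  }
]'
printf '%s\n' "$sample" | pool_types

# Against a live cluster (not run here; names are the question's placeholders):
# az aks show --resource-group my-resource-group --name my-cluster --query agentPoolProfiles | pool_types
```

For the cluster in the question this prints AvailabilitySet. After the recreate described above, it would be expected to print VirtualMachineScaleSets, if the suspicion about the pool type is right.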
