GPU Workload with Composer 2 and GKE Autopilot?

Question

We are running the latest version of Composer 2:
composer-2.0.28-airflow-2.3.3

Our GKE version is:
1.22.12-gke.2300

We want to deploy GPU workloads within Composer 2.

We tried the Pod spec as documented here:

apiVersion: v1
kind: Pod
metadata:
  name: my-gpu-pod
spec:
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-tesla-t4
  containers:
  - name: my-gpu-container
    image: nvidia/cuda:11.0.3-runtime-ubuntu20.04
    command: ["/bin/bash", "-c", "--"]
    args: ["while true; do sleep 600; done;"]
    resources:
      limits:
        nvidia.com/gpu: 1

However, the example does not work for us.

The error message is:
Autopilot doesn't support GPUs yet.

The documentation says:
"Ensure that you have a GKE Autopilot cluster running GKE version 1.24.2-gke.1800 or later."

Does this mean that GPU workloads cannot yet be used with the current version of Composer 2?

Or are we meant to use GKECreateClusterOperator and set up a separate cluster with a dedicated GPU node pool?

Thanks in advance for any help

Answer 1

The "Before You Begin" section of the Autopilot GPU docs says:

Ensure that you have a GKE Autopilot cluster running GKE version 1.24.2-gke.1800 or later.

For me, that meant creating the cluster with the --release-channel=rapid flag. I ran into an issue when trying to upgrade the existing cluster in place, so I discarded it and created a new one, but there is probably a way to upgrade in place.
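
The version requirement quoted above can be checked mechanically. The following is a minimal sketch, not part of any Google tooling: the helper names parse_gke_version and supports_autopilot_gpus are made up for illustration, and it assumes version strings always have the form MAJOR.MINOR.PATCH-gke.BUILD, as in the two versions mentioned in this thread.

```python
import re

# Minimum version quoted from the Autopilot GPU docs in this thread.
MIN_GPU_VERSION = "1.24.2-gke.1800"


def parse_gke_version(version: str) -> tuple:
    """Split 'MAJOR.MINOR.PATCH-gke.BUILD' into a tuple of four integers.

    Hypothetical helper; assumes the version string has exactly this shape.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)-gke\.(\d+)", version)
    if match is None:
        raise ValueError(f"unexpected GKE version format: {version!r}")
    return tuple(int(part) for part in match.groups())


def supports_autopilot_gpus(version: str, minimum: str = MIN_GPU_VERSION) -> bool:
    """Return True if `version` is at or above the documented minimum."""
    # Tuples compare element by element, so 1.22.x sorts below 1.24.x.
    return parse_gke_version(version) >= parse_gke_version(minimum)


# The cluster version from the question is below the documented minimum.
print(supports_autopilot_gpus("1.22.12-gke.2300"))  # False
```

Comparing the parsed tuples rather than the raw strings matters here, because a plain string comparison would rank "1.9.0" above "1.24.2".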

solution1
0 2022-12-06 22:58:57
