Kubernetes pods N:M scheduling how-to

Question

I run batch Monte Carlo computations packaged as a Docker image: multiple jobs on Google Cloud, managed by Kubernetes. There are no Replication Controllers, just multiple pods with a no-restart policy that deliver their computed payloads to our server. So far so good. The problem is that I have a cluster with N nodes/minions and M jobs to compute, where M > N. I would like to fire off all M pods at once and have Kubernetes schedule them so that only N are running at any given time, with the rest kept in the Pending state. As soon as one pod finishes, the next one should move from Pending to Running, and so on until all M pods are done.

Is it possible to do so?

Answer 1

Yes. Have every pod ask for a resource of which there is only one on each node; the scheduler then cannot place more than one such pod per node, so at most N run at a time. The most common way to do this is to have each pod ask for the same hostPort in the ports section of its container spec.
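
As a rough sketch of what that could look like, here is a minimal pod manifest. The pod name, container name, image and port number are all placeholders invented for illustration, not values from the question. The only part that matters for scheduling is that every job pod declares the same hostPort, since a given host port can be claimed by only one pod per node.

```json
{
  "apiVersion": "v1",
  "kind": "Pod",
  "metadata": { "name": "example-mc-job-1" },
  "spec": {
    "restartPolicy": "Never",
    "containers": [
      {
        "name": "example-worker",
        "image": "example/monte-carlo:latest",
        "ports": [
          { "containerPort": 8080, "hostPort": 8080 }
        ]
      }
    ]
  }
}
```

The container does not have to listen on that port; declaring it is enough for the scheduler to treat it as taken on that node.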

However, I can't say I'm completely sure why you would want to limit the system to one such pod per node. If there are enough resources available to run multiple at a time on each node, it should speed up your job to let them run.

Answer 2

Just for the record, after discussion with Alex, trial and error and a binary search for a good number, what worked for me was setting the CPU resource limit in the Pod JSON to:

    "resources": {
        "limits": {
            "cpu": "490m"
        }
    }

I have no idea how and why this particular value influences the Kubernetes scheduler, but it keeps nodes churning through the jobs, with exactly one pod per node running at any given moment.
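
One possible explanation, offered as an assumption rather than something confirmed in this thread: the scheduler places a pod on a node only if the CPU already reserved by pods on that node, plus the new pod's CPU, fits within the node's capacity. If each node has a single CPU (1000m), two pods at 490m need 980m, leaving 20m. If the system pods on each node already reserve more than 20m, a second job pod cannot fit and stays Pending until the first one finishes. The node size and the system-pod reservation are guesses here. The container fragment below (name and image are placeholders) spells out the same reservation explicitly; when only a limit is given, the request defaults to the limit.

```json
{
  "name": "example-worker",
  "image": "example/monte-carlo:latest",
  "resources": {
    "requests": { "cpu": "490m" },
    "limits": { "cpu": "490m" }
  }
}
```

If this explanation holds, the right number depends on the node's CPU count and on what else runs on each node, which would be why it had to be found by trial and error.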

2 answers

solution1
3 ACCEPTED 2015-07-29 17:50:25

solution2
2 2015-07-30 01:14:45
