Forcing SGE to use multiple servers

Question

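TL;DR: Is there any way to get SGE to round-robin between servers when scheduling jobs, instead of packing all jobs onto the same server whenever it can?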

Details:

I have a large compute process made up of many smaller jobs. I'm using SGE to distribute the work across multiple servers in a cluster.

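The process needs a varying number of tasks at different points in time (technically, it is a DAG of jobs). Sometimes the number of parallel jobs is very large (~1 per CPU in the cluster), sometimes it is much smaller (~1 per server). The DAG is dynamic and non-uniform, so it isn't easy to tell how many parallel jobs there are, or will be, at any given point.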

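The jobs use a lot of CPU but also do some non-trivial IO (especially at job startup and shutdown). They access a shared NFS server that is connected to all the compute servers. Each compute server has a narrower connection (10Gb/s), but the NFS server has several wide connections (40Gb/s) into the communication switch. I'm not sure what the bandwidth of the switch backbone is, but it's a monster, so it should be high.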

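For optimal performance, jobs should be scheduled on different servers whenever possible. That is, if I have 20 servers, each with 20 processors, submitting 20 jobs should run one job on each server. Submitting 40 jobs should run 2 on each server, and so on. Submitting 400 jobs would saturate the whole cluster.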
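To make the target concrete, here is a minimal Python sketch of the distribution I'd like to see. It is not anything SGE runs; `spread_evenly` is a hypothetical helper, and the defaults (20 servers, 20 slots each) are just the numbers from the example above.

```python
def spread_evenly(n_jobs, n_servers=20, slots_per_server=20):
    """Return how many jobs each server would run under an even spread."""
    capacity = n_servers * slots_per_server
    runnable = min(n_jobs, capacity)  # anything beyond capacity has to wait
    base, extra = divmod(runnable, n_servers)
    # The first `extra` servers take one more job than the rest.
    return [base + 1 if i < extra else base for i in range(n_servers)]


if __name__ == "__main__":
    for n in (20, 40, 400):
        print(n, "jobs ->", spread_evenly(n))
```

With these assumptions, 20 jobs give one job per server, 40 give two per server, and 400 fill every slot.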

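However, SGE seems intent on minimizing my I/O performance. Submitting 20 jobs schedules all of them on a single server. So they all fight over one measly 10Gb/s network connection, while 19 other machines with 190Gb/s of combined bandwidth sit idle.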
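The arithmetic behind that complaint, as a rough Python sketch. It assumes the server's network link is the only bottleneck and is shared equally between the jobs on that server, which is a simplification (the NFS server and the switch are ignored); `per_job_share_gbps` is a made-up name for illustration.

```python
NIC_GBPS = 10.0   # per-server link speed from the description above
N_SERVERS = 20


def per_job_share_gbps(jobs_on_server, nic_gbps=NIC_GBPS):
    """Upper bound on per-job bandwidth if the link is split equally."""
    return nic_gbps / jobs_on_server


packed_share = per_job_share_gbps(20)        # all 20 jobs on one server
spread_share = per_job_share_gbps(1)         # one job per server
idle_gbps = (N_SERVERS - 1) * NIC_GBPS       # links left unused when packed

if __name__ == "__main__":
    print("packed:", packed_share, "Gb/s per job")
    print("spread:", spread_share, "Gb/s per job")
    print("idle when packed:", idle_gbps, "Gb/s")
```

Under these assumptions each packed job gets at most 0.5Gb/s, versus up to 10Gb/s when the jobs are spread out.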

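I can force SGE to execute each job on a different server in several ways (using resources, using special queues, using my parallel environment and specifying "-t 1-", etc.). However, this means I can only ever run one job per server, period. When the DAG opens up and spawns many jobs, the jobs will stall waiting for a completely free server, while 19 of the 20 processors on each machine stay idle.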

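What I need is a way to tell SGE to assign each job, in round-robin order, to the next server that has an available slot. Even better would be to assign each job to the least loaded server (the largest number of unused slots, or the largest fraction of unused slots, or the smallest number of used slots, etc.). But a dead simple round-robin would do the trick.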
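A minimal Python sketch of the "least loaded" policy described above, using "most unused slots" as the load measure. This only illustrates the behaviour I'm after; `pick_least_loaded` and `dispatch` are hypothetical helpers, not SGE functions, and ties are broken by host order.

```python
def pick_least_loaded(used, total):
    """Return the host with the most unused slots, or None if all are full.

    used, total: dicts mapping host name -> slot count.
    """
    candidates = [h for h in total if used.get(h, 0) < total[h]]
    if not candidates:
        return None
    # max() keeps the first host among ties, so host order breaks ties.
    return max(candidates, key=lambda h: total[h] - used.get(h, 0))


def dispatch(n_jobs, total):
    """Place n_jobs one at a time; jobs that find no free slot are left out."""
    used = dict.fromkeys(total, 0)
    for _ in range(n_jobs):
        host = pick_least_loaded(used, total)
        if host is None:
            break
        used[host] += 1
    return used


if __name__ == "__main__":
    print(dispatch(6, {"node1": 20, "node2": 20, "node3": 20}))
```

When every server has the same number of slots, this degenerates into plain round-robin, which is why either policy would work for me.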

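This seems like a much more sensible strategy in general than SGE's policy of running each job on the same server as the previous job, which is just about the worst possible strategy for my case.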

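I looked through SGE's configuration options but couldn't find any way to modify the scheduling strategy. That said, SGE's documentation isn't exactly easy to navigate, so I could easily have missed something.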

Does anyone know of a way to get SGE to change its scheduling strategy to round-robin, least-loaded, or anything along those lines?

Thanks!

Answer 1

Simply change the allocation_rule of your SGE parallel environment (see the sge_pe man page) to $round_robin:

allocation_rule

     The allocation rule is interpreted by the scheduler thread
     and helps the scheduler to decide how to distribute parallel
     processes among the available machines. If, for instance, a
     parallel environment is built for shared memory applications
     only, all parallel processes have to be assigned to a single
     machine, no matter how much suitable machines are available.
     If, however, the parallel environment follows the distributed
     memory paradigm, an even distribution of processes among
     machines may be favorable.
     The current version of the scheduler only understands the
     following allocation rules:

<int>:    An integer number fixing the number of processes
          per host. If the number is 1, all processes have
          to reside on different hosts. If the special
          denominator $pe_slots is used, the full range of
          processes as specified with the qsub(1) -pe switch
          has to be allocated on a single host (no matter
          which value belonging to the range is finally
          chosen for the job to be allocated).

$fill_up: Starting from the best suitable host/queue, all
          available slots are allocated. Further hosts and
          queues are "filled up" as long as a job still
          requires slots for parallel tasks.

$round_robin:
          From all suitable hosts a single slot is allocated
          until all tasks requested by the parallel job are
          dispatched. If more tasks are requested than suitable
          hosts are found, allocation starts again from the
          first host. The allocation scheme walks through
          suitable hosts in a best-suitable-first order.

Source: http://gridscheduler.sourceforge.net/htmlman/htmlman5/sge_pe.html
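To see the difference between the two rules quoted above, here is a small Python model of how the slots of one parallel job would be handed out. It is a simplified sketch based only on the man page wording, not SGE's actual scheduler code: the list order stands in for the "best-suitable-first" host order, and the function names are made up for illustration.

```python
def fill_up(n_slots, free):
    """Model of $fill_up: drain each host's free slots before moving on.

    free: free slot counts per host, best-suited host first.
    Returns slots taken per host, or None if the job does not fit.
    """
    alloc = [0] * len(free)
    remaining = n_slots
    for i, f in enumerate(free):
        take = min(f, remaining)
        alloc[i] = take
        remaining -= take
        if remaining == 0:
            break
    return alloc if remaining == 0 else None


def round_robin(n_slots, free):
    """Model of $round_robin: one slot per host per pass, then start over."""
    alloc = [0] * len(free)
    remaining = n_slots
    while remaining > 0:
        progressed = False
        for i, f in enumerate(free):
            if remaining == 0:
                break
            if alloc[i] < f:
                alloc[i] += 1
                remaining -= 1
                progressed = True
        if not progressed:  # every host is full, the job does not fit
            return None
    return alloc


if __name__ == "__main__":
    free = [20] * 20  # 20 hosts with 20 free slots each
    print("fill_up:    ", fill_up(20, free))
    print("round_robin:", round_robin(20, free))
```

In this model, a 20-slot request on 20 empty hosts lands entirely on the first host with fill-up, and on one slot per host with round-robin. Note that, per the quoted text, the rule describes how the processes of a parallel job are distributed among machines.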

Solution 1
1 2019-03-11 13:57:34
