
How can I more efficiently distribute these tasks into batches?

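I have split my data files by month and am using Node.js clustering to divide the jobs among different threads for processing, but the way I am doing it leaves some threads with no work to do, as shown below: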

thread 1 selection [ '2004-05', '2004-06', '2004-07', '2004-08' ]
thread 2 selection [ '2004-09', '2004-10', '2004-11', '2004-12' ]
thread 9 selection [ '2007-01', '2007-02', '2007-03', '2007-04' ]
thread 7 selection [ '2006-05', '2006-06', '2006-07', '2006-08' ]
thread 5 selection [ '2005-09', '2005-10', '2005-11', '2005-12' ]
thread 4 selection [ '2005-05', '2005-06', '2005-07', '2005-08' ]
thread 8 selection [ '2006-09', '2006-10', '2006-11', '2006-12' ]
thread 6 selection [ '2006-01', '2006-02', '2006-03', '2006-04' ]
thread 10 selection [ '2007-05', '2007-06', '2007-07', '2007-08' ]
thread 3 selection [ '2005-01', '2005-02', '2005-03', '2005-04' ]
thread 11 selection [ '2007-09', '2007-10', '2007-11', '2007-12' ]
thread 0 selection [ '2004-01', '2004-02', '2004-03', '2004-04' ]
thread 15 selection []
thread 14 selection []
thread 13 selection []
thread 12 selection [ '2008-01', '2008-02', '2008-03' ]

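As you can see, threads 13, 14 and 15 have no work to do, which wastes cores on my machine. Here is my code, ignoring the cluster boilerplate; assume i equals the thread number (0-15 in my case):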

let dateStart = moment('2004-01-02');
let dateEnd = moment('2008-03-02');
let timeValues = [];
while (dateEnd > dateStart || dateStart.format('M') === dateEnd.format('M')) {
  timeValues.push(dateStart.format('YYYY-MM'));
  dateStart.add(1, 'month');
}
let i = parseInt(process.env.workerId);
let monthBatchCount = Math.ceil(timeValues.length / cpus);
let selectionStart = i * monthBatchCount;
let selectionEnd = selectionStart + monthBatchCount;
let selection = timeValues.slice(selectionStart, selectionEnd)
console.log("thread", i, "selection", selection)

How can I change my approach to distribute the jobs into batches more efficiently, so that no thread is left with an empty batch?

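One approach is to have each worker pull units of work from the main thread instead of pushing work to the workers. The parent thread acts as a broker for the units of work, and each worker, once spawned, requests work, performs it, and then loops back to request more.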

// Parent code

const unitsOfWork = [...];
const workers = [...];

workers.forEach(worker => {
  worker.on('message', (message) => {
    if (message.type === 'CLAIM_WORK') {
      const unit = unitsOfWork.pop();
      // Named `reply` so it does not shadow the incoming `message`.
      const reply = unit ? { type: 'WORK', unit } : { type: 'WORK_FINISHED' };
      worker.postMessage(reply);
    }
  });
});

// Worker code

const { parentPort } = require('worker_threads');

parentPort.on('message', (message) => {
  if (message.type === 'WORK') {
    performWork(message.unit);
    // Done with this unit, so ask the parent for the next one.
    parentPort.postMessage({ type: 'CLAIM_WORK' });
  } else if (message.type === 'WORK_FINISHED') {
    // Exit?
  }
});

// Kick off the loop by claiming the first unit of work.
parentPort.postMessage({ type: 'CLAIM_WORK' });

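Jacob's answer is in fact the most efficient solution: if the batch jobs do not all take the same amount of time, his approach lets threads that finish early go back and pick up more work, instead of sitting idle while the threads with harder jobs finish.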
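To make that argument concrete, here is a rough simulation rather than a benchmark. The function names and the durations are made up for illustration, and the pull model is approximated by always handing the next unit to whichever worker becomes free first.

```javascript
// Finish time when each worker gets a fixed round-robin batch up front.
function staticMakespan(durations, workerCount) {
  const busy = new Array(workerCount).fill(0);
  durations.forEach((d, idx) => { busy[idx % workerCount] += d; });
  return Math.max(...busy);
}

// Finish time when the next unit goes to whichever worker frees up first,
// which is what a claim-work loop amounts to.
function pullMakespan(durations, workerCount) {
  const busy = new Array(workerCount).fill(0);
  for (const d of durations) {
    const next = busy.indexOf(Math.min(...busy)); // earliest-free worker
    busy[next] += d;
  }
  return Math.max(...busy);
}

// Hypothetical cost per unit of work: one slow job and seven quick ones.
const durations = [9, 1, 1, 1, 1, 1, 1, 1];
console.log(staticMakespan(durations, 2)); // 12
console.log(pullMakespan(durations, 2));   // 9
```

With fixed batches, one worker is stuck with the slow job plus three quick ones while the other finishes early; with pulling, the second worker absorbs all the quick jobs in the meantime.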

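But in case anyone is wondering how to properly divide the queue into batches, perhaps for some other use case, here it is, using the same queue-popping principle: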

let dateStart = moment('2004-01-02');
let dateEnd = moment('2008-03-02');
let timeValues = [];
while (dateEnd > dateStart || dateStart.format('M') === dateEnd.format('M')) {
  timeValues.push(dateStart.format('YYYY-MM'));
  dateStart.add(1, 'month');
}
let i = parseInt(process.env.workerId);
let batches = [];
let workerId = 0;
while (timeValues.length > 0) {
  if (!batches[workerId]) batches[workerId] = [];
  batches[workerId].push(timeValues.pop());
  workerId++;
  if (workerId > 15) workerId = 0;
}
let batch = batches[i];
console.log("batch", batch)
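If consecutive months should stay together in one batch, as in the original output, a contiguous split is an alternative to the round-robin above. This is only a sketch: `splitEvenly` is a hypothetical helper, the month list is built by hand instead of with moment, and the worker count of 16 is taken from the question.

```javascript
// Hypothetical helper: split `items` into `workerCount` contiguous slices
// whose lengths differ by at most one.
function splitEvenly(items, workerCount) {
  const base = Math.floor(items.length / workerCount);
  const extra = items.length % workerCount; // the first `extra` slices get one more item
  const batches = [];
  let start = 0;
  for (let w = 0; w < workerCount; w++) {
    const size = base + (w < extra ? 1 : 0);
    batches.push(items.slice(start, start + size));
    start += size;
  }
  return batches;
}

// Build the 51 months from 2004-01 through 2008-03.
const months = [];
for (let y = 2004, m = 1; y < 2008 || m <= 3; ) {
  months.push(`${y}-${String(m).padStart(2, '0')}`);
  if (++m > 12) { m = 1; y++; }
}

const batches = splitEvenly(months, 16);
console.log(batches.map((b) => b.length).join(' '));
```

For 51 months and 16 workers this gives three batches of four months and thirteen batches of three, so no worker is left empty. The `Math.ceil` version in the question rounds every batch up to four, which uses up all 51 months after 13 workers.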
