
How to schedule jobs without overlap using LINQ to Objects?

This is another resource-allocation problem. My goal is to run a query to assign the top-priority job for any time-slot to one of two CPU cores (just an example, so let's assume no interrupts or multi-tasking). Note: this is similar to my earlier post about partitioning, but focuses on overlapping times and assigning multiple items, not just the top-priority item.

Here is our object:

public class Job
{
    public int Id;
    public int Priority;
    public DateTime Begin;
    public DateTime End;
}

The real dataset is very large, but for this example, let's say there are 1000 jobs to be assigned to two CPU cores. They are all loaded into memory, and I need to run a single LINQ to Objects query against them. This currently takes almost 8 seconds and 1.4 million comparisons.

I have leveraged the logic cited in this post to determine whether two items overlap, but unlike that post, I don't simply need to find overlapping items: I need to schedule the top item of any overlapping set, and then schedule the next one.

Before I get to the code, let me point out the steps of the current inefficient algorithm:

  1. Determine the remaining jobs (by removing any already assigned)
  2. Assign jobs to the current core by self-joining all remaining jobs and selecting the top overlapping job by priority.
  3. Concatenate the new jobs by executing the query
  4. Repeat starting at Step 1 for each remaining core

Questions:

  • How can this be done most efficiently?
  • Can I avoid the sub-query in step 2? Perhaps by grouping, but I'm not sure how I can group based on the .Overlaps() extension method.
  • The jobs are already ordered. Can step 2 leverage that fact and only compare against items within a short range instead of every single job?
  • Is there a more efficient way to assign to cores than looping? This could avoid executing the query in Step 3. Again, if I could group sets of overlapped jobs, then assigning cores is simply a matter of selecting N per overlapped set.
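
The grouping idea in the last bullet might be sketched as follows. This is only an illustration under stated assumptions, not the asker's method: it assumes the jobs are already sorted by Begin, treats a "set" as a run of transitively overlapping jobs (so two members of a long chain may not overlap each other directly), and assumes that a lower Priority number means higher priority, as in the `orderby s2.Priority ... First()` query. The names `OverlapGrouping`, `GroupOverlapping` and `Assign` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Job
{
    public int Id;
    public int Priority;
    public DateTime Begin;
    public DateTime End;
}

public class Assignment
{
    public Job Job;
    public int Core;
}

public static class OverlapGrouping
{
    // Single pass over jobs sorted by Begin: a job joins the current group
    // while it starts before the latest End seen in that group.
    public static IEnumerable<List<Job>> GroupOverlapping(IEnumerable<Job> sortedJobs)
    {
        List<Job> current = null;
        DateTime currentEnd = DateTime.MinValue;
        foreach (var job in sortedJobs)
        {
            if (current != null && job.Begin >= currentEnd)
            {
                // No overlap with anything in the group: close it.
                yield return current;
                current = null;
            }
            if (current == null)
            {
                current = new List<Job>();
                currentEnd = job.End;
            }
            else if (job.End > currentEnd)
            {
                currentEnd = job.End;
            }
            current.Add(job);
        }
        if (current != null) yield return current;
    }

    // Picks the top `cores` jobs by priority from each group; the position
    // within the group becomes the core index.
    public static List<Assignment> Assign(IEnumerable<Job> sortedJobs, int cores)
    {
        return GroupOverlapping(sortedJobs)
            .SelectMany(g => g.OrderBy(j => j.Priority)
                              .Take(cores)
                              .Select((j, core) => new Assignment { Job = j, Core = core }))
            .ToList();
    }
}
```

Each job is visited once during grouping, so the cost is dominated by the small per-group sort rather than by a self-join over all remaining jobs. Whether "top N per group" is an acceptable schedule depends on the data: it is safe when every job in a group overlaps every other, which may not hold for long chains of partially overlapping jobs.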

Full Sample Code:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public class Job
{
    public static long Iterations;

    public int Id;
    public int Priority;
    public DateTime Begin;
    public DateTime End;

    public bool Overlaps(Job other)
    {
        Iterations++;
        return this.End > other.Begin && this.Begin < other.End;
    }
}

public class Assignment
{
    public Job Job;
    public int Core;
}

class Program
{
    static void Main(string[] args)
    {
        const int Jobs = 1000;
        const int Cores = 2;
        const int ConcurrentJobs = Cores + 1;
        const int Priorities = Cores + 3;
        DateTime startTime = new DateTime(2011, 3, 1, 0, 0, 0, 0);
        Console.WriteLine(string.Format("{0} Jobs x {1} Cores", Jobs, Cores));
        var timer = Stopwatch.StartNew();

        Console.WriteLine("Populating data");
        var jobs = new List<Job>();
        for (int jobId = 0; jobId < Jobs; jobId++)
        {
            var jobStart = startTime.AddHours(jobId / ConcurrentJobs).AddMinutes(jobId % ConcurrentJobs);
            jobs.Add(new Job() { Id = jobId, Priority = jobId % Priorities, Begin = jobStart, End = jobStart.AddHours(0.5) });
        }
        Console.WriteLine(string.Format("Completed in {0:n}ms", timer.ElapsedMilliseconds));
        timer.Restart();

        Console.WriteLine("Assigning Jobs to Cores");
        IEnumerable<Assignment> assignments = null;
        for (int core = 0; core < Cores; core++)
        {
            // avoid modified closures by creating local variables
            int localCore = core;
            var localAssignments = assignments;

            // Step 1: Determine the remaining jobs
            var remainingJobs = localAssignments == null ? 
                                                jobs :
                                                from j in jobs where !(from a in localAssignments select a.Job).Contains(j) select j;

            // Step 2: Assign the top priority job in any time-slot to the core
            var assignmentsForCore = from s1 in remainingJobs
                              where
                                  (from s2 in remainingJobs
                                   where s1.Overlaps(s2)
                                   orderby s2.Priority
                                   select s2).First().Equals(s1)
                              select new Assignment { Job = s1, Core = localCore };

            // Step 3: Accumulate the results (unfortunately requires a .ToList() to avoid massive over-joins)
            assignments = assignments == null ? assignmentsForCore.ToList() : assignments.Concat(assignmentsForCore.ToList());
        }

        // This is where I'd like to Execute the query one single time across all cores, but have to do intermediate steps to avoid massive-over-joins
        assignments = assignments.ToList();

        Console.WriteLine(string.Format("Completed in {0:n}ms", timer.ElapsedMilliseconds));
        Console.WriteLine("\nJobs:");
        foreach (var job in jobs.Take(20))
        {
            Console.WriteLine(string.Format("{0}-{1} Id {2} P{3}", job.Begin, job.End, job.Id, job.Priority));
        }

        Console.WriteLine("\nAssignments:");
        foreach (var assignment in assignments.OrderBy(a => a.Job.Begin).Take(10))
        {
            Console.WriteLine(string.Format("{0}-{1} Id {2} P{3} C{4}", assignment.Job.Begin, assignment.Job.End, assignment.Job.Id, assignment.Job.Priority, assignment.Core));
        }

        Console.WriteLine(string.Format("\nTotal Comparisons: {0:n}", Job.Iterations));

        Console.WriteLine("Any key to continue");
        Console.ReadKey();
    }
}

Sample Output:

1000 Jobs x 2 Cores
Populating data
Completed in 0.00ms
Assigning Jobs to Cores
Completed in 7,998.00ms

Jobs:
3/1/2011 12:00:00 AM-3/1/2011 12:30:00 AM Id 0 P0
3/1/2011 12:01:00 AM-3/1/2011 12:31:00 AM Id 1 P1
3/1/2011 12:02:00 AM-3/1/2011 12:32:00 AM Id 2 P2
3/1/2011 1:00:00 AM-3/1/2011 1:30:00 AM Id 3 P3
3/1/2011 1:01:00 AM-3/1/2011 1:31:00 AM Id 4 P4
3/1/2011 1:02:00 AM-3/1/2011 1:32:00 AM Id 5 P0
3/1/2011 2:00:00 AM-3/1/2011 2:30:00 AM Id 6 P1
3/1/2011 2:01:00 AM-3/1/2011 2:31:00 AM Id 7 P2
3/1/2011 2:02:00 AM-3/1/2011 2:32:00 AM Id 8 P3
3/1/2011 3:00:00 AM-3/1/2011 3:30:00 AM Id 9 P4
3/1/2011 3:01:00 AM-3/1/2011 3:31:00 AM Id 10 P0
3/1/2011 3:02:00 AM-3/1/2011 3:32:00 AM Id 11 P1
3/1/2011 4:00:00 AM-3/1/2011 4:30:00 AM Id 12 P2
3/1/2011 4:01:00 AM-3/1/2011 4:31:00 AM Id 13 P3
3/1/2011 4:02:00 AM-3/1/2011 4:32:00 AM Id 14 P4
3/1/2011 5:00:00 AM-3/1/2011 5:30:00 AM Id 15 P0
3/1/2011 5:01:00 AM-3/1/2011 5:31:00 AM Id 16 P1
3/1/2011 5:02:00 AM-3/1/2011 5:32:00 AM Id 17 P2
3/1/2011 6:00:00 AM-3/1/2011 6:30:00 AM Id 18 P3
3/1/2011 6:01:00 AM-3/1/2011 6:31:00 AM Id 19 P4

Assignments:
3/1/2011 12:00:00 AM-3/1/2011 12:30:00 AM Id 0 P0 C0
3/1/2011 12:01:00 AM-3/1/2011 12:31:00 AM Id 1 P1 C1
3/1/2011 1:00:00 AM-3/1/2011 1:30:00 AM Id 3 P3 C1
3/1/2011 1:02:00 AM-3/1/2011 1:32:00 AM Id 5 P0 C0
3/1/2011 2:00:00 AM-3/1/2011 2:30:00 AM Id 6 P1 C0
3/1/2011 2:01:00 AM-3/1/2011 2:31:00 AM Id 7 P2 C1
3/1/2011 3:01:00 AM-3/1/2011 3:31:00 AM Id 10 P0 C0
3/1/2011 3:02:00 AM-3/1/2011 3:32:00 AM Id 11 P1 C1
3/1/2011 4:00:00 AM-3/1/2011 4:30:00 AM Id 12 P2 C0
3/1/2011 4:01:00 AM-3/1/2011 4:31:00 AM Id 13 P3 C1
3/1/2011 5:00:00 AM-3/1/2011 5:30:00 AM Id 15 P0 C0

Total Comparisons: 1,443,556.00
Any key to continue

Is there a reason for using LINQ to Objects for this task? I think that I would create an active list, put all of the jobs in a queue, and pop the next one out of the queue and into the active list whenever the active list dipped below 10. It's easy enough to track which core is executing which task and assign the next task in the queue to the least busy core. Wire up a finished event to the job, or just monitor the active list, and you'll know when it's appropriate to pop another job off the queue and into the active list.
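
The "track which core is busy" idea might look roughly like the sketch below. It is an assumption-laden illustration rather than the approach described word for word: it works on the fixed Begin/End times from the question instead of finished events, processes jobs in Begin order, ignores Priority entirely, and silently skips any job that finds no free core. `QueueScheduler` and `Schedule` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public class Job
{
    public int Id;
    public int Priority;
    public DateTime Begin;
    public DateTime End;
}

public static class QueueScheduler
{
    // Returns a map of job Id -> core index for every job that found a free core.
    public static Dictionary<int, int> Schedule(IEnumerable<Job> jobsInStartOrder, int cores)
    {
        // freeAt[c] is the time at which core c finishes its last assigned job.
        // Elements start at default(DateTime), i.e. DateTime.MinValue.
        var freeAt = new DateTime[cores];
        var result = new Dictionary<int, int>();

        foreach (var job in jobsInStartOrder)
        {
            // Among the cores that are free when the job begins,
            // pick the one that has been idle the longest.
            int best = -1;
            for (int core = 0; core < cores; core++)
            {
                if (freeAt[core] <= job.Begin && (best < 0 || freeAt[core] < freeAt[best]))
                    best = core;
            }

            if (best < 0)
                continue; // every core is still busy: this job is not scheduled

            freeAt[best] = job.End;
            result[job.Id] = best;
        }
        return result;
    }
}
```

This needs one pass over the jobs and one comparison per core per job. Honoring priority would require looking ahead at the jobs that compete for the same time-slot before committing a core, which this sketch does not attempt.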

I would rather do it in a single loop. Mine produces a different result from yours. Yours scheduled 2/3 of all the jobs. Mine scheduled all. I will add explanations later. Going off for an appointment now.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public class Job
{
    public static long Iterations;

    public int Id;
    public int Priority;
    public DateTime Begin;
    public DateTime End;

    public bool Overlaps(Job other)
    {
        Iterations++;
        return this.End > other.Begin && this.Begin < other.End;
    }
}

public class Assignment : IComparable<Assignment>
{
    public Job Job;
    public int Core;

    #region IComparable<Assignment> Members

    public int CompareTo(Assignment other)
    {
        return Job.Begin.CompareTo(other.Job.Begin);
    }

    #endregion
}

class Program
{
    static void Main(string[] args)
    {
        const int Jobs = 1000;
        const int Cores = 2;
        const int ConcurrentJobs = Cores + 1;
        const int Priorities = Cores + 3;

        DateTime startTime = new DateTime(2011, 3, 1, 0, 0, 0, 0);
        Console.WriteLine(string.Format("{0} Jobs x {1} Cores", Jobs, Cores));
        var timer = Stopwatch.StartNew();

        Console.WriteLine("Populating data");
        var jobs = new List<Job>();
        for (int jobId = 0; jobId < Jobs; jobId++)
        {
            var jobStart = startTime.AddHours(jobId / ConcurrentJobs).AddMinutes(jobId % ConcurrentJobs);
            jobs.Add(new Job() { Id = jobId, Priority = jobId % Priorities, Begin = jobStart, End = jobStart.AddHours(0.5) });
        }
        Console.WriteLine(string.Format("Completed in {0:n}ms", timer.ElapsedMilliseconds));
        timer.Restart(); // Reset() would stop the stopwatch and leave it at zero

        Console.WriteLine("Assigning Jobs to Cores");
        List<Assignment>[] assignments = new List<Assignment>[Cores];
        for (int core = 0; core < Cores; core++)
            assignments[core] = new List<Assignment>();

        Job[] lastJobs = new Job[Cores];
        foreach (Job j in jobs)
        {
            Job job = j;
            bool assigned = false;
            for (int core = 0; core < Cores; core++)
            {
                if (lastJobs[core] == null || !lastJobs[core].Overlaps(job))
                {
                    // Assign directly if no last job or no overlap with last job
                    lastJobs[core] = job;
                    assignments[core].Add(new Assignment { Job = job, Core = core });
                    assigned = true;
                    break;
                }
                else if (job.Priority > lastJobs[core].Priority)
                {
                    // Overlap and higher priority, so we replace
                    Job temp = lastJobs[core];
                    lastJobs[core] = job;
                    job = temp; // Will try to later assign to other core

                    assignments[core].Add(new Assignment { Job = job, Core = core });
                    assigned = true;
                    break;
                }
            }
            if (!assigned)
            {
                // TODO: What to do if not assigned? Your code seems to just ignore them
            }
        }
        List<Assignment> merged = new List<Assignment>();
        for (int core = 0; core < Cores; core++)
            merged.AddRange(assignments[core]);
        merged.Sort();
        Console.WriteLine(string.Format("Completed in {0:n}ms", timer.ElapsedMilliseconds));
        timer.Restart(); // Reset() would stop the stopwatch and leave it at zero

        Console.WriteLine(string.Format("\nTotal Comparisons: {0:n}", Job.Iterations));
        Job.Iterations = 0; // Reset to count again

        {
            IEnumerable<Assignment> assignments2 = null;
            for (int core = 0; core < Cores; core++)
            {
                // avoid modified closures by creating local variables
                int localCore = core;
                var localAssignments = assignments2;

                // Step 1: Determine the remaining jobs
                var remainingJobs = localAssignments == null ?
                                                    jobs :
                                                    from j in jobs where !(from a in localAssignments select a.Job).Contains(j) select j;

                // Step 2: Assign the top priority job in any time-slot to the core
                var assignmentsForCore = from s1 in remainingJobs
                                         where
                                             (from s2 in remainingJobs
                                              where s1.Overlaps(s2)
                                              orderby s2.Priority
                                              select s2).First().Equals(s1)
                                         select new Assignment { Job = s1, Core = localCore };

                // Step 3: Accumulate the results (unfortunately requires a .ToList() to avoid massive over-joins)
                assignments2 = assignments2 == null ? assignmentsForCore.ToList() : assignments2.Concat(assignmentsForCore.ToList());
            }

            // This is where I'd like to Execute the query one single time across all cores, but have to do intermediate steps to avoid massive-over-joins
            assignments2 = assignments2.ToList();

            Console.WriteLine(string.Format("Completed in {0:n}ms", timer.ElapsedMilliseconds));
            Console.WriteLine("\nJobs:");
            foreach (var job in jobs.Take(20))
            {
                Console.WriteLine(string.Format("{0}-{1} Id {2} P{3}", job.Begin, job.End, job.Id, job.Priority));
            }

            Console.WriteLine("\nAssignments:");
            foreach (var assignment in assignments2.OrderBy(a => a.Job.Begin).Take(10))
            {
                Console.WriteLine(string.Format("{0}-{1} Id {2} P{3} C{4}", assignment.Job.Begin, assignment.Job.End, assignment.Job.Id, assignment.Job.Priority, assignment.Core));
            }

            if (merged.Count != assignments2.Count())
                System.Console.WriteLine("Difference count {0}, {1}", merged.Count, assignments2.Count());
            for (int i = 0; i < merged.Count() && i < assignments2.Count(); i++)
            {
                var a2 = assignments2.ElementAt(i);
                var a = merged[i];
                if (a.Job.Id != a2.Job.Id)
                    System.Console.WriteLine("Difference at {0} {1} {2}", i, a.Job.Begin, a2.Job.Begin);
                if (i % 100 == 0) Console.ReadKey();
            }
        }


        Console.WriteLine(string.Format("\nTotal Comparisons: {0:n}", Job.Iterations));

        Console.WriteLine("Any key to continue");
        Console.ReadKey();
    }
}

Removed due to major bug. Reworking on it.. :P
