
Could this CP-SAT model be faster?

My team is building a CP-SAT solver that schedules assignments (think homework) over a period of days with variable availability (time available to do assignments). We're trying to speed up our model.

We've tried num_search_workers and other parameter tuning but want to check for other speedups. The aim is to solve problems with ~100 days and up to 2000 assignments in 5-10 seconds (benchmarked on an M1 Mac). Any ideas?

Problem description: place a assignments across d days, respecting these requirements:

  • Assignment time on a day must not exceed that day's available time
  • Assignment dependencies should be respected (if A needs B then B should not occur after A)
  • Assignments can be split (in order to fit better across days with little time)
  • Optimize for diversity of assignment types on a day

Solving slows dramatically as the number of days and assignments grows. This is expected, but we'd like to know if you can suggest possible speedups.

Here's an example unit test. Hopefully it shows the splitting, ordering, and time constraints.

days = [{"secondsAvailable": 1200}, {"secondsAvailable": 1200}, {"secondsAvailable": 1200}, {"secondsAvailable": 1200}]
assignments = [
    {"id": 1, "resourceType": "Type0", "seconds": 2400, "deps": [], "instances": 2},
    {"id": 2, "resourceType": "Type0", "seconds": 1200, "deps": [1], "instances": 1},
    {"id": 3, "resourceType": "Type0", "seconds": 1200, "deps": [1, 2], "instances": 1},
    ]
result = cp_sat.CP_SAT_FAST.schedule(days, assignments, options=solver_options)
# expect a list of lists where each inner list is a day with the included assignments
expected = shared.SolverOutput(feasible=True, solution=[
    [{"id": 1, "resourceType": "Type0", "time": 1200, "instances": 2}],
    [{"id": 1, "resourceType": "Type0", "time": 1200, "instances": 2}],
    [{"id": 2, "resourceType": "Type0", "time": 1200, "instances": 1}],
    [{"id": 3, "resourceType": "Type0", "time": 1200, "instances": 1}],
    ])
self.assertEqual(result, expected)

And here's the solver:

import math
from typing import List, Dict

from ortools.sat.python import cp_model
import numpy as np

import planner.solvers as solvers
from planner.shared import SolverOutput, SolverOptions


class CP_SAT_FAST(solvers.Solver):
    """
    CP_SAT_FAST is a CP_SAT solver with speed optimizations and a time limit (passed in through options).
    """

    @staticmethod
    def schedule(days: List[Dict], assignments: List[Dict], options: SolverOptions) -> SolverOutput:
        """
        Schedules a list of assignments on a studyplan of days

        Arguments:
        days: list of dicts containing available time for that day
        assignments: list of assignments to place on schedule
        """

        model = cp_model.CpModel()

        num_assignments = len(assignments)
        num_days = len(days)

        # x[d, a] shows if assignment a is on day d
        x = np.zeros((num_days, num_assignments), cp_model.IntVar) 

        # used for resource diversity optimization
        total_resource_types = 4
        unique_today = []

        # upper and lower bounds used for dependency ordering (if a needs b then b must be before or on the day of a)
        day_ub = {}
        day_lb = {}

        # track assignment splitting
        instances = {}
        assignment_times = {}

        id_to_assignment = {}
        for a, asm in enumerate(assignments):

            # track upper and lower bounds
            day_ub[a] = model.NewIntVar(0, num_days, "day_ub")
            day_lb[a] = model.NewIntVar(0, num_days, "day_lb")
            asm["ub"] = day_ub[a]
            asm["lb"] = day_lb[a]
            id_to_assignment[asm["id"]] = asm

            max_instances = min(num_days, asm.get("instances", num_days))
            
            # each assignment must occur at least once
            instances[a] = model.NewIntVar(1, max_instances, f"instances_{a}")
            model.AddHint(instances[a], max_instances)

            # when split keep a decision variable of assignment time
            assignment_times[a] = model.NewIntVar(asm.get("seconds") // max_instances, asm.get("seconds"), f"assignment_time_{a}")
            model.AddDivisionEquality(assignment_times[a], asm.get("seconds"), instances[a])  

        for d in range(num_days):

            time_available = days[d].get("secondsAvailable", 0)
            if time_available <= 0:
                # no assignments on zero-time days
                model.Add(sum(x[d]) == 0)

            else:
                
                # track resource diversity on this day
                type0_today = model.NewBoolVar(f"type0_on_{d}")
                type1_today = model.NewBoolVar(f"type1_on_{d}")
                type2_today = model.NewBoolVar(f"type2_on_{d}")
                type3_today = model.NewBoolVar(f"type3_on_{d}")
                types_today = model.NewIntVar(0, total_resource_types, f"unique_on_{d}")
                
                task_times = []

                for a, asm in enumerate(assignments):

                    # x[d, a] = True if assignment a is on day d
                    x[d, a] = model.NewBoolVar(f"x[{d},{a}]")
                    
                    # set assignment upper and lower bounds for ordering
                    model.Add(day_ub[a] >= d).OnlyEnforceIf(x[d, a])
                    model.Add(day_lb[a] >= (num_days - d)).OnlyEnforceIf(x[d, a])
                    
                    # track if a resource type is on a day for resource diversity optimization
                    resourceType = asm.get("resourceType")
                    if resourceType == "Type0":
                        model.AddImplication(x[d, a], type0_today)
                    elif resourceType == "Type1":
                        model.AddImplication(x[d, a], type1_today)
                    elif resourceType == "Type2":
                        model.AddImplication(x[d, a], type2_today)
                    elif resourceType == "Type3":
                        model.AddImplication(x[d, a], type3_today)
                    else:
                        raise RuntimeError(f"Unknown resource type {asm.get('resourceType')}")

                    # keep track of task time (considering splitting), for workload requirements
                    task_times.append(model.NewIntVar(0, asm.get("seconds"), f"time_{a}_on_{d}"))
                    model.Add(task_times[a] == assignment_times[a]).OnlyEnforceIf(x[d, a])

                # time assigned to day d cannot exceed the day's available time
                model.Add(time_available >= sum(task_times))

                # sum the unique resource types on this day for later optimization
                model.Add(sum([type0_today, type1_today, type2_today, type3_today]) == types_today)
                unique_today.append(types_today)


        """
        Resource Diversity:

        Keeps track of what instances of a resource type appear on each day
        and the minimum number of unique resource types on any day. (done above ^)
        
        Then the model objective is set to maximize that minimum
        """
        total_diversity = model.NewIntVar(0, num_days * total_resource_types, "total_diversity")
        model.Add(sum(unique_today) == total_diversity)

        avg_diversity = model.NewIntVar(0, total_resource_types, "avg_diversity")
        model.AddDivisionEquality(avg_diversity, total_diversity, num_days)

        # Set objective
        model.Maximize(avg_diversity)


        # Assignment Occurrence/Splitting and Dependencies
        for a, asm in enumerate(assignments):
            
            # track how many times an assignment occurs (since we can split)
            model.Add(instances[a] == sum(x[d, a] for d in range(num_days))) 

            # Dependencies 
            for needed_asm in asm.get("deps", []):
                needed_ub = id_to_assignment[needed_asm]["ub"]
                
                # this asm's lower bound must be greater than or equal to the upper bound of the dependency
                model.Add(num_days - asm["lb"] >= needed_ub)

        # Solve
        solver = cp_model.CpSolver()

        # set time limit
        solver.parameters.max_time_in_seconds = float(options.time_limit)
        solver.parameters.preferred_variable_order = 1
        solver.parameters.initial_polarity = 0
        # solver.parameters.stop_after_first_solution = True
        # solver.parameters.num_search_workers = 8

        intermediate_printer = SolutionPrinter()
        status = solver.Solve(model, intermediate_printer)


        print("\nStats")
        print(f"  - conflicts       : {solver.NumConflicts()}")
        print(f"  - branches        : {solver.NumBranches()}")
        print(f"  - wall time       : {solver.WallTime()}s")
        print()

        if status == cp_model.OPTIMAL or status == cp_model.FEASIBLE:
            sp = []

            for i, d in enumerate(days):
                day_time = 0
                days_tasks = []
                for a, asm in enumerate(assignments):
                    if solver.Value(x[i, a]) >= 1:
                        asm_time = math.ceil(asm.get("seconds") / solver.Value(instances[a]))
                        day_time += asm_time

                        days_tasks.append({"id": asm["id"], "resourceType": asm.get("resourceType"), "time": asm_time, "instances": solver.Value(instances[a])})
                
                sp.append(days_tasks)

            return SolverOutput(feasible=True, solution=sp)
            
        else:
            return SolverOutput(feasible=False, solution=[])


class SolutionPrinter(cp_model.CpSolverSolutionCallback):

    def __init__(self):
        cp_model.CpSolverSolutionCallback.__init__(self)
        self.__solution_count = 0

    def on_solution_callback(self):
        print(f"Solution {self.__solution_count} objective value = {self.ObjectiveValue()}")
        self.__solution_count += 1

Before answering your actual question I want to point out a few things in your model that I suspect are not working as you intended.

The constraints on the assignment types present on a given day

model.AddImplication(x[d, a], type0_today)

etc., do enforce that type0_today == 1 if there is an assignment of that type on that day. However, they do not enforce that type0_today == 0 if there are no assignments of that type on that day. The solver is still free to choose type0_today == 1, and it will do so, because that satisfies the constraint and also directly increases the objective function. In the optimal solution to the test case you gave, you will probably find that all of the variables type0_today to type3_today are 1 and that avg_diversity == 4, even though the input data contains no assignments of any type other than Type0. In the early stages of modelling, it's always a good idea to check the values of all the variables in the model for plausibility.

Since I don't have a Python installation, I translated your model to C# to be able to do some experiments. Sorry, you'll have to translate into the equivalent Python code. I reformulated the constraint on the type0_today variables to use an array type_today[d, t] (for day d and type t) and the AddMaxEquality constraint, which for Boolean variables is equivalent to the logical OR of all the participating variables:

    // For each day...
    for (int d = 0; d < num_days; d++)
    {
        // ... make a list for each assignment type of all x[d, a] where a has that type.
        List<IntVar>[] assignmentsByType = new List<IntVar>[total_resource_types];
        for (int t = 0; t < total_resource_types; t++)
        {
            assignmentsByType[t] = new List<IntVar>();
        }
        for (int a = 0; a < num_assignments; a++)
        {
            int t = getType(assignments[a].resourceType);
            assignmentsByType[t].Add(x[d, a]);
        }
        // Constrain the types present on the day to be the logical OR of assignments with that type on that day
        for (int t = 0; t < total_resource_types; t++)
        {
            if (assignmentsByType[t].Count > 0)
            {
                model.AddMaxEquality(type_today[d, t], assignmentsByType[t]);
            }
            else
            {
                model.Add(type_today[d, t] == 0);
            }
        }
    }
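A rough Python translation of the C# snippet above might look like the following. It is only a sketch under assumptions: the function name add_type_presence, the list assignment_types (one type index per assignment, standing in for getType) and the returned dict type_today are hypothetical names, and x is assumed to hold a BoolVar for every (day, assignment) pair.

```python
def add_type_presence(model, x, assignment_types, num_days, num_types):
    """Tie type_today[d, t] to the logical OR of x[d, a] over assignments of type t.

    `model` is expected to be an ortools.sat.python.cp_model.CpModel.
    `x[d, a]` must be a BoolVar for every day d and assignment a (assumption).
    `assignment_types[a]` is the integer type index of assignment a (hypothetical helper data).
    """
    type_today = {}
    for d in range(num_days):
        # Group this day's x[d, a] variables by the type of assignment a.
        by_type = [[] for _ in range(num_types)]
        for a, t in enumerate(assignment_types):
            by_type[t].append(x[d, a])

        for t in range(num_types):
            type_today[d, t] = model.NewBoolVar(f"type{t}_on_{d}")
            if by_type[t]:
                # For Boolean variables the maximum equals the logical OR,
                # so type_today[d, t] is 1 exactly when some x[d, a] of type t is 1.
                model.AddMaxEquality(type_today[d, t], by_type[t])
            else:
                # No assignment of this type exists, so it can never be present.
                model.Add(type_today[d, t] == 0)
    return type_today
```

With a real model this would be called once after all x[d, a] variables exist, for example add_type_presence(model, x, assignment_types, num_days, total_resource_types).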

You compute the average diversity as

        avg_diversity = model.NewIntVar(0, total_resource_types, "avg_diversity")
        model.AddDivisionEquality(avg_diversity, total_diversity, num_days)

Since the solver only works with integer variables, avg_diversity will be exactly one of the values 0, 1, 2, 3 or 4 with no fractional part. AddDivisionEquality is an integer division rounded towards zero, so avg_diversity discards the remainder of total_diversity / num_days. As an objective this is very coarse: the solver sees no difference between solutions whose totals round down to the same average, which I don't think you intended.

For example, with num_days == 20, both total_diversity == 60 and total_diversity == 63 give avg_diversity == 3, so the solver has no incentive to prefer the second solution, although it has up to three days with higher diversity than the one with total_diversity == 60.

Instead, I recommend that you eliminate the variable avg_diversity and its constraint and simply use total_diversity as your objective function. Since the number of days is a fixed constant during the solution, maximizing the total diversity is equivalent to maximizing the exact average, without the loss of resolution caused by the integer division.
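In Python, using the total as the objective could be sketched as below. The function name maximize_total_diversity and the dict layout type_today[(day, type_index)] are assumptions made for illustration; they are not part of the original model.

```python
def maximize_total_diversity(model, type_today):
    """Maximize the number of (day, type) pairs that are present.

    `model` is expected to be an ortools.sat.python.cp_model.CpModel.
    `type_today` maps (day, type_index) to a BoolVar (hypothetical layout).
    """
    # A plain sum of Boolean variables is a linear expression, so no
    # helper variable and no division constraint are needed.
    total_diversity = sum(type_today.values())
    model.Maximize(total_diversity)
    return total_diversity
```

If the average is still wanted for reporting, it can be computed after the solve as the objective value divided by num_days.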

That said, here is my answer.

Generic constraint satisfaction problems are NP-hard in general and should not be expected to scale well. Although many specific problem formulations can actually be solved quickly, small changes in the input data or the formulation can push the problem into a black hole of exponential running time. There is really no other approach than trying out various methods to see what works best with your exact problem.

Although it sounds paradoxical, it is easier for the solver to find optimal solutions for strongly constrained problems than for lightly constrained ones (assuming they are feasible!). The search space in a strongly constrained problem is smaller than in a lightly constrained one, so the solver has fewer choices to experiment with while optimizing and therefore completes the job faster.

First suggestion

In your problem, you have variables day_ub and day_lb for each assignment. These have a range from 0 to num_days. The constraints on them

                    model.Add(day_ub[a] >= d).OnlyEnforceIf(x[d, a])
                    model.Add(day_lb[a] >= (num_days - d)).OnlyEnforceIf(x[d, a])

allow the solver the freedom to choose any value from the largest d (resp. the largest num_days - d) for which x[d, a] is true up to num_days. During the optimization, the solver probably spends time trying out different values for these variables but rarely discovers that this leads to an improvement; that happens only when the placement of a dependent assignment changes.

You can eliminate the variables day_ub and day_lb and their constraints and instead formulate the dependencies directly with the x variables.

In my C# model I reformulated the assignment dependency constraint as follows:

    for (int a = 0; a < num_assignments; a++)
    {
        Assignment assignment = assignments[a];
        foreach (int predecessorIndex in getPredecessorAssignmentIndicesFor(assignment))
        {
            for (int d1 = 0; d1 < num_days; d1++)
            {
                for (int d2 = 0; d2 < d1; d2++)
                {
                    model.AddImplication(x[d1, predecessorIndex], x[d2, a].Not());
                }
            }
        }
    }

In words: if an assignment B (predecessorIndex) on which assignment A (a) depends is placed on day d1, then x[d2, a] must be false for every earlier day d2 < d1. This relates the dependencies directly through the x variables instead of introducing helper variables with additional freedom that bog down the solver. This change reduces the number of variables in the problem and increases the number of constraints, both of which help the solver.
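A Python counterpart of this dependency formulation might look like the sketch below. The function name add_dependency_constraints and the predecessors mapping (assignment index to a list of predecessor indices, standing in for getPredecessorAssignmentIndicesFor) are hypothetical, and x[d, a] is assumed to be a BoolVar for every day and assignment.

```python
def add_dependency_constraints(model, x, predecessors, num_days):
    """Forbid an assignment from being placed before any of its predecessors.

    `model` is expected to be an ortools.sat.python.cp_model.CpModel.
    `x[d, a]` must be a BoolVar for every day d and assignment a (assumption).
    `predecessors` maps an assignment index to the indices it depends on
    (hypothetical layout).
    """
    for a, preds in predecessors.items():
        for p in preds:
            for d1 in range(num_days):
                for d2 in range(d1):
                    # If predecessor p is on day d1, then a cannot be on
                    # the strictly earlier day d2. Same-day placement stays allowed.
                    model.AddImplication(x[d1, p], x[d2, a].Not())
```

Each dependency adds num_days * (num_days - 1) / 2 implications, which is where the large constraint count of this formulation comes from.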

In an experiment I did with 25 days and 35 assignments, checking the model stats showed:

Original:

#Variables: 2020
#kIntDiv:   35
#kIntMax:   100
#kLinear1:  1750
#kLinear2:  914
#kLinearN:  86
Total constraints   2885

New formulation:

#Variables: 1950
#kBoolOr:   11700
#kIntDiv:   35
#kIntMax:   100
#kLinear2:  875
#kLinearN:  86
Total constraints   12796

So the new formulation has fewer variables but far more constraints.

The solution times in the experiment improved: the solver took only 2.6 s to reach total_diversity == 68 instead of over 90 s.

Original formulation

Time (s)    Objective
0.21        56
0.53        59
0.6         60
0.73        61
0.75        62
0.77        63
2.9         64
3.82        65
3.84        66
91.01       67
91.03       68
91.05       69

New formulation

Time (s)    Objective
0.2347      41
0.3066      42
0.4252      43
0.4602      44
0.5014      49
0.6437      50
0.6777      51
0.6948      52
0.7108      53
0.9593      54
1.0178      55
1.1535      56
1.2023      57
1.2351      58
1.2595      59
1.2874      60
1.3097      61
1.3325      62
1.388       63
1.5698      64
2.4948      65
2.5993      66
2.6198      67
2.6431      68
32.5665     69

[Figure: plot of objective improvement versus time]

Of course, the solution times you get will be strongly dependent on the input data.

Second suggestion

During my experiments I observed that solutions are found much more quickly when the assignments have a lot of dependencies. This is consistent with more highly constrained models being easier to solve.

If you often have assignments of the same type and duration (like numbers 2 and 3 in your test data) that both have instances == 1 and either no dependencies or the same ones, then exchanging their positions in the solution will not improve the objective.

In a pre-processing step you could look for such duplicates and make one of them dependent on the other. This is essentially a symmetry-breaking constraint. It prevents the solver from wasting time checking whether exchanging their positions would improve the objective.
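Such a pre-processing step could be sketched in plain Python as follows, using the assignment dicts from the question. The function name add_symmetry_breaking_deps is hypothetical. As an extra, conservative assumption that goes beyond the description above, the sketch only treats two assignments as interchangeable if the same set of other assignments depends on them, so that the added dependency cannot exclude an otherwise valid ordering.

```python
from collections import defaultdict


def add_symmetry_breaking_deps(assignments):
    """Return copies of the assignments with interchangeable ones chained by deps."""
    # Record which assignments depend on each id, so successors can be compared.
    dependents = defaultdict(set)
    for asm in assignments:
        for dep in asm.get("deps", []):
            dependents[dep].add(asm["id"])

    # Group unsplittable assignments that look identical to the solver.
    groups = defaultdict(list)
    for asm in assignments:
        if asm.get("instances") != 1:
            continue
        key = (
            asm["resourceType"],
            asm["seconds"],
            frozenset(asm.get("deps", [])),
            frozenset(dependents[asm["id"]]),
        )
        groups[key].append(asm["id"])

    # Within each group, make every assignment depend on the previous one.
    extra_deps = defaultdict(list)
    for ids in groups.values():
        for earlier, later in zip(ids, ids[1:]):
            extra_deps[later].append(earlier)

    # Build new dicts so the caller's input is left untouched.
    return [
        {**asm, "deps": list(asm.get("deps", [])) + extra_deps[asm["id"]]}
        for asm in assignments
    ]
```

The result can be passed to the scheduler in place of the original list; whether it actually speeds up the search has to be measured on real data.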

Third suggestion

The solver needs to determine how many instances of each assignment will be present in a solution. That requires two variables for each assignment, instances[a] and assignment_times[a], with an associated constraint.

Instead, you could get rid of the variables instances[a] and assignment_times[a] and split assignments with instances > 1 into multiple assignments in a preprocessing step. For example, in your test data, assignment 1 would be split into two assignments 1_1 and 1_2, each having instances == 1 and seconds == 1200. For this test case, where instances == 2 for assignment 1, this will not have any effect on the final solution: the solver may or may not schedule 1_1 and 1_2 on the same day, but the final result is equivalent to the original model's choice between splitting and not splitting, without needing the extra variables.

In the preprocessing step, when an assignment is split, you should add symmetry-breaking constraints to make 1_2 dependent on 1_1, etc., for the reasons mentioned above.
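The splitting step could be sketched in plain Python like this. The function name split_assignments is hypothetical, and several details are assumptions: the part ids follow the 1_1, 1_2 naming used above, the number of parts is capped at num_days as in the question's max_instances, any remainder seconds are given to the first parts, and a split assignment's first part depends on the last part of each predecessor. Note that the sketch always creates the maximum number of parts.

```python
def split_assignments(assignments, num_days):
    """Replace each assignment by `instances` chained parts with instances == 1."""
    # First pass: decide the part ids of every assignment.
    part_ids = {}
    for asm in assignments:
        n = min(num_days, asm.get("instances", num_days))
        part_ids[asm["id"]] = [f'{asm["id"]}_{k + 1}' for k in range(n)]

    result = []
    for asm in assignments:
        ids = part_ids[asm["id"]]
        base, remainder = divmod(asm["seconds"], len(ids))
        for k, part_id in enumerate(ids):
            if k == 0:
                # The first part waits for the last part of every predecessor.
                deps = [part_ids[dep][-1] for dep in asm.get("deps", [])]
            else:
                # Symmetry breaking: each part depends on the previous part.
                deps = [ids[k - 1]]
            result.append({
                "id": part_id,
                "resourceType": asm["resourceType"],
                # Spread leftover seconds over the first `remainder` parts.
                "seconds": base + (1 if k < remainder else 0),
                "deps": deps,
                "instances": 1,
            })
    return result
```

Applied to the test data from the question with num_days = 4, this yields the parts 1_1, 1_2, 2_1 and 3_1 of 1200 s each; the output of the solver would then have to be merged back per original id.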

When an assignment has instances > 2, splitting it into multiple assignments before the run is actually a change to the model. For example, if instances == 3 and seconds == 2400, you cannot get a solution in which the assignment is split over two days with 1200 s each; the solver will always schedule 3 assignments of 800 s each.

So this suggestion is actually a change to the model and you'll have to determine if that is acceptable or not.

The total diversity will usually be helped by having more instances of an assignment to place, so the change may not have large practical consequences. It would also allow scheduling 2/3 of an assignment on one day and the remaining 1/3 on another day, so it even adds some flexibility.

But this may or may not be acceptable in terms of your overall requirements.

In all cases, you'll have to test changes with your exact data to see if they really result in an improvement or not.

I hope this helps (and that this is a real-world problem and not a homework assignment, as I did spend a few hours investigating...).
