Could this CP-SAT model be faster?
My team is building a CP-SAT solver that schedules assignments (think homework) over a period of days with variable availability (time available to do assignments). We're trying to speed up our model.
We've tried num_search_workers and other parameter tuning but want to check for other speed increases. The aim is to solve problems with ~100 days and up to 2000 assignments in 5-10 seconds (benchmarked on an M1 Mac). Any ideas?
Problem Description: Place a assignments across d days respecting these requirements
Solving slows dramatically as the number of days and the number of assignments grow. This is expected, but we'd like to know if you can suggest possible speedups.
Here's an example unit test. Hopefully it shows the splitting, ordering, and time constraints.
days = [{"secondsAvailable": 1200}, {"secondsAvailable": 1200}, {"secondsAvailable": 1200}, {"secondsAvailable": 1200}]
assignments = [
    {"id": 1, "resourceType": "Type0", "seconds": 2400, "deps": [], "instances": 2},
    {"id": 2, "resourceType": "Type0", "seconds": 1200, "deps": [1], "instances": 1},
    {"id": 3, "resourceType": "Type0", "seconds": 1200, "deps": [1, 2], "instances": 1},
]
result = cp_sat.CP_SAT_FAST.schedule(days, assignments, options=solver_options)
# expect a list of lists where each inner list is a day with the included assignments
expected = shared.SolverOutput(feasible=True, solution=[
    [{"id": 1, "resourceType": "Type0", "time": 1200, "instances": 2}],
    [{"id": 1, "resourceType": "Type0", "time": 1200, "instances": 2}],
    [{"id": 2, "resourceType": "Type0", "time": 1200, "instances": 1}],
    [{"id": 3, "resourceType": "Type0", "time": 1200, "instances": 1}],
])
self.assertEqual(result, expected)
And here's the solver:
import math
from typing import List, Dict

from ortools.sat.python import cp_model
import numpy as np

import planner.solvers as solvers
from planner.shared import SolverOutput, SolverOptions


class CP_SAT_FAST(solvers.Solver):
    """
    CP_SAT_FAST is a CP_SAT solver with speed optimizations and a time limit (passed in through options).
    """

    @staticmethod
    def schedule(days: List[Dict], assignments: List[Dict], options: SolverOptions) -> SolverOutput:
        """
        Schedules a list of assignments on a studyplan of days
        Arguments:
        days: list of dicts containing available time for that day
        assignments: list of assignments to place on schedule
        """
        model = cp_model.CpModel()
        num_assignments = len(assignments)
        num_days = len(days)
        # x[d, a] shows if assignment a is on day d
        x = np.zeros((num_days, num_assignments), cp_model.IntVar)
        # used for resource diversity optimization
        total_resource_types = 4
        unique_today = []
        # upper and lower bounds used for dependency ordering (if a needs b then b must be before or on the day of a)
        day_ub = {}
        day_lb = {}
        # track assignment splitting
        instances = {}
        assignment_times = {}
        id_to_assignment = {}
        for a, asm in enumerate(assignments):
            # track upper and lower bounds
            day_ub[a] = model.NewIntVar(0, num_days, "day_ub")
            day_lb[a] = model.NewIntVar(0, num_days, "day_lb")
            asm["ub"] = day_ub[a]
            asm["lb"] = day_lb[a]
            id_to_assignment[asm["id"]] = asm
            max_instances = min(num_days, asm.get("instances", num_days))
            # each assignment must occur at least once
            instances[a] = model.NewIntVar(1, max_instances, f"instances_{a}")
            model.AddHint(instances[a], max_instances)
            # when split keep a decision variable of assignment time
            assignment_times[a] = model.NewIntVar(asm.get("seconds") // max_instances, asm.get("seconds"), f"assignment_time_{a}")
            model.AddDivisionEquality(assignment_times[a], asm.get("seconds"), instances[a])

        for d in range(num_days):
            time_available = days[d].get("secondsAvailable", 0)
            if time_available <= 0:
                # no assignments on zero-time days
                model.Add(sum(x[d]) == 0)
            else:
                # track resource diversity on this day
                type0_today = model.NewBoolVar(f"type0_on_{d}")
                type1_today = model.NewBoolVar(f"type1_on_{d}")
                type2_today = model.NewBoolVar(f"type2_on_{d}")
                type3_today = model.NewBoolVar(f"type3_on_{d}")
                types_today = model.NewIntVar(0, total_resource_types, f"unique_on_{d}")
                task_times = []
                for a, asm in enumerate(assignments):
                    # x[d, a] = True if assignment a is on day d
                    x[d, a] = model.NewBoolVar(f"x[{d},{a}]")
                    # set assignment upper and lower bounds for ordering
                    model.Add(day_ub[a] >= d).OnlyEnforceIf(x[d, a])
                    model.Add(day_lb[a] >= (num_days - d)).OnlyEnforceIf(x[d, a])
                    # track if a resource type is on a day for resource diversity optimization
                    resourceType = asm.get("resourceType")
                    if resourceType == "Type0":
                        model.AddImplication(x[d, a], type0_today)
                    elif resourceType == "Type1":
                        model.AddImplication(x[d, a], type1_today)
                    elif resourceType == "Type2":
                        model.AddImplication(x[d, a], type2_today)
                    elif resourceType == "Type3":
                        model.AddImplication(x[d, a], type3_today)
                    else:
                        raise RuntimeError(f"Unknown resource type {asm.get('resourceType')}")
                    # keep track of task time (considering splitting), for workload requirements
                    task_times.append(model.NewIntVar(0, asm.get("seconds"), f"time_{a}_on_{d}"))
                    model.Add(task_times[a] == assignment_times[a]).OnlyEnforceIf(x[d, a])
                # time assigned to day d cannot exceed the day's available time
                model.Add(time_available >= sum(task_times))
                # sum the unique resource types on this day for later optimization
                model.Add(sum([type0_today, type1_today, type2_today, type3_today]) == types_today)
                unique_today.append(types_today)

        """
        Resource Diversity:
        Keeps track of what instances of a resource type appear on each day
        and the minimum number of unique resource types on any day. (done above ^)
        Then the model objective is set to maximize that minimum
        """
        total_diversity = model.NewIntVar(0, num_days * total_resource_types, "total_diversity")
        model.Add(sum(unique_today) == total_diversity)
        avg_diversity = model.NewIntVar(0, total_resource_types, "avg_diversity")
        model.AddDivisionEquality(avg_diversity, total_diversity, num_days)
        # Set objective
        model.Maximize(avg_diversity)

        # Assignment Occurrence/Splitting and Dependencies
        for a, asm in enumerate(assignments):
            # track how many times an assignment occurs (since we can split)
            model.Add(instances[a] == sum(x[d, a] for d in range(num_days)))
            # Dependencies
            for needed_asm in asm.get("deps", []):
                needed_ub = id_to_assignment[needed_asm]["ub"]
                # this asm's lower bound must be greater than or equal to the upper bound of the dependency
                model.Add(num_days - asm["lb"] >= needed_ub)

        # Solve
        solver = cp_model.CpSolver()
        # set time limit
        solver.parameters.max_time_in_seconds = float(options.time_limit)
        solver.parameters.preferred_variable_order = 1
        solver.parameters.initial_polarity = 0
        # solver.parameters.stop_after_first_solution = True
        # solver.parameters.num_search_workers = 8
        intermediate_printer = SolutionPrinter()
        status = solver.Solve(model, intermediate_printer)
        print("\nStats")
        print(f" - conflicts : {solver.NumConflicts()}")
        print(f" - branches : {solver.NumBranches()}")
        print(f" - wall time : {solver.WallTime()}s")
        print()
        if status == cp_model.OPTIMAL or status == cp_model.FEASIBLE:
            sp = []
            for i, d in enumerate(days):
                day_time = 0
                days_tasks = []
                for a, asm in enumerate(assignments):
                    if solver.Value(x[i, a]) >= 1:
                        asm_time = math.ceil(asm.get("seconds") / solver.Value(instances[a]))
                        day_time += asm_time
                        days_tasks.append({"id": asm["id"], "resourceType": asm.get("resourceType"), "time": asm_time, "instances": solver.Value(instances[a])})
                sp.append(days_tasks)
            return SolverOutput(feasible=True, solution=sp)
        else:
            return SolverOutput(feasible=False, solution=[])


class SolutionPrinter(cp_model.CpSolverSolutionCallback):
    def __init__(self):
        cp_model.CpSolverSolutionCallback.__init__(self)
        self.__solution_count = 0

    def on_solution_callback(self):
        print(f"Solution {self.__solution_count} objective value = {self.ObjectiveValue()}")
        self.__solution_count += 1
Before answering your actual question I want to point out a few things in your model that I suspect are not working as you intended.
The constraints on the assignment types present on a given day
model.AddImplication(x[d, a], type0_today)
etc., do enforce that type0_today == 1 if there is an assignment of that type on that day. However, they do not enforce that type0_today == 0 if there are no assignments of that type on that day. The solver is still free to choose type0_today == 1, and it will do so, because that fulfills this constraint and also directly increases the objective function. You will probably discover in the optimal solution to the test case you gave that all the type0_today to type3_today variables are 1 and that avg_diversity == 4, even though there are no assignments of any type but 0 in the input data. In the early stages of modelling, it's always a good idea to check the value of all the variables in the model for plausibility.
Since I don't have a Python installation, I translated your model to C# to be able to do some experiments. Sorry, you'll have to translate into the equivalent Python code. I reformulated the constraint on the type0_today variables to use an array type_today[d, t] (for day d and type t) and use the AddMaxEquality constraint, which for Boolean variables is equivalent to the logical OR of all the participating variables:
// For each day...
for (int d = 0; d < num_days; d++)
{
    // ... make a list for each assignment type of all x[d, a] where a has that type.
    List<IntVar>[] assignmentsByType = new List<IntVar>[total_resource_types];
    for (int t = 0; t < total_resource_types; t++)
    {
        assignmentsByType[t] = new List<IntVar>();
    }
    for (int a = 0; a < num_assignments; a++)
    {
        int t = getType(assignments[a].resourceType);
        assignmentsByType[t].Add(x[d, a]);
    }
    // Constrain the types present on the day to be the logical OR of assignments with that type on that day
    for (int t = 0; t < total_resource_types; t++)
    {
        if (assignmentsByType[t].Count > 0)
        {
            model.AddMaxEquality(type_today[d, t], assignmentsByType[t]);
        }
        else
        {
            model.Add(type_today[d, t] == 0);
        }
    }
}
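For readers following along in Python, a rough translation of the loop above might look like the sketch below. It has not been run against the original model. It assumes model is an OR-Tools cp_model.CpModel, that every x[d][a] is a Boolean variable, and that assignment_types[a] already holds a type index from 0 to num_types - 1; the function name and the assignment_types argument are hypothetical.

```python
def add_type_presence_constraints(model, x, assignment_types, num_days, num_types):
    """Return type_today[d][t], the logical OR of x[d][a] over assignments of type t.

    model            -- assumed to be an ortools cp_model.CpModel
    x                -- x[d][a] Boolean variables (assignment a is on day d)
    assignment_types -- assignment_types[a] is the type index of assignment a
    """
    type_today = []
    for d in range(num_days):
        # Group this day's x variables by assignment type.
        by_type = [[] for _ in range(num_types)]
        for a, t in enumerate(assignment_types):
            by_type[t].append(x[d][a])

        row = []
        for t in range(num_types):
            present = model.NewBoolVar(f"type{t}_on_{d}")
            if by_type[t]:
                # For Boolean variables, the maximum is the logical OR.
                model.AddMaxEquality(present, by_type[t])
            else:
                # No assignment of this type exists, so it can never be present.
                model.Add(present == 0)
            row.append(present)
        type_today.append(row)
    return type_today
```

The per-day diversity is then the sum of type_today[d], which replaces the four separate typeN_today variables.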
You compute the average diversity as
avg_diversity = model.NewIntVar(0, total_resource_types, "avg_diversity")
model.AddDivisionEquality(avg_diversity, total_diversity, num_days)
Since the solver only works with integer variables, avg_diversity will be exactly one of the values 0, 1, 2, 3 or 4 with no fractional part. The constraint AddDivisionEquality is an integer division that rounds toward zero, so avg_diversity == total_diversity // num_days and any remainder is discarded. This makes the objective very coarse, which I don't think you intended.
For example, with num_days == 20, both total_diversity == 60 and total_diversity == 63 give avg_diversity == 3. The solver sees no difference between the two, although the second solution has three days with higher diversity, so it has no incentive to improve the schedule until it can gain a full num_days of diversity at once.
Instead, I recommend that you eliminate the variable avg_diversity and its constraint and simply use total_diversity as your objective function. Since the number of days is a fixed constant during the solution, maximizing the total diversity is equivalent to maximizing the exact average, without the loss from rounding.
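A quick arithmetic sketch of why the total is the better objective. It assumes the average is obtained by integer division rounded toward zero, which is how the OR-Tools documentation describes AddDivisionEquality; the helper name integer_average is made up for illustration.

```python
num_days = 20

def integer_average(total_diversity, num_days):
    """Integer division; for non-negative values this matches rounding toward zero."""
    return total_diversity // num_days

# Several different totals collapse onto the same integer average,
# so an objective on the average cannot tell them apart.
for total in (60, 63, 79, 80):
    print(total, integer_average(total, num_days))
```

Totals 60, 63 and 79 all map to an average of 3, while the total itself still ranks them correctly.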
That said, here is my answer.
Generic constraint satisfaction problems are NP-hard in general and should not be expected to scale well. Although many specific problem formulations can actually be solved quickly, small changes in the input data or the formulation can push the problem into a black hole of exponentiality. There is really no other approach than trying out various methods to see what works best with your exact problem.
Although it sounds paradoxical, it is easier for the solver to find optimal solutions for strongly constrained problems than for lightly constrained ones (assuming they are feasible!). The search space in a strongly constrained problem is smaller than in the lightly constrained one, so the solver has fewer choices about what to experiment with to optimize and therefore completes the job faster.
First suggestion
In your problem, you have variables day_ub and day_lb for each assignment. These have a range from 0 to num_days. The constraints on them
model.Add(day_ub[a] >= d).OnlyEnforceIf(x[d, a])
model.Add(day_lb[a] >= (num_days - d)).OnlyEnforceIf(x[d, a])
only bound the variables from below, so the solver is free to choose any value between the largest d (resp. the largest (num_days - d)) on which the assignment is placed and num_days (inclusive). During the optimization, the solver probably spends time trying out different values for these variables but rarely discovers that it leads to an improvement; that would happen only when the placement of a dependent assignment would be changed.
You can eliminate the variables day_ub and day_lb and their constraints and instead formulate the dependencies directly with the x variables.
In my C# model I reformulated the assignment dependency constraint as follows:
for (int a = 0; a < num_assignments; a++)
{
    Assignment assignment = assignments[a];
    foreach (int predecessorIndex in getPredecessorAssignmentIndicesFor(assignment))
    {
        for (int d1 = 0; d1 < num_days; d1++)
        {
            for (int d2 = 0; d2 < d1; d2++)
            {
                model.AddImplication(x[d1, predecessorIndex], x[d2, a].Not());
            }
        }
    }
}
In words: if an assignment B (predecessorIndex) on which assignment A (a) depends is placed on day d1, then all the x[d2, a] with d2 < d1 must be false. This relates the dependencies directly using the x variables instead of introducing helper variables with additional freedom, which bog down the solver. This change reduces the number of variables in the problem and increases the number of constraints, both of which help the solver.
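In Python, the same dependency formulation might be sketched as follows. This is an untested translation: it assumes model is an OR-Tools cp_model.CpModel, that x[d][a] are Boolean variables, and that the deps lists have already been converted from assignment ids to indices (the predecessors argument and the function name are hypothetical).

```python
def add_dependency_constraints(model, x, predecessors, num_days):
    """Forbid placing assignment a on any day before a day used by a predecessor.

    model        -- assumed to be an ortools cp_model.CpModel
    x            -- x[d][a] Boolean variables (assignment a is on day d)
    predecessors -- predecessors[a] lists the indices of assignments a depends on
    Returns the number of implications added.
    """
    count = 0
    for a, preds in enumerate(predecessors):
        for p in preds:
            for d1 in range(num_days):
                for d2 in range(d1):
                    # If predecessor p is on day d1, a cannot be on an earlier day d2.
                    model.AddImplication(x[d1][p], x[d2][a].Not())
                    count += 1
    return count
```

Placing a on the same day as its predecessor stays allowed, which matches the "before or on the day of" rule in the original model.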
In an experiment I did with 25 days and 35 assignments, checking the model stats showed
Original:
#Variables: 2020
#kIntDiv: 35
#kIntMax: 100
#kLinear1: 1750
#kLinear2: 914
#kLinearN: 86
Total constraints 2885
New formulation:
#Variables: 1950
#kBoolOr: 11700
#kIntDiv: 35
#kIntMax: 100
#kLinear2: 875
#kLinearN: 86
Total constraints 12796
So the new formulation has fewer variables but far more constraints.
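The growth in constraints follows from the double loop over days. A back-of-the-envelope sketch, assuming each dependency produces one implication per pair of days (d2 < d1) and that each implication shows up as one kBoolOr constraint; the figure of 39 dependencies is inferred from the stats, not stated in the experiment.

```python
def implications_per_dependency(num_days):
    """One implication for every pair of days (d2, d1) with d2 < d1."""
    return num_days * (num_days - 1) // 2

# 25 days: pairs of days per dependency.
print(implications_per_dependency(25))
# Number of dependencies that would account for the 11700 kBoolOr constraints.
print(11700 // implications_per_dependency(25))
# 100 days (the target problem size): pairs of days per dependency.
print(implications_per_dependency(100))
```

At 100 days each dependency costs 4950 implications, so with many dependencies the model size is worth watching when scaling up.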
The solution times in the experiment were improved: the solver took only 2.6 s to achieve total_diversity == 68 instead of over 90 s.
Original formulation
Time (s)  Objective
0.21      56
0.53      59
0.6       60
0.73      61
0.75      62
0.77      63
2.9       64
3.82      65
3.84      66
91.01     67
91.03     68
91.05     69
New formulation
Time (s)  Objective
0.2347    41
0.3066    42
0.4252    43
0.4602    44
0.5014    49
0.6437    50
0.6777    51
0.6948    52
0.7108    53
0.9593    54
1.0178    55
1.1535    56
1.2023    57
1.2351    58
1.2595    59
1.2874    60
1.3097    61
1.3325    62
1.388     63
1.5698    64
2.4948    65
2.5993    66
2.6198    67
2.6431    68
32.5665   69
Of course, the solution times you get will be strongly dependent on the input data.
Second suggestion
During my experiments I observed that solutions are found much more quickly when the assignments have a lot of dependencies. This is consistent with more highly constrained models being easier to solve.
If you often have assignments of the same type and duration (like the numbers 2 and 3 in your test data) and they both have instances == 1 and either no dependencies or the same ones, then exchanging their position in the solution will not improve the objective.
In a pre-processing step you could look for such duplicates and make one of them dependent on the other. This is essentially a symmetry-breaking constraint. This will prevent the solver from wasting time with an attempt to see if exchanging their positions would improve the objective.
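A minimal sketch of such a pre-processing step, working on the same list-of-dicts input as the unit test. The function name is hypothetical. As an extra precaution that goes beyond the description above, it only treats two assignments as interchangeable if the same set of other assignments depends on them; otherwise forcing an order could remove valid schedules.

```python
from collections import defaultdict

def chain_interchangeable(assignments):
    """Return copies of the assignments with interchangeable ones chained by deps.

    Interchangeable here means: same resourceType, same seconds, instances == 1,
    same dependencies, and the same set of assignments depending on them.
    """
    # Which assignments depend on each id (its successors).
    dependents = defaultdict(set)
    for asm in assignments:
        for dep in asm.get("deps", []):
            dependents[dep].add(asm["id"])

    groups = defaultdict(list)
    for asm in assignments:
        if asm.get("instances") == 1:
            key = (asm["resourceType"], asm["seconds"],
                   frozenset(asm.get("deps", [])), frozenset(dependents[asm["id"]]))
            groups[key].append(asm["id"])

    # Within each group, every member depends on the previous one.
    extra_dep = {}
    for ids in groups.values():
        for prev_id, cur_id in zip(ids, ids[1:]):
            extra_dep[cur_id] = prev_id

    result = []
    for asm in assignments:
        new_asm = dict(asm)
        new_asm["deps"] = list(asm.get("deps", []))
        if asm["id"] in extra_dep:
            new_asm["deps"].append(extra_dep[asm["id"]])
        result.append(new_asm)
    return result
```

Note that under this stricter test, assignments 2 and 3 from the unit test are not chained, because 3 already depends on 2 and their dependency lists differ.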
Third suggestion
The solution needs to deal with determining how many instances of each assignment will be present in a solution. That requires two variables for each assignment, instances[a] and assignment_times[a], with an associated constraint.
Instead of doing this, you could get rid of the variables instances[a] and assignment_times[a] and split assignments with instances > 1 into multiple assignments in a preprocessing step. For example, in your test data, assignment 1 would be split into two assignments 1_1 and 1_2, each having instances == 1 and seconds == 1200. For this test case, where instances == 2 for assignment 1, this will not have any effect on the final solution: maybe the solver will schedule 1_1 and 1_2 on the same day, maybe not, but the final result is equivalent to splitting or not splitting and doesn't need the extra variables.
In the preprocessing step, when an assignment is split, you should add symmetry-breaking constraints to make 1_2 dependent on 1_1, etc., for the reasons mentioned above.
When an assignment has instances > 2, splitting it into multiple assignments before the run is actually a change to the model. For example, if instances == 3 and seconds == 2400, you cannot get a solution in which the assignment is split over two days with 1200 s each; the solver will always be scheduling 3 assignments of 800 s each.
So this suggestion is actually a change to the model and you'll have to determine if that is acceptable or not.
The total diversity will usually be helped by having more instances of an assignment to place, so the change may not have large practical consequences. It would also allow scheduling 2/3 of an assignment on one day and the remaining 1/3 on another day, so it even adds some flexibility.
But this may or may not be acceptable in terms of your overall requirements.
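The splitting step could be sketched like this, again on the list-of-dicts input from the unit test. The function name and the "<id>_<k>" naming of the parts are illustrative choices. It assumes every assignment carries an explicit instances value, rounds the per-part time up as the original output code does, and makes anything that depended on a split assignment depend on all of its parts.

```python
import math

def split_assignments(assignments):
    """Replace each assignment with instances > 1 by that many single-instance parts."""
    # Map every original id to the ids of its parts (itself if it is not split).
    parts_of = {}
    for asm in assignments:
        n = asm["instances"]
        if n > 1:
            parts_of[asm["id"]] = [f"{asm['id']}_{k}" for k in range(1, n + 1)]
        else:
            parts_of[asm["id"]] = [asm["id"]]

    result = []
    for asm in assignments:
        # Redirect dependencies on split assignments to all of their parts.
        deps = [part for dep in asm.get("deps", []) for part in parts_of[dep]]
        ids = parts_of[asm["id"]]
        seconds = math.ceil(asm["seconds"] / len(ids))
        for k, part_id in enumerate(ids):
            # Symmetry breaking: each part depends on the previous part.
            part_deps = deps + ([ids[k - 1]] if k > 0 else [])
            result.append({"id": part_id, "resourceType": asm["resourceType"],
                           "seconds": seconds, "deps": part_deps, "instances": 1})
    return result
```

On the unit test data this yields parts 1_1 and 1_2 of 1200 s each, and an assignment with instances == 3 and seconds == 2400 always becomes three parts of 800 s, which is the change in behaviour described above.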
In all cases, you'll have to test changes with your exact data to see if they really result in an improvement or not.
I hope this helps (and that this is a real world problem and not a homework assignment, as I did spend a few hours investigating...).