How to use the Hungarian algorithm when there are more workers than jobs, and how to link the copies of jobs back to the original?
I am trying to use the Hungarian Algorithm to sort students into classes based on their preferences.
In my dataset there are ~550 students, and each one has a list of their top 5 preferences. Every preference is an ID that corresponds to a class. Each class has a minimum and a maximum capacity (in my case, a minimum of 15 and a maximum of 27 students), and there are 21 classes in the dataset.
Here is an example of the student data:
Email | First choice | Second choice | Third choice | Fourth choice | Fifth choice |
---|---|---|---|---|---|
email@gmail.com | 4 | 7 | 1 | 8 | 21 |
email2@gmail.com | 6 | 9 | 14 | 17 | 2 |
Here is an example of the class data:
Class Title | Class ID | Min Cap | Max Cap |
---|---|---|---|
Class Title1 | 1 | 15 | 27 |
Class Title2 | 2 | 15 | 27 |
Class Title3 | 3 | 15 | 27 |
I need to sort the students into their preferred classes while respecting both the minimum and the maximum capacity. For that, I am planning to use the Hungarian Algorithm.
Because there are ~550 students but only 21 classes, I was planning to make "copies" of the classes so that the Hungarian algorithm can work. I would first make 15 copies of every class (like class 1.1, 1.2, 1.3, 1.4, 2.1, 2.2, 2.3, etc.) to meet each class's minimum, and then add more copies to the classes that are most popular among the students until there are as many class copies as students.
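A minimal sketch of this copy scheme, under assumptions the question does not state: "popularity" is taken to mean how many students list a class anywhere in their top 5, extra copies are added one at a time to the open class with the most demand per existing copy, and `make_copies`, `prefs`, and `classes` are hypothetical names.

```python
from collections import Counter

def make_copies(prefs, classes):
    """Hypothetical helper.
    prefs:   {student: [class IDs, most preferred first]}
    classes: {class_id: (min_cap, max_cap)}
    Returns copy labels like '1.1', '1.2', ... with one label per student
    (assumes the total minimum capacity does not exceed the number of students)."""
    # Assumption: popularity = number of students listing the class in their top 5.
    popularity = Counter(c for choices in prefs.values() for c in choices)
    # Start every class at its minimum capacity.
    count = {cid: min_cap for cid, (min_cap, _) in classes.items()}
    while sum(count.values()) < len(prefs):
        open_classes = [c for c in classes if count[c] < classes[c][1]]
        if not open_classes:
            raise ValueError("more students than total maximum capacity")
        # Add one copy to the open class with the most demand per existing copy.
        best = max(open_classes, key=lambda c: popularity[c] / max(count[c], 1))
        count[best] += 1
    return [f"{cid}.{k}" for cid in classes for k in range(1, count[cid] + 1)]
```

For example, with three students who want only class 1 and one who wants only class 2, `make_copies({"s1": [1], "s2": [1], "s3": [1], "s4": [2]}, {1: (1, 3), 2: (1, 3)})` returns `["1.1", "1.2", "1.3", "2.1"]`.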
Then, working with the copies and the students' preferences, I was thinking of making a dictionary of dictionaries and using this implementation of the algorithm.
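For reference, here is a minimal pure-Python sketch of the textbook O(n³) Hungarian algorithm (the shortest-augmenting-path version with row and column potentials). It is not the implementation linked above, and it takes a dense list-of-lists cost matrix rather than a dictionary of dictionaries. It allows more columns than rows, so students can be rows and class copies columns without forcing the two counts to be equal. The name `hungarian` is hypothetical.

```python
import math

def hungarian(cost):
    """Minimum-cost assignment of every row to a distinct column.
    Requires len(cost) <= len(cost[0]). Returns match, where match[i] is row i's column."""
    n, m = len(cost), len(cost[0])
    assert n <= m, "need at least as many columns (copies) as rows (students)"
    u = [0.0] * (n + 1)   # row potentials (1-indexed; index 0 is a sentinel)
    v = [0.0] * (m + 1)   # column potentials
    p = [0] * (m + 1)     # p[j] = row currently matched to column j (0 = free)
    way = [0] * (m + 1)   # previous column on the shortest augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [math.inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], math.inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]  # reduced cost
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Shift potentials so the cheapest unused column becomes reachable.
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column
                break
        while j0:  # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [0] * n
    for j in range(1, m + 1):
        if p[j]:
            match[p[j] - 1] = j - 1
    return match
```

For example, `hungarian([[4, 1, 3], [2, 0, 5], [3, 2, 2]])` returns `[1, 0, 2]`, a total cost of 1 + 2 + 2 = 5.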
I have a couple of questions:
Thank you in advance, and let me know if you need any clarification.
I think this should work, though I would make one change: instead of starting with the minimum number of copies of each class, start with the maximum. With 550 students and 21 classes of at most 27 seats each (567 seats in total), all of the classes will be almost full, so you shouldn't have to worry about the minimum.
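Starting from the maximum gives 21 × 27 = 567 copies for about 550 students, so the two counts no longer match. If the implementation you use needs exactly as many students as copies, a common trick is to add dummy students whose cost is 0 for every copy; whichever copies the dummies take simply stay empty. A minimal sketch, assuming the `{student: {copy_label: cost}}` dictionary-of-dictionaries layout; both helper names are hypothetical.

```python
def max_capacity_copies(classes):
    """classes: {class_id: (min_cap, max_cap)} -> ['1.1', ..., '1.27', '2.1', ...]."""
    return [f"{cid}.{k}" for cid, (_, max_cap) in classes.items()
            for k in range(1, max_cap + 1)]

def pad_with_dummies(costs, copies):
    """costs: {student: {copy_label: cost}}. Returns a new dict with zero-cost
    dummy students added until there are as many students as copies.
    Assumes no real student ID looks like 'dummy0', 'dummy1', ..."""
    padded = {student: dict(row) for student, row in costs.items()}
    for k in range(len(copies) - len(costs)):
        padded[f"dummy{k}"] = {label: 0 for label in copies}
    return padded
```

This does not enforce the minimum of 15 per class, which is consistent with the point above that the classes will be almost full.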
Linking the copies back works just like you said: class 1 becomes 1.1, 1.2, and 1.3, and you can easily reverse that.
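Reversing the copies is just a matter of stripping the suffix. A minimal sketch, assuming integer class IDs and copy labels of the form `"<class>.<copy>"`; the helper names are hypothetical.

```python
from collections import defaultdict

def original_class(copy_label):
    """'1.3' -> 1: drop the copy number after the last dot (assumes integer class IDs)."""
    return int(copy_label.rsplit(".", 1)[0])

def group_by_class(assignment):
    """assignment: {student: copy_label} -> {class_id: [students in that class]}."""
    rosters = defaultdict(list)
    for student, label in assignment.items():
        rosters[original_class(label)].append(student)
    return dict(rosters)
```

For example, `group_by_class({"a": "1.1", "b": "1.2", "c": "2.1"})` returns `{1: ["a", "b"], 2: ["c"]}`.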
Since you're "duplicating" the classes, you have to "duplicate" the IDs too. If class 1 becomes 1.1, 1.2, and 1.3, then a student who prefers class 1 should instead be linked to classes 1.1, 1.2, and 1.3, all with the same priority.
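A minimal sketch of that duplication in the dictionary-of-dictionaries form, assuming the cost of a copy is the rank of its class in the student's list (0 for the first choice, 4 for the fifth) and that classes a student did not list get a large cost (the hypothetical `unlisted_cost` parameter) so they are used only when unavoidable. If the implementation you use treats a missing entry as "not allowed", you could leave those entries out instead, at the risk of there being no complete assignment.

```python
def build_costs(prefs, copies, unlisted_cost=100):
    """Hypothetical helper.
    prefs:  {student: [class IDs, most preferred first]}
    copies: copy labels like '1.3' (class 1, copy 3)
    Returns {student: {copy_label: cost}}; every copy of a class gets the same cost."""
    costs = {}
    for student, choices in prefs.items():
        rank = {cid: r for r, cid in enumerate(choices)}  # 0 = first choice
        costs[student] = {
            label: rank.get(int(label.rsplit(".", 1)[0]), unlisted_cost)
            for label in copies
        }
    return costs
```

For example, a student whose list is `[2, 1]` gets cost 0 for copy `"2.1"`, cost 1 for both `"1.1"` and `"1.2"`, and cost 100 for any copy of class 3.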
The main problem I see with this, though, is that Python is slow, and it looks like you have 550 students × 5 choices × 27 copies per class = 74,250 connections (a full cost matrix of 550 students × 567 copies has 311,850 entries). This will take a lot of memory and a lot of time, and the open issues on that repository don't give me much confidence that it can handle this. I haven't tested that implementation myself, though, so I can't be sure.
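For a rough sense of scale (a back-of-the-envelope estimate, not a benchmark): the usual version of the Hungarian algorithm takes $O(n^3)$ time, and with one column per seat copy, $n = 567$:

$$567^3 = 182{,}284{,}263 \approx 1.8 \times 10^{8}$$

elementary steps, which is manageable for compiled code but slow for an interpreted pure-Python loop.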
Because students state only 5 choices, a perfect matching may not be possible. If you don't need a perfect matching, you could use serial dictatorship: students are processed one at a time in a fixed order, and each is assigned to their most preferred class that still has room. It will be much faster than the Hungarian algorithm.
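The pseudocode, written out as Python:
```python
def serial_dictatorship(students, prefs, min_cap, max_cap):
    """students: student IDs in processing order; prefs: student -> class IDs,
    most preferred first; min_cap / max_cap: class ID -> capacity."""
    enrolled = {c: 0 for c in max_cap}  # students assigned to each class so far
    assignment = {}
    for i, stud in enumerate(students):
        yet_to_match = len(students) - i  # includes stud
        # "Total of minimum quotas" read as the minimum seats still unfilled.
        unfilled_min = sum(max(min_cap[c] - enrolled[c], 0) for c in min_cap)
        if yet_to_match > unfilled_min:
            options = [c for c in prefs[stud] if enrolled[c] < max_cap[c]]
        else:
            options = [c for c in prefs[stud] if enrolled[c] < min_cap[c]]
        if options:
            best = options[0]  # prefs are ordered, so the first is most preferred
            assignment[stud] = best
            enrolled[best] += 1  # update the class's remaining seats
        else:
            assignment[stud] = None  # stud is unmatched
    return assignment
```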