
How to use the Hungarian algorithm when there are more workers than jobs and how to link the copies of jobs back to the original?

I am trying to use the Hungarian Algorithm to sort students into classes based on their preferences.

In my dataset, there are ~550 students, and each one has a list of top 5 preferences. Every preference is an ID that corresponds to a class. Each class has a minimum and maximum capacity (in my case a min cap of 15 people and a max cap of 27 people) and there are 21 classes in the dataset.

Here is an example dataset for every student:

Email             first choice  second choice  third choice  fourth choice  fifth choice
email@gmail.com   4             7              1             8              21
email2@gmail.com  6             9              14            17             2

Here is an example dataset for every class:

Class Title   Class ID  Min Cap  Max Cap
Class Title1  1         15       27
Class Title2  2         15       27
Class Title3  3         15       27

I need to sort the students into their preferred classes while also respecting both the minimum and the maximum capacity of each class. For that, I am planning to use the Hungarian Algorithm.

Because there are ~550 students and 21 classes, I was planning to make "copies" of the classes so the Hungarian algorithm can work. I would first make 15 copies of every class (like class 1.1, 1.2, 1.3, 1.4, 2.1, 2.2, 2.3, etc.) to fill the minimum requirement of each class, and then add even more copies of the most popular classes among the students until there is an equal number of students and class copies.
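A minimal sketch of that copy-generation step (the helper name make_copies and the first-choice popularity heuristic are my assumptions, not fixed by the problem):

from collections import Counter

def make_copies(class_ids, n_students, min_cap=15, max_cap=27, first_choices=()):
    """Create copies like '1.1', '1.2', ... until the total number of
    copies equals the number of students.  Assumes the minimum quotas
    do not already exceed the student count."""
    counts = {c: min_cap for c in class_ids}       # min_cap copies of every class
    remaining = n_students - min_cap * len(class_ids)
    # hand the leftover seats to the most popular classes first
    for c, _ in Counter(first_choices).most_common():
        extra = min(max_cap - counts[c], remaining)
        counts[c] += extra
        remaining -= extra
        if remaining == 0:
            break
    return [f"{c}.{i}" for c in class_ids for i in range(1, counts[c] + 1)]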

Then, working with the copies and the preferences of the students, I was thinking of making a dictionary of dictionaries and using this implementation of the algorithm.
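The exact input format depends on that implementation, but assuming it accepts a profit matrix shaped {student: {class_copy: weight}}, building it from the preferences could look like this sketch (names are illustrative):

def build_profit_matrix(student_prefs, copies):
    """{student: {class_copy: weight}}, higher weight = more preferred.
    student_prefs maps each student to an ordered list of class IDs."""
    matrix = {}
    for stud, prefs in student_prefs.items():
        row = {}
        for rank, class_id in enumerate(prefs):
            weight = len(prefs) - rank          # 5 for first choice ... 1 for fifth
            for copy in copies:
                if int(copy.split(".")[0]) == class_id:
                    row[copy] = weight
        matrix[stud] = row
    return matrix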

I have a couple questions:

  1. Is this plan a good one or are there better solutions for the problem I have?
  2. How do I make copies of the class that all link back to the original ID?
  3. When implementing the algorithm I am supposed to put the students' preferences into the dictionary (as shown in the GitHub link), but the copies now have IDs such as 1.1 while a student's choice is 1, and no class with that original ID exists in the algorithm's input. How should I get around that?

Thank you in advance, and let me know if you need any clarifications.

  1. I think this should work, though I would make one change. Instead of starting with the minimum number of copies of each class, start with the maximum. With 550 students and 21 classes, all of the classes will be almost full, so you shouldn't have to worry about the minimum.

  2. Exactly what you described will work. Class 1 becomes 1.1, 1.2, and 1.3, and you can easily reverse that (see the sketch after this list).

  3. Since you're "duplicating" the classes, you have to "duplicate" the IDs too. If class 1 becomes 1.1, 1.2, and 1.3, then a student who prefers class 1 should be linked to classes 1.1, 1.2, and 1.3 instead, all with the same priority (also shown in the sketch below).
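A minimal sketch of that duplication and its reversal (helper names are illustrative):

def copy_ids(class_id, n_copies):
    """'Duplicate' a class ID: 1 -> ['1.1', '1.2', ..., '1.<n>']."""
    return [f"{class_id}.{i}" for i in range(1, n_copies + 1)]

def original_id(copy_id):
    """Link a copy back to the original: '1.3' -> 1."""
    return int(copy_id.split(".")[0])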

The main problem I see with this, though, is that Python is slow, and it looks like you have 550 students * 5 choices * up to 27 copies per class = up to 74,250 connections. This will take a lot of memory and a lot of time, and the open issues on that repository don't give me much confidence that it's capable of calculating this. I haven't tested that implementation myself though, so I can't be sure.
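If a pure-Python implementation does turn out to be too slow, one alternative worth knowing about (not part of the original answer) is SciPy's linear_sum_assignment, which solves an assignment problem of this size on a dense cost matrix in well under a second. A sketch, assuming the class copies from above:

import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_with_scipy(students, copies, student_prefs):
    """Rows are students, columns are class copies; classes a student did
    not list get a large penalty so they are only used as a last resort."""
    BIG = 100
    copy_class = [int(c.split(".")[0]) for c in copies]
    cost = np.full((len(students), len(copies)), BIG)
    for i, stud in enumerate(students):
        for rank, class_id in enumerate(student_prefs[stud]):
            for j, cc in enumerate(copy_class):
                if cc == class_id:
                    cost[i, j] = rank           # 0 = first choice, 4 = fifth
    rows, cols = linear_sum_assignment(cost)
    return {students[i]: copy_class[j] for i, j in zip(rows, cols)}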

Because students state only 5 choices, a perfect matching may not be possible. If you're not looking to produce a perfect matching, then you could use serial dictatorship. It will be much faster than the Hungarian algorithm.

The pseudo-code, written out here as a runnable Python sketch (the argument shapes — ordered preference lists and per-class capacity dicts — are assumptions):

def serial_dictatorship(students, prefs, min_cap, max_cap):
    """Assign each student, in priority order, to their most preferred
    class that still has room.  prefs maps student -> ordered class IDs."""
    assigned = {}
    seats = {c: 0 for c in max_cap}            # seats already filled per class
    remaining = len(students)                  # students yet to be matched
    for stud in students:
        # seats still needed to satisfy every class's minimum quota
        shortfall = sum(max(min_cap[c] - seats[c], 0) for c in min_cap)
        if remaining > shortfall:
            # enough students left: any class below its max cap is open
            options = [c for c in prefs[stud] if seats[c] < max_cap[c]]
        else:
            # students are scarce: only fill classes still below their min cap
            options = [c for c in prefs[stud] if seats[c] < min_cap[c]]
        if options:
            assigned[stud] = options[0]        # prefs are ordered, first = best
            seats[options[0]] += 1
        else:
            assigned[stud] = None              # unmatched
        remaining -= 1
    return assigned
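For example, with the toy data from the question (the ordering of students is hypothetical; in serial dictatorship it acts as the priority order, so earlier students get better picks):

students = ["email@gmail.com", "email2@gmail.com"]
prefs = {"email@gmail.com": [4, 7, 1, 8, 21],
         "email2@gmail.com": [6, 9, 14, 17, 2]}
min_cap = {c: 15 for c in range(1, 22)}
max_cap = {c: 27 for c in range(1, 22)}
print(serial_dictatorship(students, prefs, min_cap, max_cap))
# -> {'email@gmail.com': 4, 'email2@gmail.com': 6}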
