How can I improve the performance of this nested loop?

Question

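I'm working on a matching problem in which I have to assign students to schools. The catch is that I must take each student's siblings into account, because siblings are a relevant feature when setting each school's priorities.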

My data looks like this:

Index Student_ID    Brothers
0   92713846    [79732346]
1   69095898    [83462239]
2   67668672    [75788479, 56655021, 75869616]
3   83396441    []
4   65657616    [52821691]
5   62399116    []
6   78570850    [62046889, 63029349]
7   69185379    [70285250, 78819847, 78330994]
8   66874272    []
9   78624173    [73902609, 99802441, 95706649]
10  97134369    []
11  77358607    [52492909, 59830215, 71251829]
12  56314554    [54345813, 71451741]
13  97724180    [64626337]
14  73480196    [84454182, 90435785]
15  70717221    [60965551, 98620966, 70969443]
16  60942420    [54370313, 63581164, 72976764]
17  81882157    [78787923]
18  73387623    [87909970, 57105395]
19  59115621    [62494654]
20  54650043    [69308874, 88206688]
21  53368352    [63191962, 53031183]
22  76024585    [61392497]
23  84337377    [58419239, 96762668]
24  50099636    [80373936, 54314342]
25  62184397    [89185875, 84892080, 53223034]
26  85704767    [85509773, 81710287, 78387716]
27  85585603    [66254198, 87569015, 52455599]
28  82964119    [76360309, 76069982]
29  53776152    [92585971, 74907523]
...
6204 rows × 2 columns

Student_ID is each student's unique ID, and Brothers is a list containing the IDs of all of that student's siblings.

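To hold the data for the matching, I created a Student class that stores all the attributes the matching needs. Here is a link to download the entire dataset.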

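class Student:
    def __init__(self, index, id, vbrothers=None):
        # Use None instead of a [] default: a literal [] default is created once
        # and shared by every Student constructed without a sibling list
        self.__index = index      # row position in students_data
        self.__id = id            # the student's unique Student_ID
        self.__vbrothers = vbrothers if vbrothers is not None else []

    @property
    def index(self):
        return self.__index

    @property
    def id(self):
        return self.__id

    @property
    def vbrothers(self):
        # List of the sibling IDs (the Brothers column)
        return self.__vbrothers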

I instantiate a Student object for every row of my DataFrame by looping over it, and append each one to a list:

students = []
for index, row in students_data.iterrows():
    student = Student(index, row['Student_ID'],  row['Brothers'])
    students.append(student)

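Now, my problem is that for each student I need a pointer to the indices of their siblings in the students list. At the moment I'm implementing it with this nested loop: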

for student in students:
    # For each student, scan every student's Brothers list: O(n^2) membership checks
    student.vbrothers_index = [brother.index for brother in students if (student.id in brother.vbrothers)]

This is by far the slowest part of my entire code. It is 4 times slower than the second-slowest part.
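A rough cost estimate explains the slowdown. Assuming $n = 6204$ rows and at most $k$ IDs per Brothers list (the sample rows show at most 3, so $k = 3$ below is an assumption for the whole dataset), the comprehension visits every (student, brother) pair and scans `brother.vbrothers` for each one, whereas a dictionary lookup touches each listed sibling ID once, in expected constant time:

```latex
$$
T_{\text{nested}} = O(n^2 k), \qquad n^2 = 6204^2 = 38{,}489{,}616 \ \text{pairs}
$$
$$
T_{\text{dict}} = O(n k) \ \text{expected}, \qquad n k = 6204 \cdot 3 = 18{,}612 \ \text{lookups for } k = 3
$$
```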

Any suggestions on how to improve the performance of this nested loop are welcome.

Answer 1

Since the order of students doesn't matter, make it a dictionary:

students = {}
for index, row in students_data.iterrows():
    student = Student(index, row['Student_ID'],  row['Brothers'])
    students[row['Student_ID']] = student

Now you can retrieve each student by their ID in (average) constant time:

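for student in students.values():
    # vbrothers holds sibling IDs, so look each ID up in the dict directly;
    # IDs that are not in the dataset are skipped to avoid a KeyError
    student.vbrothers_index = [students[brother_id].index for brother_id in student.vbrothers if brother_id in students]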
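Below is a minimal, self-contained sketch of the dictionary approach, using a simplified dataclass stand-in for `Student` and made-up IDs (both hypothetical). One subtlety: the answer's loop maps each student's *own* Brothers list, while the question's loop collects the students who list *this* student as a sibling. The two agree only when sibling lists are symmetric. The hypothetical `sibling_indices_reverse` reproduces the question's semantics in a single pass by building a reverse index from sibling ID to the indices of the students that list it.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Student:
    # Hypothetical, simplified stand-in for the Student class in the question.
    index: int
    id: int
    vbrothers: list = field(default_factory=list)


def sibling_indices_nested(students):
    # The question's original O(n^2) loop, kept as a reference.
    for student in students:
        student.vbrothers_index = [
            brother.index for brother in students if student.id in brother.vbrothers
        ]


def sibling_indices_forward(students_by_id):
    # The answer's approach: look up each ID from the student's own list.
    # IDs missing from the data are skipped instead of raising KeyError.
    for student in students_by_id.values():
        student.vbrothers_index = [
            students_by_id[bid].index for bid in student.vbrothers if bid in students_by_id
        ]


def sibling_indices_reverse(students):
    # Same result as sibling_indices_nested, in one pass: map each sibling ID
    # to the indices of the students whose Brothers list contains it.
    listed_by = defaultdict(list)
    for brother in students:
        for sid in set(brother.vbrothers):  # set(): record each brother once per ID
            listed_by[sid].append(brother.index)
    for student in students:
        student.vbrothers_index = list(listed_by.get(student.id, []))


# Made-up toy data: (index, Student_ID, Brothers); ID 99 is not in the data.
rows = [(0, 10, [11]), (1, 11, [10, 12]), (2, 12, [11]), (3, 13, [99])]
students = [Student(i, sid, bros) for i, sid, bros in rows]
students_by_id = {s.id: s for s in students}

sibling_indices_forward(students_by_id)
print([s.vbrothers_index for s in students])  # [[1], [0, 2], [1], []]
```

With symmetric sibling lists, as in the toy data, all three functions should return the same indices. If the real data is not symmetric, the reverse index is the variant that matches the original nested loop, and it runs in time proportional to the total length of all Brothers lists.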

Solution 1
1 · accepted · 2020-04-13 14:20:16