Python - How to efficiently iterate over a subset of a dictionary?

Question

I have a dictionary with 500 DataFrames in it. Each DataFrame has the columns 'date' and 'num_patients'. I apply the model to every DataFrame in the dictionary, but the Python kernel crashes because of the large amount of data in the dictionary.

prediction_all = {}
for key, value in dict.items():
    model = Prophet(holidays = holidays).fit(value)
    future = model.make_future_dataframe(periods = 365)
    forecast = model.predict(future)
    prediction_all[key] = forecast.tail()

So I split the dictionary into subsets and applied the model to each subset.

dict1 = {k: dict[k] for k in sorted(dict.keys())[:50]}
prediction_dict1 = {}
for key, value in dict1.items():
    model = Prophet(holidays = holidays).fit(value)
    future = model.make_future_dataframe(periods = 365)
    forecast = model.predict(future)
    prediction_dict1[key] = forecast.tail()

dict2 = {k: dict[k] for k in sorted(dict.keys())[50:100]}
prediction_dict2 = {}
for key, value in dict2.items():
    model = Prophet(holidays = holidays).fit(value)
    future = model.make_future_dataframe(periods = 365)
    forecast = model.predict(future)
    prediction_dict2[key] = forecast.tail()

But I will need to run the code above 10 times, since I have 500 DataFrames (10 subsets). Is there a more efficient way to do this?

Answer 1

One immediate improvement is to drop the sorted() and slicing step and replace it with heapq.nsmallest(), which does far fewer comparisons. Also, the .keys() call is unnecessary, since dicts iterate over their keys by default.

Replace:

 dict1 = {k: dict[k] for k in sorted(dict.keys())[:50]}
 dict2 = {k: dict[k] for k in sorted(dict.keys())[50:100]}

With:

 import heapq

 lowest_keys = heapq.nsmallest(100, dict)
 dict1 = {k : dict[k] for k in lowest_keys[:50]}
 dict2 = {k : dict[k] for k in lowest_keys[50:100]}
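As a quick check that the two forms select the same keys, here is a minimal sketch on a made-up dictionary. The name `frames` and its keys and values are illustrative placeholders, not taken from the question.

```python
import heapq

# Hypothetical stand-in for the question's dictionary; values are placeholders.
frames = {f"site_{i:03d}": i for i in range(500)}

# Original approach: sort every key, then slice off the first 100.
by_sorted = sorted(frames.keys())[:100]

# Suggested approach: iterating a dict yields its keys, so .keys() is not needed.
by_heap = heapq.nsmallest(100, frames)

# nsmallest returns its result in ascending order, so slicing works as before.
dict1 = {k: frames[k] for k in by_heap[:50]}
dict2 = {k: frames[k] for k in by_heap[50:100]}
```

Both lists hold the same 100 keys in the same order, so the rest of the code can stay unchanged.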

The big for-loop in your code does use key (it is the key under which each forecast is stored in the prediction dictionary), so .items() is the right choice there; .values() would only be enough if the key were not needed.
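To avoid copying the same block ten times, the subsets can be generated in a loop. The sketch below is one way to do that, under stated assumptions: `iter_batches` and `fit_and_forecast` are hypothetical helpers, `frames` is a made-up dictionary, and `fit_and_forecast` only stands in for the Prophet fit and predict steps from the question. Whether batching by itself avoids the kernel crash depends on where the memory is going, which the question does not establish.

```python
from itertools import islice


def iter_batches(mapping, batch_size):
    """Yield sub-dictionaries of at most batch_size items, in sorted key order."""
    keys = iter(sorted(mapping))
    while True:
        batch_keys = list(islice(keys, batch_size))
        if not batch_keys:
            return
        yield {k: mapping[k] for k in batch_keys}


def fit_and_forecast(frame):
    # Hypothetical placeholder for the Prophet fit/predict steps in the question;
    # here it just returns the last five elements of a list.
    return frame[-5:]


# Made-up data: 500 entries, each a short list instead of a DataFrame.
frames = {f"site_{i:03d}": list(range(i, i + 10)) for i in range(500)}

prediction_all = {}
batch_sizes = []
for batch in iter_batches(frames, 50):
    batch_sizes.append(len(batch))
    for key, value in batch.items():
        # Keep only the small result, not the fitted model or full forecast.
        prediction_all[key] = fit_and_forecast(value)
```

With 500 entries and a batch size of 50, the outer loop runs ten times and every result lands in a single dictionary.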

Solution 1
3 Accepted 2017-03-29 15:28:08
