
Python - How to efficiently iterate through the subsets of a dictionary?

I have a dictionary with 500 DataFrames in it. Each DataFrame has the columns 'date' and 'num_patients'. I apply the model to all the DataFrames in the dictionary, but the Python kernel crashes because of the large amount of data in the dictionary.

prediction_all = {}
for key, value in dict.items():
    model = Prophet(holidays = holidays).fit(value)
    future = model.make_future_dataframe(periods = 365)
    forecast = model.predict(future)
    prediction_all[key] = forecast.tail()

So I split the dictionary into subsets and applied the model to each subset.

dict1 = {k: dict[k] for k in sorted(dict.keys())[:50]}
prediction_dict1 = {}
for key, value in dict1.items():
    model = Prophet(holidays = holidays).fit(value)
    future = model.make_future_dataframe(periods = 365)
    forecast = model.predict(future)
    prediction_dict1[key] = forecast.tail()

dict2 = {k: dict[k] for k in sorted(dict.keys())[50:100]}
prediction_dict2 = {}
for key, value in dict2.items():
    model = Prophet(holidays = holidays).fit(value)
    future = model.make_future_dataframe(periods = 365)
    forecast = model.predict(future)
    prediction_dict2[key] = forecast.tail()

But I would need to run the code above 10 times, since I have 500 DataFrames (10 subsets). Is there a more efficient way to do this?

One immediate improvement is to drop the sorted() and slicing step and replace it with heapq.nsmallest(), which does far fewer comparisons. Also, .keys() is unnecessary, since iterating over a dict yields its keys by default.

Replace:

 dict1 = {k: dict[k] for k in sorted(dict.keys())[:50]}
 dict2 = {k: dict[k] for k in sorted(dict.keys())[50:100]}

With:

 import heapq

 lowest_keys = heapq.nsmallest(100, dict)
 dict1 = {k: dict[k] for k in lowest_keys[:50]}
 dict2 = {k: dict[k] for k in lowest_keys[50:100]}
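
As a quick check that the replacement selects the same keys, here is a minimal sketch. The dictionary `frames` and its `site_...` keys are made up for illustration and stand in for the real dictionary of DataFrames, with plain integers as values.

```python
import heapq

# Hypothetical stand-in for the dictionary of DataFrames; the values do not matter here.
frames = {f"site_{i:03d}": i for i in range(500)}

# nsmallest iterates over the dict's keys and returns the 100 smallest, in ascending order.
lowest_keys = heapq.nsmallest(100, frames)

# Same keys, in the same order, as sorting everything and slicing.
assert lowest_keys == sorted(frames)[:100]

dict1 = {k: frames[k] for k in lowest_keys[:50]}
dict2 = {k: frames[k] for k in lowest_keys[50:100]}
```

Note that this only helps for the first few subsets. Once all ten subsets are needed, every key has to be ordered anyway, and a single sorted() call is the simpler choice.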

The big for-loop in your code looks like it only needs .values() instead of .items(), since key doesn't seem to be used.
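
To avoid copying the block once per subset, the chunking itself can go in a loop. The following is a rough sketch, not a tested fix for the crash. `batched_keys`, `run_in_batches`, and `forecast_tail` are hypothetical names, and `forecast_tail` is only a placeholder for the Prophet fit/predict steps in the question, so the example runs without Prophet installed.

```python
from itertools import islice

def batched_keys(keys, size):
    """Yield successive lists of at most `size` keys."""
    it = iter(keys)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

def forecast_tail(frame):
    # Hypothetical placeholder for the fit / make_future_dataframe / predict
    # steps in the question; here it just keeps the last five items.
    return frame[-5:]

def run_in_batches(frames, size=50, worker=forecast_tail):
    """Apply `worker` to every value, walking the sorted keys in batches."""
    predictions = {}
    for batch in batched_keys(sorted(frames), size):
        for key in batch:
            predictions[key] = worker(frames[key])
    return predictions

# Made-up data: 500 small lists standing in for the 500 DataFrames.
frames = {f"site_{i:03d}": list(range(i, i + 10)) for i in range(500)}
predictions = run_in_batches(frames)
```

This removes the repetition but does not by itself lower peak memory, because the full dictionary is still held in memory while the loop runs. If memory is the real constraint, the batch boundary is a natural place to write results to disk or to load only the DataFrames needed for that batch.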
