Python - How to efficiently iterate through the subsets of a dictionary?
I have a dictionary with 500 DataFrames in it. Each DataFrame has the columns 'date' and 'num_patients'. I apply the model to all the DataFrames in the dictionary, but the Python kernel crashes because of the large amount of data in the dictionary.
prediction_all = {}
for key, value in dict.items():
    model = Prophet(holidays=holidays).fit(value)
    future = model.make_future_dataframe(periods=365)
    forecast = model.predict(future)
    prediction_all[key] = forecast.tail()
So I split the dictionary into subsets and applied the model to each subset.
dict1 = {k: dict[k] for k in sorted(dict.keys())[:50]}
prediction_dict1 = {}
for key, value in dict1.items():
    model = Prophet(holidays=holidays).fit(value)
    future = model.make_future_dataframe(periods=365)
    forecast = model.predict(future)
    prediction_dict1[key] = forecast.tail()
dict2 = {k: dict[k] for k in sorted(dict.keys())[50:100]}
prediction_dict2 = {}
for key, value in dict2.items():
    model = Prophet(holidays=holidays).fit(value)
    future = model.make_future_dataframe(periods=365)
    forecast = model.predict(future)
    prediction_dict2[key] = forecast.tail()
But I would need to run the code above 10 times, since I have 500 DataFrames (10 subsets). Is there a more efficient way to do this?
One immediate improvement is to drop the sorted() and slicing step and replace it with heapq.nsmallest(), which can do fewer comparisons because it does not have to order the whole key set. Also, the .keys() call is not necessary, since dicts iterate over their keys by default.
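As a quick sanity check of that suggestion, the sketch below uses a small made-up dictionary (the name `frames` and its keys are placeholders, not taken from the question) and compares the two approaches. It assumes the keys are mutually comparable, such as strings.

```python
import heapq

# Hypothetical stand-in for the real dictionary of DataFrames.
frames = {f"site_{i:03d}": i for i in range(20, 0, -1)}

# Original approach: sort every key, then slice.
via_sorted = sorted(frames.keys())[:5]

# Suggested approach: iterating a dict yields its keys, so .keys() is optional.
via_heap = heapq.nsmallest(5, frames)

print(via_sorted == via_heap)
```

Both expressions produce the five smallest keys in ascending order, so the swap should not change which DataFrames end up in each subset.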
Replace:
dict1 = {k: dict[k] for k in sorted(dict.keys())[:50]}
dict2 = {k: dict[k] for k in sorted(dict.keys())[50:100]}
With:
import heapq

lowest_keys = heapq.nsmallest(100, dict)
dict1 = {k : dict[k] for k in lowest_keys[:50]}
dict2 = {k : dict[k] for k in lowest_keys[50:100]}
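To avoid writing out dict1 through dict10 by hand, the same idea can be generalized into a loop over key chunks. This is only a sketch: `iter_key_chunks` and `frames` are hypothetical names, and because all 500 keys are needed eventually, it uses a single sorted() pass rather than heapq.nsmallest().

```python
from itertools import islice


def iter_key_chunks(mapping, chunk_size):
    """Yield successive lists of at most chunk_size keys, in sorted order."""
    keys = iter(sorted(mapping))
    while True:
        chunk = list(islice(keys, chunk_size))
        if not chunk:
            return
        yield chunk


# Hypothetical stand-in for the dictionary of 500 DataFrames.
frames = {f"site_{i:03d}": i for i in range(500)}

chunks = list(iter_key_chunks(frames, 50))

# Each subset is built only when needed, mirroring dict1, dict2, ...
first_subset = {k: frames[k] for k in chunks[0]}
```

With 500 keys and a chunk size of 50 this gives 10 chunks, so the per-subset code from the question could run once inside a `for chunk in iter_key_chunks(...)` loop instead of being copied 10 times.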
The big for-loop in your code would only need .values() instead of .items() if key were unused; as written, key indexes the result dictionary, so .items() is the right choice.
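For reference, the loop from the question can be wrapped in a small helper so that it is written only once. In this sketch the key labels each stored result, which is why it iterates over .items(). The names `run_per_key` and `summarize` are hypothetical, and the toy data stands in for the real DataFrames.

```python
def run_per_key(frames, summarize):
    """Call summarize(value) for every entry and collect the results by key."""
    results = {}
    for key, value in frames.items():  # key is needed to label each result
        results[key] = summarize(value)
    return results


# Toy example: lists instead of DataFrames, sum() instead of a forecast.
toy = {"a": [1, 2, 3], "b": [4, 5, 6]}
totals = run_per_key(toy, sum)
```

With Prophet, `summarize` would presumably be a function that performs the fit, make_future_dataframe, predict and tail() steps from the question and returns only the tail. The fitted model and the full forecast would then be local to that function and could be freed after each call, though whether that is enough to stop the kernel from crashing is not something this sketch can guarantee.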