How to efficiently iterate over a list in a nested defaultdict structure in Python?
I am processing a large number of urls (from a schedule) and I've categorized them in a nested defaultdict structure as follows:
My categories are: option, quarter and week (the nesting order used in the code below).
The value for each week should be a list.
from collections import defaultdict

def setup_urls(option):
    urls = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for quarter in range(1, 5):
        for week in range(1, 53):
            url = ...  # logic of computing the url goes here
            urls[option][quarter][week].append(url)
    return urls
The result is a list with multiple defaultdicts in it:
[defaultdict(<function setup_urls.<locals>.<lambda> at 0x7fd30ed2d488>, {
    'option': defaultdict(<function setup_urls.<locals>.<lambda>.<locals>.<lambda> at 0x7fd3122a2158>, {
        1: defaultdict(<class 'list'>, {
            45: [url1, url2, url3, url4, url5]
        }),
    })
})]
I didn't want to use standard dictionaries because of efficiency with large datasets. I have to store around 5000-10000 urls. In the future this could be around 100000.
From my own research, using defaultdict should be good for performance, but the use of lambdas doesn't seem very Pythonic. I'm not sure if there are better solutions, but that's not my main question.
I currently use this code to access all urls, but it feels like a lot of dirty code and not very Pythonic at all.
for dict in result:
    for quarter in dict.values():
        for week in quarter.values():
            for url in week.values():
                print(url)
I'd like to know what the better way is to access these urls so that I can make use of the map function. (And is this the best way to store the urls?)
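On the storage question, one possible alternative (a sketch, not a benchmarked recommendation) is a single flat defaultdict(list) keyed by (option, quarter, week) tuples. This avoids the nested lambdas, and all urls can be read back with one comprehension. The names setup_urls_flat, make_url and example_url, and the example.com urls, are hypothetical and only there for illustration.

```python
from collections import defaultdict

def setup_urls_flat(option, make_url):
    # Hypothetical variant of setup_urls: one flat dict keyed by (option, quarter, week)
    urls = defaultdict(list)
    for quarter in range(1, 5):
        for week in range(1, 53):
            urls[(option, quarter, week)].append(make_url(option, quarter, week))
    return urls

# Placeholder url builder, purely for illustration
def example_url(option, quarter, week):
    return f"https://example.com/{option}/q{quarter}/w{week}"

urls = setup_urls_flat("option", example_url)

# All urls in one pass, without nested loops
all_urls = [url for url_list in urls.values() for url in url_list]
```

A single week is then looked up with urls[("option", 1, 45)]. Whether this suits the real data depends on how often you need all weeks of one quarter, which the nested layout gives you more directly.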
You can structure your logic for an arbitrary level of nested dictionaries via a recursive function. Below is an example using itertools.chain.
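from collections import defaultdict
from itertools import chain

def get_values(d, res=None):
    # Default to None: a mutable default list would be shared between calls
    if res is None:
        res = []
    for v in d.values():
        if isinstance(v, dict):
            # Recurse into nested dictionaries, collecting into the same list
            get_values(v, res=res)
        else:
            res.append(v)
    # res holds the leaf lists; flatten them into a single list of values
    return list(chain.from_iterable(res))

d = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
d[1][2][3].append(343)
d[1][2][3].append(1245)
d[1][2][4].append(563)
d[1][2][4].append(763)

res = get_values(d)
# [343, 1245, 563, 763]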
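Since the question asks about map, here is a hedged variation on the same recursive idea, written as a generator so the values can be fed to map without building an intermediate list. The name iter_values, the example.com urls and the use of str.upper as a stand-in for real per-url work are assumptions made for illustration only.

```python
from collections import defaultdict

def iter_values(d):
    # Walk nested dictionaries depth-first and yield each item of the leaf lists
    for v in d.values():
        if isinstance(v, dict):
            yield from iter_values(v)
        else:
            yield from v

d = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
d[1][2][3].append("http://example.com/a")
d[1][2][4].append("http://example.com/b")

# map pulls from the generator lazily; str.upper stands in for real per-url work
processed = list(map(str.upper, iter_values(d)))
```

Because iter_values is lazy, nothing is flattened until map (or a for loop) actually consumes it, which may help if the number of urls grows as expected.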