
How to efficiently iterate over a list in nested defaultdict structure in Python?

I am processing a large number of urls (from a schedule) and I've categorized them in a nested defaultdict structure as follows:

My categories are:

  • Options: 3 possibilities
  • Quarter: 4 possibilities
  • Weeks: 52 possibilities

The value for each week should be a list.

My code

from collections import defaultdict

def setup_urls(option):

    urls = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))

    for quarter in range(1, 5):
        for week in range(1, 53):
            # logic of computing url goes here
            urls[option][quarter][week].append(url)

    return urls

Output

A list with multiple defaultdict in it.

[defaultdict(<function setup_urls.<locals>.<lambda> at 0x7fd30ed2d488>, {
    'option': defaultdict(<function setup_urls.<locals>.<lambda>.<locals>.<lambda> at 0x7fd3122a2158>, {
        1: defaultdict(<class 'list'>, {
            45: [url1, url2, url3, url4, url5]
        }),
    })
})]

I didn't want to use standard dictionaries because of efficiency with large datasets. I have to store around 5000-10000 urls. In the future this could be around 100000.

From my own research, defaultdict should be good for performance, but the use of lambdas doesn't seem very Pythonic. Not sure if there are better solutions, but it's not my main question.

I currently use this code to access all urls, but it feels like a lot of dirty code and not very Pythonic at all.

    for dict in result:
        for quarter in dict.values():
            for week in quarter.values():
                for url in week.values():
                    print(url)

I'd like to know a better way to access these urls so that I can make use of the map function. (And is this the best way to store the urls?)

You can structure your logic for an arbitrary level of nested dictionaries via a recursive function. Below is an example using itertools.chain.

from collections import defaultdict
from itertools import chain

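def get_values(d, res=None):
    # Avoid a mutable default argument: a shared list would persist across calls
    if res is None:
        res = []
    for v in d.values():
        if isinstance(v, dict):
            get_values(v, res=res)
        else:
            res.append(v)
    # Flatten the collected lists into a single list of values
    return list(chain.from_iterable(res))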

d = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))

d[1][2][3].append(343)
d[1][2][3].append(1245)
d[1][2][4].append(563)
d[1][2][4].append(763)

res = list(get_values(d))
# [343, 1245, 563, 763]
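
If the goal is to feed the values into map without first building the whole list, a generator variant of the same recursive idea may be a better fit. The following is only a sketch: the name iter_values is hypothetical, and str stands in for whatever function you actually want to apply to each url.

```python
from collections import defaultdict


def iter_values(d):
    """Lazily yield every leaf item from arbitrarily nested dictionaries."""
    for v in d.values():
        if isinstance(v, dict):
            # Recurse into nested dictionaries (defaultdict is a dict subclass)
            yield from iter_values(v)
        else:
            # Leaf level: v is a list of values, so yield its items one by one
            yield from v


d = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
d[1][2][3].append(343)
d[1][2][3].append(1245)
d[1][2][4].append(563)
d[1][2][4].append(763)

# The generator can be passed directly to map(); str is only a stand-in
# for whatever function should be applied to each url.
res = list(map(str, iter_values(d)))
# ['343', '1245', '563', '763']
```

Because the generator yields one item at a time, nothing is copied into an intermediate list until you ask for it, which may matter as the number of urls grows.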


 