
Append to a dict of lists with a dict comprehension

Suppose I have a large list of words. For example:

>>> with open('/usr/share/dict/words') as f:
...     words=[word for word in f.read().split('\n') if word]

If I wanted to build an index of this word list by first letter, this is easy:

d={}
for word in words:
   if word[0].lower() in 'aeiou':
       d.setdefault(word[0].lower(),[]).append(word)
       # You could use defaultdict here too...

Results in something like this:

{'a':[list of 'a' words], 'e':[list of 'e' words], 'i': etc...}

Is there a way to do this with a dict comprehension (Python 2.7, 3+)? In other words, is it possible with the dict comprehension syntax to append to the list stored under a key while the dict is being built?

i.e.:

  index={k[0].lower():XXX for k in words if k[0].lower() in 'aeiou'}

Where XXX performs an append operation or list creation for the key as index is being created.

Edit

Taking the suggestions and benchmarking:

import functools
from collections import defaultdict

def f1():
    d={}
    for word in words:
        c=word[0].lower()
        if c in 'aeiou':
           d.setdefault(c,[]).append(word)

def f2():
   d={}
   {d.setdefault(word[0].lower(),[]).append(word) for word in words 
        if word[0].lower() in 'aeiou'} 

def f3():
    d=defaultdict(list)                       
    {d[word[0].lower()].append(word) for word in words 
            if word[0].lower() in 'aeiou'}         

def f4():
    d=functools.reduce(lambda d, w: d.setdefault(w[0], []).append(w[1]) or d,
       ((w[0].lower(), w) for w in words
        if w[0].lower() in 'aeiou'), {}) 

def f5():   
    d=defaultdict(list)
    for word in words:
        c=word[0].lower() 
        if c in 'aeiou':
            d[c].append(word)       

Produces this benchmark:

   rate/sec    f4     f2     f1     f3     f5
f4       11    -- -21.8% -31.1% -31.2% -41.2%
f2       14 27.8%     -- -11.9% -12.1% -24.8%
f1       16 45.1%  13.5%     --  -0.2% -14.7%
f3       16 45.4%  13.8%   0.2%     -- -14.5%
f5       18 70.0%  33.0%  17.2%  16.9%     --

The straight loop with a defaultdict is fastest, followed by the set comprehension and the loop with setdefault.
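The harness that produced the table is not shown in the question. As a rough sketch only, a comparison of the two plain loops could be set up with the standard timeit module. The small repeated word list is a hypothetical stand-in for /usr/share/dict/words, and the function names are made up for this example:

```python
import timeit
from collections import defaultdict

# Hypothetical stand-in for the word list read from /usr/share/dict/words.
words = ["apple", "Egg", "ice", "owl", "under", "banana", "Avocado"] * 1000

def loop_setdefault():
    d = {}
    for word in words:
        c = word[0].lower()
        if c in "aeiou":
            d.setdefault(c, []).append(word)
    return d

def loop_defaultdict():
    d = defaultdict(list)
    for word in words:
        c = word[0].lower()
        if c in "aeiou":
            d[c].append(word)
    return d

# Both variants must build the same index before timing means anything.
assert loop_setdefault() == dict(loop_defaultdict())

timings = {}
for fn in (loop_setdefault, loop_defaultdict):
    # Best of 3 repeats of 20 calls each; absolute numbers depend on the machine.
    timings[fn.__name__] = min(timeit.repeat(fn, number=20, repeat=3))
    print(fn.__name__, round(timings[fn.__name__], 4))
```

Taking the minimum over several repeats reduces noise from other processes; the ranking on a real word list may differ from this toy input.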

Thanks for the ideas!

No. Dict comprehensions are designed to generate non-overlapping keys with each iteration; they don't support aggregation. For this particular use case, a loop is the proper way to accomplish the task efficiently (in linear time).
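To make the point concrete, here is a small sketch (the sample words are made up) showing what happens when a dict comprehension meets a repeated key: the later value replaces the earlier one, whereas a loop can append to the list that already exists.

```python
from collections import defaultdict

words = ["apple", "Avocado", "egg", "ice", "elk", "banana"]  # hypothetical sample data

# A dict comprehension keeps only the last value produced for each key.
last_only = {w[0].lower(): [w] for w in words if w[0].lower() in "aeiou"}

# A plain loop can append to the list already stored under the key.
index = defaultdict(list)
for w in words:
    c = w[0].lower()
    if c in "aeiou":
        index[c].append(w)

print(last_only)    # one word per vowel: the last one seen
print(dict(index))  # every vowel-initial word, grouped by first letter
```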

It is not possible (at least easily or directly) with a dict comprehension.

It is possible, but potentially abusive of the syntax, with a set or list comprehension:

# your code:    
d={}
for word in words:
   if word[0].lower() in 'aeiou':
       d.setdefault(word[0].lower(),[]).append(word)        

# a side effect set comprehension:  
index={}   
r={index.setdefault(word[0].lower(),[]).append(word) for word in words 
        if word[0].lower() in 'aeiou'}     

print r
print [(k, len(d[k])) for k in sorted(d.keys())]  
print [(k, len(index[k])) for k in sorted(index.keys())]

Prints:

set([None])
[('a', 17094), ('e', 8734), ('i', 8797), ('o', 7847), ('u', 16385)]
[('a', 17094), ('e', 8734), ('i', 8797), ('o', 7847), ('u', 16385)]

The set comprehension produces a set holding the values of the setdefault(...).append(word) expressions evaluated while iterating over the words list. Because list.append() always returns None, that set is just set([None]) here. As a side effect, it also builds the dict of lists you wanted.

It is not as readable as the straight looping construct and should be avoided (IMHO). It is no shorter and probably not materially faster. This is more interesting trivia about Python than something useful. Maybe to win a bet?
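The snippet above uses Python 2 print statements. A minimal Python 3 rendering of the same side-effect trick, with made-up sample words, might look like this:

```python
words = ["apple", "Avocado", "egg", "banana", "Ice"]  # hypothetical sample data

index = {}
# list.append() returns None, so every element the comprehension yields is None
# and the resulting set collapses to {None}. The real work happens in index.
r = {index.setdefault(w[0].lower(), []).append(w)
     for w in words if w[0].lower() in "aeiou"}

print(r)
print(index)
```

The throwaway set r is the visible cost of using a comprehension purely for its side effects.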

I'd use filter:

>>> words = ['abcd', 'abdef', 'eft', 'egg', 'uck', 'ice']
>>> index = {k.lower() : list(filter(lambda x:x[0].lower() == k.lower(),words)) for k in 'aeiou'}
>>> index
{'a': ['abcd', 'abdef'], 'i': ['ice'], 'e': ['eft', 'egg'], 'u': ['uck'], 'o': []}
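Note that the filter version scans the whole word list once per vowel. A different technique, not the one used in this answer, is to sort by first letter and group with itertools.groupby, which also fits inside a dict comprehension. The helper name first_letter is hypothetical, and this is only a sketch:

```python
from itertools import groupby

words = ['abcd', 'abdef', 'eft', 'egg', 'uck', 'ice']  # sample data from the answer above

def first_letter(word):
    return word[0].lower()

# groupby only merges adjacent items, so the input must be sorted by the same key first.
vowel_words = sorted((w for w in words if first_letter(w) in 'aeiou'), key=first_letter)
index = {k: list(group) for k, group in groupby(vowel_words, key=first_letter)}
print(index)
```

Unlike the filter version, vowels with no matching words (here 'o') do not appear as keys, and the sort makes this O(n log n) rather than linear.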

This is not exactly a dict comprehension, but:

from functools import reduce  # reduce is a builtin only on Python 2; this import works on both

reduce(lambda d, w: d.setdefault(w[0], []).append(w[1]) or d,
       ((w[0].lower(), w) for w in words
        if w[0].lower() in 'aeiou'), {})

This does not answer the question about a dict comprehension, but it might help someone searching for this problem. In a reduced example, when filling growing lists into a new dictionary on the fly, consider calling a function in a list comprehension, which is, admittedly, no better than a loop.

def fill_lists_per_dict_keys(k, v):
    d[k] = (
        v
        if k not in d 
        else d[k] + v
    )

# global d
d = {}
out = [fill_lists_per_dict_keys(i[0], [i[1]]) for i in d2.items()]

The out variable is only there to absorb the None returned by each call.

If you ever want to use the new dictionary inside the list comprehension at runtime, or if you run into some other reason why your dictionary gets overwritten on each iteration, try declaring it global with global d at the beginning of the script (commented out because it is not necessary here).
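The snippet above depends on a module-level d and on an input d2 that is not shown. One way to sidestep the global question, sketched here under the assumption that the input is a sequence of (key, value) pairs (the name pairs and its contents are hypothetical), is to pass the target dictionary in explicitly:

```python
def fill_lists_per_dict_keys(d, k, v):
    # Start a new list for an unseen key, otherwise extend the existing one.
    d[k] = v if k not in d else d[k] + v

pairs = [("a", "apple"), ("e", "egg"), ("a", "avocado")]  # hypothetical input

d = {}
# out only collects the None returned by each call.
out = [fill_lists_per_dict_keys(d, k, [v]) for k, v in pairs]

print(d)
```

Because d[k] + v builds a new list on every repeated key, this is more expensive than appending in place when the lists get long.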
