Using Counter with dictionaries in a list

Question

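I want to write a function that returns a list of Counters containing only the dictionary items whose keys appear in at least df of the dictionaries.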

Example:

prune([{'a': 1, 'b': 10}, {'a': 1}, {'c': 1}], min_df=2)
[Counter({'a': 1}), Counter({'a': 1})]
prune([{'a': 1, 'b': 10}, {'a': 2}, {'c': 1}], min_df=2)
[Counter({'a': 1}), Counter({'a': 2})]

As we can see, 'a' appears in two of the dictionaries, so it is kept in the output.

My approach:

from collections import Counter
def prune(dicto,df=2):
   new = Counter()
   for d in dicto:
       new += Counter(d.keys())
   x = {}
   for key,value in new.items():
       if value >= df:
           x[key] = value
   print(Counter(x))

Output:

Counter({'a': 2})

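This outputs a single combined Counter. As we can see, the key 'a' appears 2 times overall, so it satisfies the df condition and is listed in the output. Can anyone correct this so that I get the desired output?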

Answer 1

I would suggest:

from collections import Counter
def prune(dicto, min_df=2):
    # Create all counters
    counters = [Counter(d.keys()) for d in dicto]

    # Sum all counters
    total = sum(counters, Counter()) 

    # Create set with keys of high frequency
    keys = set(k for k, v in total.items() if v >= min_df)

    # Reconstruct counters using high frequency keys
    counters = (Counter({k: v for k, v in d.items() if k in keys}) for d in dicto)

    # Keep only the non-empty counters. A list is built explicitly so the
    # result is the same on Python 2 and Python 3 (where filter() is lazy).
    return [c for c in counters if c]

Result:

>>> prune([{'a': 1, 'b': 10}, {'a': 1}, {'c': 1}], min_df=2)
[Counter({'a': 1}), Counter({'a': 1})]
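
The steps in the function above may be easier to follow with the intermediate values spelled out. The following is only a sketch that walks through the same idea on the second example from the question; the variable names (per_dict, doc_freq, frequent, pruned) are made up for illustration.

```python
from collections import Counter

dicts = [{'a': 1, 'b': 10}, {'a': 2}, {'c': 1}]
min_df = 2

# One Counter per dictionary: each key counts once, the values are ignored here.
per_dict = [Counter(d.keys()) for d in dicts]

# Summing with an empty Counter as the start value gives, for every key,
# the number of dictionaries it appears in.
doc_freq = sum(per_dict, Counter())

# Keys that appear in at least min_df dictionaries.
frequent = {k for k, n in doc_freq.items() if n >= min_df}

# Rebuild each dictionary with only the frequent keys, keeping the original
# values, then drop the results that ended up empty.
pruned = [Counter({k: v for k, v in d.items() if k in frequent}) for d in dicts]
pruned = [c for c in pruned if c]

print(doc_freq)
print(pruned)
```

Here only 'a' reaches a document frequency of 2, so the third dictionary is dropped entirely and the values 1 and 2 are carried over unchanged.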

Answer 2

Chain the keys together, then keep the keys in each dictionary that satisfy the condition.

from collections import Counter
from itertools import chain

def prune(l, min_df=0):
    # count how many dictionaries each key appears in
    count = Counter(chain.from_iterable(l))
    # build a Counter from the keys that appear at least min_df times
    # (each kept key gets a count of 1); list() keeps the result a list on Python 3
    return list(filter(None, (Counter(k for k in d if count[k] >= min_df) for d in l)))

In [14]: prune([{'a': 1, 'b': 10}, {'a': 1}, {'c': 1}], min_df=2)
Out[14]: [Counter({'a': 1}), Counter({'a': 1})]
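
Note that building the Counter from the keys alone gives every kept key a count of 1, which matches the example above only because the values there are 1. If the original values should be preserved, as in the question's second example, a variant along these lines might work. This is a sketch, and prune_keep_values is a hypothetical name, not part of the answer.

```python
from collections import Counter
from itertools import chain

def prune_keep_values(dicts, min_df=0):
    # Iterating a dict yields its keys, so every dictionary contributes
    # 1 per key: count[k] is the number of dictionaries containing k.
    count = Counter(chain.from_iterable(dicts))
    # Keep the original value of every key that is frequent enough.
    kept = (Counter({k: v for k, v in d.items() if count[k] >= min_df})
            for d in dicts)
    # Drop the dictionaries that end up with no keys at all.
    return [c for c in kept if c]

print(prune_keep_values([{'a': 1, 'b': 10}, {'a': 2}, {'c': 1}], min_df=2))
```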

You can avoid using filter, but I am not sure it would be any more efficient:

def prune(l, min_df=0):
    count = Counter(chain.from_iterable(l))
    res = []
    for d in l:
        cn = Counter(k for k in d if count.get(k) >= min_df)
        if cn:
            res.append(cn)
    return res

The timings are in the same ballpark, with the loop version (chain_prune_loop) the fastest of the three:

In [31]: d = [{'a': 1, 'b': 10}, {'a': 1}, {'c': 1}]    
In [32]: d = [choice(d) for _ in range(1000)]   
In [33]: timeit chain_prune_loop(d, min_df=2)
100 loops, best of 3: 5.49 ms per loop    
In [34]: timeit prune(d, min_df=2)
100 loops, best of 3: 11.5 ms per loop
In [35]: timeit set_prune(d, min_df=2)
100 loops, best of 3: 13.5 ms per loop
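
For anyone who wants to repeat a comparison like this outside IPython, the standard timeit module can be used directly. The sketch below is an assumption about how such a measurement could be set up, not the session above: filter_prune and loop_prune are hypothetical names for the two variants in this answer, and the numbers will differ from machine to machine.

```python
import timeit
from collections import Counter
from itertools import chain
from random import choice

def filter_prune(dicts, min_df=0):
    # filter-based variant: drop empty Counters with filter(None, ...)
    count = Counter(chain.from_iterable(dicts))
    return list(filter(None, (Counter(k for k in d if count[k] >= min_df)
                              for d in dicts)))

def loop_prune(dicts, min_df=0):
    # explicit-loop variant: append only the non-empty Counters
    count = Counter(chain.from_iterable(dicts))
    res = []
    for d in dicts:
        cn = Counter(k for k in d if count[k] >= min_df)
        if cn:
            res.append(cn)
    return res

base = [{'a': 1, 'b': 10}, {'a': 1}, {'c': 1}]
data = [choice(base) for _ in range(1000)]

# Both variants should agree before their speed is compared.
assert filter_prune(data, min_df=2) == loop_prune(data, min_df=2)

for fn in (filter_prune, loop_prune):
    seconds = timeit.timeit(lambda: fn(data, min_df=2), number=100)
    print(fn.__name__, "seconds per call:", seconds / 100)
```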

Answer 3

This prints all the values of every key that appears in at least df dictionaries.

def prune(dicts, df):
    counts = {}
    for d in dicts:  # for each dictionary
        for k,v in d.items():  # for each key,value pair in the dictionary
            if k not in counts:  # if we haven't seen this key before
                counts[k] = []
            counts[k].append(v)  # append this value to this key

    for k,vals in counts.items():
        if len(vals) < df:
            continue  # take only the keys that have at least `df` values (that appear in at least `df` dictionaries)
        for val in vals:
            print(k, ":", val)
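
If the values are needed by the caller rather than printed, the same grouping can be returned as a dictionary. This is a minimal sketch under that assumption; values_by_frequent_key is a hypothetical name, and collections.defaultdict replaces the explicit "if k not in counts" check.

```python
from collections import defaultdict

def values_by_frequent_key(dicts, df):
    # Collect every value seen for each key, one entry per dictionary.
    values = defaultdict(list)
    for d in dicts:
        for k, v in d.items():
            values[k].append(v)
    # Keep only the keys that occur in at least df dictionaries.
    return {k: vals for k, vals in values.items() if len(vals) >= df}

print(values_by_frequent_key([{'a': 1, 'b': 10}, {'a': 2}, {'c': 1}], 2))
```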

3 answers

Answer 1
5 accepted 2015-04-14 22:43:36

Answer 2
1 2015-04-14 23:27:31

Answer 3
0 2015-04-14 22:43:24
