Filtering most_common() for a Counter in Python

Question

I have a counter formatted as {(f1, f2): counts}. When I run Counter.most_common() on it I get correct results, but I want to restrict most_common() to keys whose f2 matches some filter. For example, f2 = 'A' should return the most common elements having f2 = 'A'. How can I do this?

Answer 1

If we look at the source code for Counter, we see that when n is given it uses heapq.nlargest, which keeps the cost at roughly O(n log k), where k is the number of keys wanted and n is the size of the Counter, as opposed to the O(n log n) of a full sort.

def most_common(self, n=None):
    '''List the n most common elements and their counts from the most
    common to the least.  If n is None, then list all element counts.

    >>> Counter('abcdeabcdabcaba').most_common(3)
    [('a', 5), ('b', 4), ('c', 3)]

    '''
    # Emulate Bag.sortedByCount from Smalltalk
    if n is None:
        return sorted(self.items(), key=_itemgetter(1), reverse=True)
    return _heapq.nlargest(n, self.items(), key=_itemgetter(1))
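
The two branches of that method can be reproduced outside the class. The following is a minimal sketch, assuming the method behaves as in the excerpt above; the variable names full_sort and top_three are only illustrative.

```python
from collections import Counter
import heapq
from operator import itemgetter

counts = Counter('abcdeabcdabcaba')

# Branch taken when n is None: a full sort of every (element, count) pair.
full_sort = sorted(counts.items(), key=itemgetter(1), reverse=True)

# Branch taken when n is given: heap-based selection of the top n pairs.
top_three = heapq.nlargest(3, counts.items(), key=itemgetter(1))

print(top_three)                            # [('a', 5), ('b', 4), ('c', 3)]
print(top_three == full_sort[:3])           # True
print(top_three == counts.most_common(3))   # True
```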

Because most_common already costs more than O(n), an extra O(n) filtering pass does not change the overall complexity, so we can simply filter the counter first and then ask for its most common items:

from collections import Counter

counts = Counter([(1, "A"), (2, "A"), (1, "A"), (2, "B"), (1, "B")])

Counter({(f1, f2): n for (f1, f2), n in counts.items() if f2 == "A"}).most_common(2)
#>>> [((1, 'A'), 2), ((2, 'A'), 1)]

Skipping the intermediate Counter and calling heapq.nlargest directly may be slightly faster, if that matters:

import heapq
from operator import itemgetter

filtered = [((f1, f2), n) for (f1, f2), n in counts.items() if f2 == "A"]
heapq.nlargest(2, filtered, key=itemgetter(1))
#>>> [((1, 'A'), 2), ((2, 'A'), 1)]
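
If the filter changes from call to call, the same idea could be wrapped in a small function. This is only a sketch: filtered_most_common and its predicate parameter are hypothetical names, not part of collections.Counter, and the n=None fallback simply mirrors the sorted branch of most_common.

```python
from collections import Counter
import heapq
from operator import itemgetter


def filtered_most_common(counts, predicate, n=None):
    """Return the n most common (key, count) pairs whose key passes predicate.

    Hypothetical helper, not part of collections.Counter.
    """
    # Lazily yield only the entries whose key satisfies the predicate.
    matching = (item for item in counts.items() if predicate(item[0]))
    if n is None:
        # No limit requested: sort every matching entry by count.
        return sorted(matching, key=itemgetter(1), reverse=True)
    # Otherwise let heapq pick the n entries with the largest counts.
    return heapq.nlargest(n, matching, key=itemgetter(1))


counts = Counter([(1, "A"), (2, "A"), (1, "A"), (2, "B"), (1, "B")])

print(filtered_most_common(counts, lambda key: key[1] == "A", 2))
# [((1, 'A'), 2), ((2, 'A'), 1)]
```

Passing a generator avoids building the intermediate list, and taking a predicate on the whole key lets the same function filter on f1, f2, or both.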

Answer 1
0 2014-09-20 02:04:54