Filtered most_common() for Counter in Python
I have a counter formatted as {(f1, f2): counts}. When I run Counter.most_common() on it I get correct results, but I want to filter most_common() by some condition on f2. For example, f2 = 'A' should return the most common elements having f2 = 'A'. How can I do this?
If we look at the source code for Counter, we see that most_common() uses heapq to remain O(n + k log n), where k is the number of keys wanted and n is the size of the Counter, as opposed to O(n log n).
def most_common(self, n=None):
    '''List the n most common elements and their counts from the most
    common to the least.  If n is None, then list all element counts.

    >>> Counter('abcdeabcdabcaba').most_common(3)
    [('a', 5), ('b', 4), ('c', 3)]

    '''
    # Emulate Bag.sortedByCount from Smalltalk
    if n is None:
        return sorted(self.items(), key=_itemgetter(1), reverse=True)
    return _heapq.nlargest(n, self.items(), key=_itemgetter(1))
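As a quick sanity check, the two branches of that method can be reproduced with public functions only. This is a minimal sketch, not part of the library: the names top3_heap and top3_sorted are made up for illustration, and the input string is the one from the docstring.

```python
import heapq
from collections import Counter
from operator import itemgetter

counts = Counter('abcdeabcdabcaba')

# Branch taken when n is given: keep only the n largest items by count.
top3_heap = heapq.nlargest(3, counts.items(), key=itemgetter(1))

# Branch taken when n is None: full sort by count, sliced here for comparison.
top3_sorted = sorted(counts.items(), key=itemgetter(1), reverse=True)[:3]

print(top3_heap)
print(top3_heap == top3_sorted == counts.most_common(3))
```

Both approaches agree with most_common(3) on this input; they differ only in how much work is done when n is small relative to the size of the Counter.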
Because most_common() already costs more than O(n), an extra O(n) filtering pass does not change the overall complexity, so we can just filter the counter and get its items:
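from collections import Counter

counts = Counter([(1, "A"), (2, "A"), (1, "A"), (2, "B"), (1, "B")])
# Build a new Counter containing only the keys whose f2 is "A", then rank it.
Counter({(f1, f2): n for (f1, f2), n in counts.items() if f2 == "A"}).most_common(2)
#>>> [((1, 'A'), 2), ((2, 'A'), 1)]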
Although unwrapping it, that is, calling heapq.nlargest directly on the filtered items instead of building a second Counter, may make it slightly faster, if that matters:
import heapq
from operator import itemgetter
filtered = [((f1, f2), n) for (f1, f2), n in counts.items() if f2 == "A"]
heapq.nlargest(2, filtered, key=itemgetter(1))
#>>> [((1, 'A'), 2), ((2, 'A'), 1)]
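If the same filtering is needed in several places, the unwrapped version could be packaged as a small helper. The sketch below is only an illustration: filtered_most_common and its predicate parameter are hypothetical names, not part of collections.Counter, and the function simply mirrors the two branches of most_common() shown earlier.

```python
import heapq
from collections import Counter
from operator import itemgetter


def filtered_most_common(counter, predicate, n=None):
    """Hypothetical helper: like most_common(n), but only for keys
    where predicate(key) is true."""
    # Lazily keep the (key, count) pairs whose key passes the filter.
    matching = (item for item in counter.items() if predicate(item[0]))
    if n is None:
        # No limit requested: sort every matching item by count.
        return sorted(matching, key=itemgetter(1), reverse=True)
    # Limit requested: pick the n largest counts without a full sort.
    return heapq.nlargest(n, matching, key=itemgetter(1))


counts = Counter([(1, "A"), (2, "A"), (1, "A"), (2, "B"), (1, "B")])
result = filtered_most_common(counts, lambda key: key[1] == "A", n=2)
print(result)
#>>> [((1, 'A'), 2), ((2, 'A'), 1)]
```

Passing the condition as a callable keeps the helper independent of the key layout, so the same function works for a filter on f1 or on both fields.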