Accessing elements of a Counter containing ngrams
I am taking a string, tokenizing it, and want to look at the most common bigrams. Here is what I have so far:
import nltk
import collections
from nltk import ngrams
someString="this is some text. this is some more test. this is even more text."
tokens=nltk.word_tokenize(someString)
tokens=[token.lower() for token in tokens if len(token)>1]
bigram=ngrams(tokens,2)
aCounter=collections.Counter(bigram)
If I run:
print(aCounter)
Then it outputs the bigrams with their counts, sorted from most to least common. But this:
for element in aCounter:
    print(element)
prints the elements without their counts, and not in order of count. I want to write a for loop that prints the X most common bigrams in a text.
I am essentially trying to learn both Python and nltk at the same time, which may be why I am struggling here (I assume this is trivial).
You're probably looking for something that already exists, namely the most_common method on counters. From the docs:
Return a list of the n most common elements and their counts from the most common to the least. If n is omitted or None, most_common() returns all elements in the counter. Elements with equal counts are ordered arbitrarily.
You can call it and supply a value n to get the n most common value-count pairs. For example:
from collections import Counter
# initialize with silly value.
c = Counter('aabbbccccdddeeeeefffffffghhhhiiiiiii')
# Print 4 most common values and their respective count.
for val, count in c.most_common(4):
    print("Value {0} -> Count {1}".format(val, count))
Which prints out:
Value f -> Count 7
Value i -> Count 7
Value e -> Count 5
Value h -> Count 4
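Applied to the bigrams from the question, the same call might look like the sketch below. It is only an illustration under a few assumptions: a plain regex tokenizer stands in for nltk.word_tokenize so the snippet runs without NLTK installed, zip builds the bigrams in place of nltk.ngrams, and the names some_string, bigram_counts and top_x are made up for this example.

```python
import re
from collections import Counter

some_string = "this is some text. this is some more test. this is even more text."

# Assumption: a simple regex tokenizer replaces nltk.word_tokenize so that
# this sketch is self-contained. Tokens shorter than two characters are dropped,
# as in the question.
tokens = [t.lower() for t in re.findall(r"\w+", some_string) if len(t) > 1]

# Pair each token with its successor to form bigram tuples
# (a stand-in for nltk.ngrams(tokens, 2)).
bigram_counts = Counter(zip(tokens, tokens[1:]))

top_x = 3  # hypothetical value for X
for bigram, count in bigram_counts.most_common(top_x):
    print("{0} -> {1}".format(" ".join(bigram), count))
```

Each item returned by most_common is a (bigram, count) pair, so it unpacks directly in the for loop. Here the first two lines printed are "this is -> 3" and "is some -> 2"; the remaining bigrams all occur once, so which of them comes third depends on how ties are ordered.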