object of type 'generator' has no len()

I have just started to learn Python. I want to write a program in NLTK that breaks a text into unigrams and bigrams. For example, if the input text is...

"I am feeling sad and disappointed due to errors"

... my function should generate text like:

I am-->am feeling-->feeling sad-->sad and-->and disappointed-->disappointed due-->due to-->to errors

I have written code to input text into the program. Here's the function I'm trying:

def gen_bigrams(text):
    token = nltk.word_tokenize(review)
    bigrams = ngrams(token, 2)
    #print Counter(bigrams)
    bigram_list = ""
    for x in range(0, len(bigrams)):
        words = bigrams[x]
        bigram_list = bigram_list + words[0]+ " " + words[1]+"-->"
    return bigram_list

The error I'm getting is...

for x in range(0, len(bigrams)):

TypeError: object of type 'generator' has no len()

As the ngrams function returns a generator, I tried using len(list(bigrams)), but it returns 0, so I'm getting the same error. I have looked at other questions on StackExchange, but I still can't work out how to resolve this. I am stuck on this error. Any workaround or suggestion?

Constructing a string by concatenating values separated by a separator is best done with str.join:

import nltk

def gen_bigrams(text):
    token = nltk.word_tokenize(text)
    bigrams = nltk.ngrams(token, 2)
    # instead of " ".join also "{} {}".format would work in the map
    return "-->".join(map(" ".join, bigrams))

Note that there will be no trailing "-->", so add one if you need it. This way you don't even have to think about the length of the iterable you're using, which is almost always the case in Python. If you want to iterate through an iterable, use for x in iterable:. If you do need the indexes, use enumerate:

for i, x in enumerate(iterable):
    ...
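One likely explanation for len(list(bigrams)) coming back as 0 is that a generator can be consumed only once: if something has already iterated over it, a later list(bigrams) is empty. The sketch below illustrates this under stated assumptions. The pairs function is a hypothetical pure-Python stand-in for nltk.ngrams(token, 2), and str.split() stands in for nltk.word_tokenize, so that the example runs without NLTK.

```python
def pairs(tokens):
    """Hypothetical stand-in for nltk.ngrams(tokens, 2): lazily yields adjacent pairs."""
    for left, right in zip(tokens, tokens[1:]):
        yield (left, right)

tokens = "I am feeling sad".split()  # assumption: split() instead of nltk.word_tokenize

bigrams = pairs(tokens)
try:
    len(bigrams)
    has_len = True
except TypeError:
    has_len = False  # generators do not support len()

first_pass = list(bigrams)   # consumes the generator
second_pass = list(bigrams)  # nothing left: the generator is already exhausted

# enumerate works on any iterable, so indexes are available without len()
indexed = [(i, " ".join(pair)) for i, pair in enumerate(pairs(tokens))]
```

If the pairs are needed more than once, storing list(bigrams) in a variable right after creating the generator is one way to keep them around.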

bigrams is a generator object, and next(bigrams) (spelled bigrams.next() in Python 2) is what gives you the next tuple of tokens. You can call len() on that tuple, but not on the generator itself. The following is more explicit code to do what you are trying to achieve.

>>> import nltk
>>> review = "i am feeling sad and disappointed due to errors"
>>> token = nltk.word_tokenize(review)
>>> bigrams = nltk.ngrams(token, 2)
>>> output = ""
>>> try:
...   while True:
...     temp = next(bigrams)  # bigrams.next() works only in Python 2
...     output += "%s %s-->" % (temp[0], temp[1])
... except StopIteration:
...   pass
... 
>>> output
'i am-->am feeling-->feeling sad-->sad and-->and disappointed-->disappointed due-->due to-->to errors-->'
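The try/while/StopIteration pattern above is essentially what a for loop does behind the scenes, so the same result can be written more compactly. This is a minimal sketch, not the answerer's code: format_bigrams is a hypothetical name, and str.split() and zip() are stand-ins for nltk.word_tokenize and nltk.ngrams so that it runs without NLTK.

```python
def format_bigrams(text, sep="-->", trailing=False):
    """Join adjacent word pairs of `text` with `sep`.

    Assumption: str.split() and zip() stand in for
    nltk.word_tokenize and nltk.ngrams(token, 2).
    """
    tokens = text.split()
    parts = []
    # The for loop stops by itself when the iterator is exhausted,
    # so no explicit StopIteration handling is needed.
    for left, right in zip(tokens, tokens[1:]):
        parts.append("%s %s" % (left, right))
    result = sep.join(parts)
    if trailing and parts:
        result += sep  # optional trailing separator, as in the session above
    return result

review = "i am feeling sad and disappointed due to errors"
output = format_bigrams(review, trailing=True)
```

With trailing=True the result matches the string shown in the session above; leaving it at the default drops the final "-->".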
