Applying the Python set constructor to a generator object

I am trying to apply the set constructor to a generator object, but it raises TypeError: expected string or buffer. If I first convert the generator to a list and then apply the set constructor, there is no error, but I cannot view the list items, and the length is reported as 1 even though I am using multiple sentences. I do not fully understand what is happening. Any explanation would be appreciated. Thanks! The code is given below:

from nltk.tokenize import word_tokenize

train = [({'I love this sandwich.'}, 'pos'), ({'This is an amazing place!'}, 'pos'),
         ({'I feel very good about these beers.'}, 'pos'), ({'This is my best work.'}, 'pos'),
         ({"What an awesome view"}, 'pos'), ({'I do not like this restaurant'}, 'neg'),
         ({'I am tired of this stuff.'}, 'neg'), ({"I can't deal with this"}, 'neg'),
         ({'He is my sworn enemy!'}, 'neg'), ({'My boss is horrible.'}, 'neg')]
all_words = (word.lower() for passage in train for word in word_tokenize(passage[0]))
print type(all_words)
all_words = set(all_words)
t = [({word: (word in word_tokenize(x[0])) for word in all_words}, x[1]) for x in train]

The error I get is TypeError: expected string or buffer. The traceback is given below:

Traceback (most recent call last):
  File "C:/Users/5460/Desktop/train0501.py", line 18, in <module>
    all_words = set(all_words)
  File "C:/Users/5460/Desktop/train0501.py", line 15, in <genexpr>
    all_words = (word.lower() for passage in train for word in word_tokenize(passage[0]))
  File "C:\Python27\lib\site-packages\nltk\tokenize\__init__.py", line 87, in word_tokenize
    return _word_tokenize(text)
  File "C:\Python27\lib\site-packages\nltk\tokenize\treebank.py", line 67, in tokenize
    text = re.sub(r'^\"', r'``', text)
  File "C:\Python27\lib\re.py", line 151, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or buffer

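In your code, passage[0] is something like {'I love this sandwich.'}, which is a set (that is what the { ... } does). word_tokenize expects a string, not a set, so it raises the error. Because a generator expression is lazy, word_tokenize is not actually called until set() consumes the generator, which is why the traceback points at the set(all_words) line.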
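A minimal sketch of both points, using only the standard library. It calls re.sub directly with the same pattern shown in the traceback rather than going through NLTK, so it illustrates the failure mode and is not the library's full tokenizer. The names first, lazy and raised are introduced here for illustration only.

```python
import re

passage = ({'I love this sandwich.'}, 'pos')
first = passage[0]
print(type(first))  # a set, because of the braces around the string

# Building a generator expression runs none of its body yet,
# so this line succeeds even though the input is the wrong type.
lazy = (re.sub(r'^\"', r'``', text) for text in [first])

raised = False
try:
    set(lazy)  # consuming the generator is what finally calls re.sub
except TypeError as exc:
    raised = True
    print(exc)  # the exact message wording depends on the Python version
```

The exception appears only inside set(), which matches the order of frames in the traceback.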

Simply store each sentence as a plain string instead of wrapping it in braces:

train = [('I love this sandwich.', 'pos'), ...]
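With plain strings, the rest of the pipeline can run unchanged. The sketch below is a shortened version under stated assumptions: simple_tokenize is a hypothetical regex stand-in for NLTK's word_tokenize so the example runs without NLTK installed (the two do not tokenize identically), and only two of the sentences are used. It also lower-cases the tokens before the membership test, since the vocabulary is lower-cased; in the question's code a word like 'i' would be compared against the un-lowered token 'I'.

```python
import re

def simple_tokenize(text):
    # Hypothetical stand-in for nltk's word_tokenize (an assumption):
    # splits into runs of word characters and single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)

def lowered_tokens(text):
    # Lower-case every token so it can be compared with the vocabulary.
    return [tok.lower() for tok in simple_tokenize(text)]

train = [('I love this sandwich.', 'pos'),
         ('I do not like this restaurant', 'neg')]

# Vocabulary: every distinct lower-cased token across all sentences.
all_words = set(word
                for passage in train
                for word in lowered_tokens(passage[0]))

# One feature dict per sentence: is each vocabulary word present?
t = [({word: (word in lowered_tokens(x[0])) for word in all_words}, x[1])
     for x in train]

print(len(all_words))
print(t[0][1], t[0][0]['love'])
```

To use NLTK itself, replace simple_tokenize with word_tokenize imported from nltk.tokenize; the structure of the comprehension stays the same.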
