
<nltk.tokenize.casual.TweetTokenizer at 0x7f7fec4d5970> issue

It might be a basic question, but I am stuck here and not really sure what went wrong.


df['text'] contains the text data that I want to work on

    text_sents = df.text

    tokens = []
    for uni in text_sents:
        tok = TweetTokenizer(uni)
        tokens.append(tok)

    print(tokens)

and it returns

[<nltk.tokenize.casual.TweetTokenizer object at 0x7f80216950a0>, <nltk.tokenize.casual.TweetTokenizer object at 0x7f8022278670>, <nltk.tokenize.casual.TweetTokenizer object at 0x7f7fec0bbc70>, <nltk.tokenize.casual.TweetTokenizer object at 0x7f7febf74970>, <nltk.tokenize.casual.TweetTokenizer object at 0x7f7febf747c0>, <nltk.tokenize.casual.TweetTokenizer object at 0x7f7febf74a90>, <nltk.tokenize.casual.TweetTokenizer object at 0x7f7febf748b0>, <nltk.tokenize.casual.TweetTokenizer object at 0x7f7febf7e520>, ...

I'm not sure what to do with this. Could it have something to do with N/A values?

TweetTokenizer() is the constructor of the TweetTokenizer class and therefore returns a tokenizer object, not a list of tokens. You should instead call tokenizer.tokenize(sentence) on each string:

    from nltk.tokenize import TweetTokenizer

    tokenizer = TweetTokenizer()
    tokens = []
    for uni in text_sents:
        tok = tokenizer.tokenize(uni)  # list of tokens for this sentence
        tokens.append(tok)

    print(tokens)
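
If the column can also contain missing values (the N/A concern above), it may help to drop them before tokenizing, since tokenize() expects a string. Below is a minimal, hypothetical sketch: the small DataFrame only stands in for your df['text'], and the commented output assumes NLTK's default TweetTokenizer settings.

    from nltk.tokenize import TweetTokenizer
    import pandas as pd

    # Hypothetical sample data standing in for df['text']; one entry is missing.
    df = pd.DataFrame({"text": ["Great day! #sunny", "@user thanks :)", None]})

    tokenizer = TweetTokenizer()

    # Drop missing rows first, then tokenize each remaining string.
    tokens = [tokenizer.tokenize(uni) for uni in df["text"].dropna()]
    print(tokens)
    # [['Great', 'day', '!', '#sunny'], ['@user', 'thanks', ':)']]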
