
NLP using replacement tokens

Original post 2019-11-11 18:26:31 6 1 python/ nlp/ text-classification

I read a lot of articles that deal with different NLP classification tasks and I saw that most of them specify in the pre-processing section that they use replacement tokens:

e.g. "We removed and replaced the URLs, emojis and punctuation with replacement tokens: <URL>, <EMOJI>, <PUNCT>."

I am quite new to this domain and I was wondering whether there is some special way to deal with these kinds of tokens/tags. Is it necessary to use < >, or is this just a way to signal the replacement and help the classifier find a pattern?

Any help would be greatly appreciated.

1 answer

In my experience, the pre-processing step replaces every token (character, morpheme, or word) with a number. Replacement tokens are nothing but numbers as well; <URL> is just a human-readable way to present them.
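As a rough illustration of the point above, here is a minimal sketch of how replacement tokens might be inserted and then mapped to integer ids. The regular expressions and the helper names (`replace_special`, `build_vocab`, `encode`) are assumptions made up for this example, not taken from any particular paper or library, and the emoji pattern only covers a few common Unicode ranges. The angle brackets are only a convention that makes the placeholder unlikely to collide with a real word; any unique string would serve.

```python
import re

# Illustrative patterns only; real pipelines usually use more thorough ones.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Rough emoji coverage (a few common Unicode ranges), not exhaustive.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
PUNCT_RE = re.compile(r"[!?.,;:]+")


def replace_special(text):
    """Lowercase the text, swap URLs, emojis and punctuation for
    placeholder tokens, then split on whitespace."""
    text = text.lower()                      # lowercase first so the placeholders keep their case
    text = URL_RE.sub(" <URL> ", text)       # URLs first, since they contain '.' and ':'
    text = EMOJI_RE.sub(" <EMOJI> ", text)
    text = PUNCT_RE.sub(" <PUNCT> ", text)
    return text.split()


def build_vocab(token_lists):
    """Give every distinct token an integer id in order of first appearance."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokens, vocab):
    """Turn a token list into the list of ids the classifier would see."""
    return [vocab[tok] for tok in tokens]


tokens = replace_special("Check https://example.com now! \U0001F600")
vocab = build_vocab([tokens])
ids = encode(tokens, vocab)
print(tokens)
print(ids)
```

After encoding, `<URL>` is an ordinary vocabulary entry with its own id, exactly like `check` or `now`; the classifier never sees the angle brackets themselves.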

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

solution1 1 2019-11-11 19:06:44