
NLP using replacement tokens

Original post: 2019-11-11 18:26:31. Tags: python, nlp, text-classification

I have read many articles that deal with different NLP classification tasks, and most of them state in the pre-processing section that they use replacement tokens:

e.g. "We removed the URLs, emojis and punctuation and replaced them with replacement tokens: <URL>, <EMOJI>, <PUNCT>."
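For illustration, the kind of replacement that quote describes could be sketched roughly as below. The regular expressions and the function name `replace_with_tokens` are assumptions made for this example, not taken from any of the articles; in particular, the emoji pattern only covers a few common Unicode blocks.

```python
import re

# Hypothetical patterns for illustration; real pipelines often use stricter ones.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Rough emoji coverage: a few common Unicode blocks, not the full emoji set.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
# A small set of punctuation marks; runs of them collapse into one token.
PUNCT_RE = re.compile(r"[!?.,;:]+")


def replace_with_tokens(text: str) -> str:
    """Replace URLs, emojis and punctuation with placeholder tokens."""
    # URLs go first, because they contain punctuation themselves.
    text = URL_RE.sub(" <URL> ", text)
    text = EMOJI_RE.sub(" <EMOJI> ", text)
    text = PUNCT_RE.sub(" <PUNCT> ", text)
    # Collapse the extra whitespace introduced by the substitutions.
    return " ".join(text.split())


print(replace_with_tokens("Check https://example.com now! 😀"))
```

The order of the substitutions matters here: if punctuation were replaced first, the dots and colons inside a URL would be rewritten before the URL pattern had a chance to match.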

I am quite new to this domain and was wondering whether there is some special way to deal with this kind of token/tag. Is it necessary to use < >, or is this just a way to signal the replacement and to help the classifier find a pattern?

Any help would be greatly appreciated.

1 Answer

From what I have done, in the pre-processing step people replace all tokens (characters, morphemes, words) with numbers. These replacement tokens are nothing but numbers as well; <URL> is just a way to present the token to humans.
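A minimal sketch of that idea, assuming simple whitespace tokenization: every token, including placeholders such as <URL>, is looked up in a vocabulary and becomes an integer id. The helper names `build_vocab` and `encode`, and the extra `<PAD>` and `<UNK>` entries, are assumptions added for this example.

```python
def build_vocab(tokenized_texts):
    """Assign an integer id to every distinct token, in order of first appearance."""
    # <PAD> and <UNK> are assumed extra entries, commonly reserved for
    # padding and for tokens not seen when the vocabulary was built.
    vocab = {"<PAD>": 0, "<UNK>": 1}
    for tokens in tokenized_texts:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def encode(tokens, vocab):
    """Map tokens to ids, falling back to the <UNK> id for unknown tokens."""
    return [vocab.get(tok, vocab["<UNK>"]) for tok in tokens]


# Whitespace splitting keeps "<URL>" as a single token.
corpus = [
    "see <URL> for details <PUNCT>".split(),
    "nice <EMOJI> <PUNCT>".split(),
]
vocab = build_vocab(corpus)
ids = encode("see <URL> <EMOJI> today".split(), vocab)
print(ids)  # [2, 3, 8, 1]
```

In this sketch the angle brackets carry no special meaning for the model: `<URL>` gets an id in the same way as `see` or `nice`. Their practical use is to make the placeholder unlikely to collide with a real word. One caveat is that a tokenizer which splits on punctuation may break `<URL>` into `<`, `URL` and `>`, so the placeholder has to survive whatever tokenizer is used.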


