
What is the difference between parsing and part-of-speech tagging?

I know that POS tagging labels every word in a sentence with its appropriate part of speech, but isn't that what a parser does too, i.e., break a sentence into its component parts? I've looked this up on the internet but couldn't find a satisfactory explanation. Please clear up my doubt. Thanks in advance.

They are two distinct procedures:

  • POS tagging: each token is assigned a label that reflects its word class.

  • Parsing: each sentence is assigned a structure (often a tree) that reflects how its components are related to each other.
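To make the contrast concrete, here is a minimal sketch of what the two outputs might look like as plain Python data. The tag names, the tuple encoding of the tree, and the helper `preterminals` are all assumptions made for illustration, not the format of any particular library.

```python
# Tagger output: a flat list, one label per token.
tagged = [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")]

# Parser output: a nested structure, written here as (label, child, ...) tuples.
tree = ("S",
        ("NP", ("DET", "the"), ("NOUN", "dog")),
        ("VP", ("VERB", "barks")))

def preterminals(node):
    """Collect (token, tag) pairs from the bottom layer of a tuple tree."""
    label, *children = node
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    pairs = []
    for child in children:
        pairs.extend(preterminals(child))
    return pairs

# The tags sit at the bottom of the tree; the tree adds the grouping above them.
print(preterminals(tree) == tagged)
```

So a parse tree usually contains the POS tags, but it also records which words belong together (here, that "the dog" forms one unit), which the flat tag list does not.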

POS tagging takes a tokenised sequence of words and returns a list of annotated tokens, where each token has a word class label. When a word can belong to more than one class, the label is usually disambiguated by looking at the context surrounding the token.
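As a rough illustration of context-based disambiguation, the sketch below tags tokens from a tiny hand-written lexicon and uses the previous tag to choose between candidates. The lexicon, the tag names, and the two rules are hypothetical; real taggers are normally statistical models trained on annotated text.

```python
# Toy lexicon: each word maps to the tags it can take (hypothetical and tiny).
LEXICON = {
    "the": ["DET"],
    "a": ["DET"],
    "i": ["PRON"],
    "book": ["NOUN", "VERB"],   # ambiguous without context
    "flight": ["NOUN"],
    "is": ["VERB"],
    "good": ["ADJ"],
}

def tag(tokens):
    """Return (token, tag) pairs, using the previous tag to resolve ambiguity."""
    tagged = []
    prev_tag = None
    for token in tokens:
        # Unknown words fall back to NOUN, an arbitrary choice for this sketch.
        candidates = LEXICON.get(token.lower(), ["NOUN"])
        if len(candidates) == 1:
            choice = candidates[0]
        elif prev_tag == "DET" and "NOUN" in candidates:
            choice = "NOUN"     # "the book": a determiner is followed by a noun
        elif prev_tag == "PRON" and "VERB" in candidates:
            choice = "VERB"     # "I book": a pronoun is followed by a verb
        else:
            choice = candidates[0]
        tagged.append((token, choice))
        prev_tag = choice
    return tagged

print(tag("I book a flight".split()))
print(tag("the book is good".split()))
```

The same word "book" receives a different tag in each sentence, purely because of what precedes it.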

There is also chunking, which groups tokens into related groups (such as noun phrases). Chunks are non-overlapping sequences.
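A minimal sketch of noun-phrase chunking over already-tagged tokens follows. The pattern (an optional determiner, any number of adjectives, then one or more nouns) and the function name `np_chunks` are assumptions chosen for illustration.

```python
def np_chunks(tagged):
    """Group (token, tag) pairs into non-overlapping noun-phrase chunks.

    Assumed pattern: optional DET, any number of ADJ, one or more NOUN.
    """
    chunks = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        if tagged[j][1] == "DET":
            j += 1
        while j < n and tagged[j][1] == "ADJ":
            j += 1
        k = j
        while k < n and tagged[k][1] == "NOUN":
            k += 1
        if k > j:
            # At least one noun was found: emit the chunk and jump past it,
            # so that chunks can never overlap.
            chunks.append([token for token, _ in tagged[i:k]])
            i = k
        else:
            i += 1
    return chunks

sentence = [("the", "DET"), ("quick", "ADJ"), ("fox", "NOUN"),
            ("jumps", "VERB"), ("over", "ADP"),
            ("the", "DET"), ("lazy", "ADJ"), ("dog", "NOUN")]
print(np_chunks(sentence))
```

Unlike a full parse, the result is flat: it says which tokens form a noun phrase, but nothing about how the phrases relate to the verb or to each other.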

Parsing commonly results in a parse tree for a sentence; for ambiguous sentences there can often be many possible trees.
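The ambiguity can be seen with a small chart parser. The sketch below is a CKY-style parser over a toy grammar that I made up for this example; it takes tagged tokens as input and returns every tree rooted in S. The grammar rules, the tag names, and the helpers `parse` and `show` are all hypothetical.

```python
from collections import defaultdict

# Toy binary grammar: (left label, right label) -> possible parent labels.
RULES = {
    ("NP", "VP"): ["S"],
    ("VERB", "NP"): ["VP"],
    ("VP", "PP"): ["VP"],     # PP attaches to the verb phrase
    ("NP", "PP"): ["NP"],     # PP attaches to the noun phrase
    ("DET", "NOUN"): ["NP"],
    ("ADP", "NP"): ["PP"],
}
UNARY = {"PRON": "NP"}        # a bare pronoun also counts as a noun phrase

def parse(tagged):
    """Return all S trees for a list of (token, tag) pairs (CKY-style chart)."""
    n = len(tagged)
    chart = defaultdict(lambda: defaultdict(list))  # chart[(i, j)][label] -> trees
    for i, (token, tag) in enumerate(tagged):
        chart[(i, i + 1)][tag].append((tag, token))
        if tag in UNARY:
            chart[(i, i + 1)][UNARY[tag]].append((UNARY[tag], (tag, token)))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):          # every split point of the span
                for left, left_trees in list(chart[(i, k)].items()):
                    for right, right_trees in list(chart[(k, j)].items()):
                        for parent in RULES.get((left, right), []):
                            for lt in left_trees:
                                for rt in right_trees:
                                    chart[(i, j)][parent].append((parent, lt, rt))
    return chart[(0, n)]["S"]

def show(node):
    """Render a tuple tree in bracket notation."""
    label, *children = node
    if isinstance(children[0], str):
        return f"({label} {children[0]})"
    return "(" + label + " " + " ".join(show(c) for c in children) + ")"

sentence = [("I", "PRON"), ("saw", "VERB"), ("the", "DET"), ("man", "NOUN"),
            ("with", "ADP"), ("the", "DET"), ("telescope", "NOUN")]
trees = parse(sentence)
print(len(trees))
for t in trees:
    print(show(t))
```

With this grammar the sentence has two parses: one where "with the telescope" modifies "saw" and one where it modifies "the man". The tag sequence is identical in both, which is exactly the information a tagger alone cannot give you.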

POS tagging is usually a preparatory step in parsing, as a parser typically operates on word classes (though some parsing algorithms work with tokens directly, or with a mixture of tags and tokens).


 