
Horizontal and Vertical Markovization

I have a sentence along with its grammar in tree form. I need to train a Probabilistic Context-Free Grammar from it so that I can give the best possible parse for it. I am using the Viterbi CKY algorithm to get the best parse. The sentences are in the following tree format: (TOP (S (NP (DT The) (NN flight)) (VP (MD should) (VP (VB be) (NP (NP (CD eleven) (RB am)) (NP (NN tomorrow)))))) (PUNC .))

I have built a system that learns a probabilistic grammar from the ATIS section of the Penn Treebank and can now produce a possible parse for the above sentence.

I read about horizontal and vertical Markovization techniques, which can help increase accuracy by using annotations. I am a little confused as to how they work. Can someone point me to some explanatory examples, or illustrate how they work and how they affect the accuracy?

It is worth looking at this paper by Klein and Manning:

http://nlp.stanford.edu/~manning/papers/unlexicalized-parsing.pdf

Vertical Markovization is a technique that provides context for a given rule. From the above paper:

For example, subject NP expansions are very different from object NP expansions: a subject NP is 8.7 times more likely than an object NP to expand as just a pronoun. Having separate symbols for subject and object NPs allows this variation to be captured and used to improve parse scoring. One way of capturing this kind of external context is to use parent annotation, as presented in Johnson (1998). For example, NPs with S parents (like subjects) will be marked NPˆS, while NPs with VP parents (like objects) will be NPˆVP.

Rewriting the rules with this additional parent annotation adds information about where in the tree each rule is being applied, and this extra context yields more accurate probabilities for each rule rewrite.

The implementation of this is quite simple. Using the training data, start at the bottom non-terminals (the preterminal tags such as DT, NNP, NN, VB, etc., which rewrite directly to words) and append a ^ followed by the parent non-terminal. In your example, the first rewrite would be NP^S, and so on. Continue up the tree until you reach TOP (which you do not rewrite); in your case, the final rewrite would be S^TOP. Stripping the annotations from your output will give you the final parse tree.
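The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not a full parser: I'm representing a tree as a nested tuple of (label, children...) with plain strings as terminal words, and the `annotate` helper is a hypothetical name for the parent-annotation step.

```python
def annotate(tree, parent=None):
    """Parent-annotate a tree given as (label, child1, child2, ...).

    Terminal words are plain strings and are left unchanged.
    TOP (the root, which has no parent) keeps its name; every other
    non-terminal, including preterminals like DT and NN, becomes
    label^parent, e.g. NP^S, DT^NP.
    """
    if isinstance(tree, str):              # a terminal word: leave as-is
        return tree
    label, *children = tree
    new_label = label if parent is None else f"{label}^{parent}"
    # recurse, passing the *original* label down as the parent
    return (new_label, *(annotate(child, label) for child in children))

# The example sentence from the question, as a nested tuple:
example = ("TOP",
           ("S",
            ("NP", ("DT", "The"), ("NN", "flight")),
            ("VP", ("MD", "should"),
             ("VP", ("VB", "be"),
              ("NP",
               ("NP", ("CD", "eleven"), ("RB", "am")),
               ("NP", ("NN", "tomorrow")))))),
           ("PUNC", "."))

annotated = annotate(example)
```

After annotation, the root stays TOP, its S child becomes S^TOP, the subject NP becomes NP^S, and its determiner becomes DT^NP, exactly as described above. You would read off the PCFG rule counts from these annotated trees, and strip the ^-suffixes from the parser's output to recover ordinary trees.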

As for horizontal Markovization, see this thread for a good discussion: Horizontal Markovization.
