Finding the Head Word in Python

I need to extract the head words of sentences (more specifically, the head word of the highest noun phrase in each sentence). I am using the Stanford CoreNLP server through py-corenlp to annotate my sentences. The suite includes a modified version of Michael Collins's head-finding algorithm, but I have not found a way to use it through the server. I would like to avoid reinventing the wheel, so is there any way to achieve this with existing tools in Python?

Example:

The number of elementary entities in 1 mole of a substance is known as what?

(ROOT
  (S
    (NP
      (NP (DT The) (NN number))
      (PP (IN of)
        (NP
          (NP (JJ elementary) (NNS entities))
          (PP (IN in)
            (NP
              (NP (CD 1) (NN mole))
              (PP (IN of)
                (NP (DT a) (NN substance))))))))
    (VP (VBZ is)
      (VP (VBN known)
        (PP (IN as)
          (NP (WP what)))))
    (. ?)))

"The number of elementary entities in 1 mole of a substance" is the highest noun phrase.

"number" is the head word of the phrase, which I want to extract.
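
To make "highest noun phrase" concrete, here is a minimal sketch in plain Python that reads a bracketed parse like the one above and returns the shallowest NP it finds, breadth first. The helper names (parse_tree, highest_np, leaves) are made up for illustration and are not part of CoreNLP or py-corenlp. The sketch only locates the phrase; choosing "number" as its head would still need head rules such as Collins's.

```python
from collections import deque


def parse_tree(text):
    """Parse a bracketed parse string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        pos += 1                      # skip the opening "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())        # nested constituent
            else:
                children.append(tokens[pos])   # leaf word
                pos += 1
        pos += 1                      # skip the closing ")"
        return (label, children)

    return read()


def highest_np(tree):
    """Return the NP closest to the root (breadth-first), or None."""
    queue = deque([tree])
    while queue:
        label, children = queue.popleft()
        if label == "NP":
            return (label, children)
        queue.extend(c for c in children if isinstance(c, tuple))
    return None


def leaves(tree):
    """Collect the words under a node, left to right."""
    words = []
    for child in tree[1]:
        words.extend(leaves(child) if isinstance(child, tuple) else [child])
    return words


parse = """(ROOT (S
  (NP (NP (DT The) (NN number))
    (PP (IN of) (NP (NP (JJ elementary) (NNS entities))
      (PP (IN in) (NP (NP (CD 1) (NN mole))
        (PP (IN of) (NP (DT a) (NN substance))))))))
  (VP (VBZ is) (VP (VBN known) (PP (IN as) (NP (WP what)))))
  (. ?)))"""

phrase = " ".join(leaves(highest_np(parse_tree(parse))))
print(phrase)
```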


EDIT: Added example.

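It looks like this may be easier with the typed dependencies than with the constituency parse. Your sentence will be rooted at a verb: find the root relation first, then look for the nsubj or nsubjpass dependency governed by that verb. The dependent of that relation is the head word of the subject noun phrase. For example: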

root ( ROOT-0 , known-13 ) <- Start with this one
det ( number-2 , The-1 )
nsubjpass ( known-13 , number-2 ) <- Then this one
case ( entities-5 , of-3 )
amod ( entities-5 , elementary-4 )
nmod ( number-2 , entities-5 )
case ( mole-8 , in-6 )
nummod ( mole-8 , 1-7 )
nmod ( entities-5 , mole-8 )
case ( substance-11 , of-9 )
det ( substance-11 , a-10 )
nmod ( mole-8 , substance-11 )
auxpass ( known-13 , is-12 )
case ( what-15 , as-14 )
nmod ( known-13 , what-15 )
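
The two steps marked above can be sketched in Python roughly as follows. This is a minimal sketch, not a tested recipe: it assumes each dependency is a dict with dep, governor, dependent and dependentGloss keys, which is the shape I would expect from the server's JSON output, and find_subject_head is a hypothetical helper name. The sample list is a hand-copied subset of the dependencies above so that the snippet runs without a server.

```python
# Subject relations to look for. "nsubj:pass" is the newer Universal
# Dependencies spelling of nsubjpass; including it is an assumption about
# which label your CoreNLP version emits.
SUBJECT_RELATIONS = ("nsubj", "nsubjpass", "nsubj:pass")


def find_subject_head(dependencies):
    """Return the word attached to the root as its subject, or None."""
    # Step 1: the ROOT relation points at the main predicate.
    root_index = next(
        (d["dependent"] for d in dependencies if d["dep"].upper() == "ROOT"),
        None,
    )
    if root_index is None:
        return None
    # Step 2: the subject governed by that predicate heads the top-level NP.
    for d in dependencies:
        if d["governor"] == root_index and d["dep"] in SUBJECT_RELATIONS:
            return d["dependentGloss"]
    return None


# A subset of the dependencies listed above, in the assumed dict shape.
example = [
    {"dep": "ROOT", "governor": 0, "dependent": 13, "dependentGloss": "known"},
    {"dep": "det", "governor": 2, "dependent": 1, "dependentGloss": "The"},
    {"dep": "nsubjpass", "governor": 13, "dependent": 2, "dependentGloss": "number"},
    {"dep": "nmod", "governor": 2, "dependent": 5, "dependentGloss": "entities"},
    {"dep": "auxpass", "governor": 13, "dependent": 12, "dependentGloss": "is"},
]

print(find_subject_head(example))
```

With py-corenlp, a list like this would presumably come from the annotate call with the depparse annotator and JSON output, under something like output["sentences"][0]["basicDependencies"]; check the key names against what your server actually returns. Sentences whose root is not a verb, or that have no explicit subject, will return None and need separate handling.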
