Extracting information from the Stanford Parser's context-free phrase structure output

Question

The Stanford Parser (http://nlp.stanford.edu/software/lex-parser.shtml) gives context-free phrase structure trees like the one below. What is the best way to extract things like all the Noun Phrases (NP) and Verb Phrases (VP) in the tree? Is there any Python (or Java) library that can read structures like these? Thank you.

(ROOT
  (S
    (S
      (NP
        (NP (DT The) (JJS strongest) (NN rain))
        (VP
          (ADVP (RB ever))
          (VBN recorded)
          (PP (IN in)
            (NP (NNP India)))))
      (VP
        (VP (VBD shut)
          (PRT (RP down))
          (NP
            (NP (DT the) (JJ financial) (NN hub))
            (PP (IN of)
              (NP (NNP Mumbai)))))
        (, ,)
        (VP (VBD snapped)
          (NP (NN communication) (NNS lines)))
        (, ,)
        (VP (VBD closed)
          (NP (NNS airports)))
        (CC and)
        (VP (VBD forced)
          (NP
            (NP (NNS thousands))
            (PP (IN of)
              (NP (NNS people))))
          (S
            (VP (TO to)
              (VP
                (VP (VB sleep)
                  (PP (IN in)
                    (NP (PRP$ their) (NNS offices))))
                (CC or)
                (VP (VB walk)
                  (NP (NN home))
                  (PP (IN during)
                    (NP (DT the) (NN night))))))))))
    (, ,)
    (NP (NNS officials))
    (VP (VBD said)
      (NP-TMP (NN today)))
    (. .)))

Answer 1

Check out the Natural Language Toolkit (NLTK) at nltk.org.

The toolkit is written in Python and provides code for reading precisely these kinds of trees (as well as lots of other stuff).
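For what it's worth, here is a rough sketch of how reading such a tree with NLTK might look. It assumes NLTK 3 is installed and uses `Tree.fromstring`, `subtrees`, `label` and `leaves` from `nltk.tree`. The shortened sample sentence and the helper name `phrases` are made up for illustration and are not part of the question.

```python
from nltk.tree import Tree

# A shortened bracketed parse in the same format as the parser output
# (hypothetical sample, not the full tree from the question).
bracketed = """
(ROOT
  (S
    (NP (DT the) (JJ financial) (NN hub))
    (VP (VBD closed)
      (NP (NNS airports)))
    (. .)))
"""

tree = Tree.fromstring(bracketed)

def phrases(tree, label):
    """Return the words of every subtree whose label equals `label`."""
    return [" ".join(sub.leaves())
            for sub in tree.subtrees(filter=lambda t: t.label() == label)]

noun_phrases = phrases(tree, "NP")
verb_phrases = phrases(tree, "VP")
print(noun_phrases)
print(verb_phrases)
```

The comparison is an exact label match, so a node labelled NP-TMP would not count as an NP here. If such nodes should be included, one option is to compare with `t.label().startswith("NP")` instead.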

Alternatively, you could write your own recursive function for doing this. It would be pretty straightforward.

Just for fun, here's a super simple implementation of what you want:

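import re

def parse(s):
  # Pad every parenthesis with spaces so that splitting on whitespace
  # yields one token per bracket, label, or word.
  itr = iter(filter(lambda x: x, re.split("\\s+", s.replace('(', ' ( ').replace(')', ' ) '))))

  def _parse():
    # Read tokens up to the matching ')' and return them as a nested list.
    stuff = []
    for x in itr:
      if x == ')':
        return stuff
      elif x == '(':
        stuff.append(_parse())
      else:
        stuff.append(x)
    return stuff

  return _parse()[0]

def find(parsed, tag):
  # Words are plain strings; only lists are subtrees of the form [label, children...].
  if not isinstance(parsed, list):
    return
  if parsed[0] == tag:
    yield parsed
  for x in parsed[1:]:
    for y in find(x, tag):
      yield y

# s must hold the parser output shown above, as a single string.
p = parse(s)
np = find(p, 'NP')
for x in np:
  print(x)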

yields:

['NP', ['NP', ['DT', 'The'], ['JJS', 'strongest'], ['NN', 'rain']], ['VP', ['ADVP', ['RB', 'ever']], ['VBN', 'recorded'], ['PP', ['IN', 'in'], ['NP', ['NNP', 'India']]]]]
['NP', ['DT', 'The'], ['JJS', 'strongest'], ['NN', 'rain']]
['NP', ['NNP', 'India']]
['NP', ['NP', ['DT', 'the'], ['JJ', 'financial'], ['NN', 'hub']], ['PP', ['IN', 'of'], ['NP', ['NNP', 'Mumbai']]]]
['NP', ['DT', 'the'], ['JJ', 'financial'], ['NN', 'hub']]
['NP', ['NNP', 'Mumbai']]
['NP', ['NN', 'communication'], ['NNS', 'lines']]
['NP', ['NNS', 'airports']]
['NP', ['NP', ['NNS', 'thousands']], ['PP', ['IN', 'of'], ['NP', ['NNS', 'people']]]]
['NP', ['NNS', 'thousands']]
['NP', ['NNS', 'people']]
['NP', ['PRP$', 'their'], ['NNS', 'offices']]
['NP', ['NN', 'home']]
['NP', ['DT', 'the'], ['NN', 'night']]
['NP', ['NNS', 'officials']]
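If you want the phrases as plain text rather than nested lists, and the VPs as well as the NPs, something along these lines might do. This is only a sketch using the standard library. The names `parse_tree`, `find_phrases` and `words` and the one-line sample are made up for illustration, and the tokenizer is a regex variant of the split-based one above.

```python
import re

def parse_tree(text):
    """Turn a bracketed parse into nested lists: [label, child, child, ...]."""
    tokens = iter(re.findall(r"\(|\)|[^\s()]+", text))

    def read():
        node = []
        for tok in tokens:
            if tok == ")":
                return node
            node.append(read() if tok == "(" else tok)
        return node

    return read()[0]

def find_phrases(node, label):
    """Yield every subtree whose label is exactly `label`, in pre-order."""
    if not isinstance(node, list):
        return  # a bare word, not a subtree
    if node[0] == label:
        yield node
    for child in node[1:]:
        yield from find_phrases(child, label)

def words(node):
    """Collect the leaf words under a subtree, left to right."""
    if not isinstance(node, list):
        return [node]
    return [w for child in node[1:] for w in words(child)]

# Hypothetical one-line sample in the same bracketed format.
sample = "(S (NP (NNS officials)) (VP (VBD said) (NP-TMP (NN today))) (. .))"
tree = parse_tree(sample)
np_text = [" ".join(words(n)) for n in find_phrases(tree, "NP")]
vp_text = [" ".join(words(n)) for n in find_phrases(tree, "VP")]
print(np_text)
print(vp_text)
```

As in the output above, the exact comparison means NP-TMP is not reported as an NP, which is why "today" only shows up inside the VP.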

Solution 1
2 Accepted 2012-03-21 03:51:51
