Extracting clauses from sentences in Python

Question

I have to list the clauses in given sentences. I am implementing my own grammar rules to parse clauses out of sentences. The result I obtained is:

*************************************************
(S
  (CLAUSE
    (VP
      (VP they/PRP were/VBD delivered/VBN promptly/RB)
      and/CC
      (VP a/DT very/RB))
    (NP (NP good/JJ value/NN) and/CC (NP excellent/NN)))
  (CLAUSE
    (VP all/DT)
    (NP (NP around/IN (NP slipper/NN)) (NP with/IN (NP traction/NN))))
  ./.)
*************************************************

From the above result, the clauses should be listed to give the following statements:

they were delivered promptly and a very good value and excellent

all around slipper with traction.

I've tried using flatten and chomsky_normal_form but couldn't get the desired result. How can I list each clause on a single line, without the tags?

Answer 1

Since everything you want to extract from your string s appears to be lowercase letters, you can apply one of the following one-liners:

Python list comprehension

print(' '.join(''.join(c for c in s if 'a' <= c <= 'z' or c == ' ').split()))

It joins (''.join) all characters that are between "a" and "z" or are a space. To collapse runs of adjacent spaces, it then splits the result and joins it again with a single space as the separator.
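As a rough illustration, the filter can be wrapped in a function and run on the tree from the question. The name keep_lowercase is made up for this sketch, and it assumes the chunker output is available as a plain string s:

```python
# Sample input: the chunker output from the question, as a plain string.
s = """(S
  (CLAUSE
    (VP
      (VP they/PRP were/VBD delivered/VBN promptly/RB)
      and/CC
      (VP a/DT very/RB))
    (NP (NP good/JJ value/NN) and/CC (NP excellent/NN)))
  (CLAUSE
    (VP all/DT)
    (NP (NP around/IN (NP slipper/NN)) (NP with/IN (NP traction/NN))))
  ./.)"""


def keep_lowercase(text):
    """Keep only the characters 'a'-'z' and spaces, then collapse runs of spaces.

    Hypothetical helper name; the body is the one-liner from the answer.
    """
    kept = ''.join(c for c in text if 'a' <= c <= 'z' or c == ' ')
    return ' '.join(kept.split())


print(keep_lowercase(s))
# they were delivered promptly and a very good value and excellent all around slipper with traction
```

Note that newlines are dropped rather than turned into spaces. That is harmless here because every continuation line is indented, but keep_lowercase("a/DT\nb/NN") would return "ab".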

Regular expression

If you prefer regular expressions (import re), this even shorter statement yields the same result:

print(' '.join(re.findall('[a-z]+', s)))
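The lowercase-only pattern relies on every word being lowercase and every tag being uppercase. The sketch below shows what happens when that does not hold, and one possible alternative that captures the token in front of each "/TAG". The sample string and the TOKEN pattern are assumptions made for illustration, not part of the original answer:

```python
import re

# Hypothetical tagged string containing capitalized words.
sample = "(S (NP Two/CD Slippers/NNS) (VP arrived/VBD) ./.)"

# Pattern from the answer: uppercase letters are skipped, so words get clipped.
lower_only = re.findall('[a-z]+', sample)

# Assumed alternative: a token is whatever precedes the last "/" of a
# word/TAG pair; parentheses and whitespace never belong to a token.
TOKEN = re.compile(r'([^\s()]+)/[^\s()/]+')
tokens = TOKEN.findall(sample)

print(lower_only)  # ['wo', 'lippers', 'arrived']
print(tokens)      # ['Two', 'Slippers', 'arrived', '.']
```

The alternative also keeps punctuation tokens such as the final period, which may or may not be wanted.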

Edit

If you want to process each clause individually, you can split the whole string s on "CLAUSE" and then apply the same code to each part (except the first one, which is just the header):

for part in s.split("CLAUSE")[1:]:
    print(' '.join(re.findall('[a-z]+', part)))
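For reuse, the per-clause loop could be collected into a function that returns a list instead of printing. This is only a sketch: list_clauses is a hypothetical name, and it assumes the literal text "CLAUSE" appears only as a node label in the input string:

```python
import re


def list_clauses(tree_text):
    """Return one lowercase-word string per CLAUSE node in the tree text.

    Hypothetical helper; assumes "CLAUSE" occurs only as a node label.
    """
    parts = tree_text.split("CLAUSE")[1:]  # parts[0] would be the "(S (" header
    return [' '.join(re.findall('[a-z]+', part)) for part in parts]


# The chunker output from the question, as a plain string.
s = """(S
  (CLAUSE
    (VP
      (VP they/PRP were/VBD delivered/VBN promptly/RB)
      and/CC
      (VP a/DT very/RB))
    (NP (NP good/JJ value/NN) and/CC (NP excellent/NN)))
  (CLAUSE
    (VP all/DT)
    (NP (NP around/IN (NP slipper/NN)) (NP with/IN (NP traction/NN))))
  ./.)"""

for clause in list_clauses(s):
    print(clause)
# they were delivered promptly and a very good value and excellent
# all around slipper with traction
```

The trailing "./." token is not made of lowercase letters, so the final period from the desired output is not reproduced.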


Solution 1
2 2014-10-28 09:33:57
