
Extract clauses from a sentence in Python

I have to list the clauses in given sentences. I am implementing my own grammar rules to parse clauses out of sentences. The result I obtained is:

*************************************************
(S
  (CLAUSE
    (VP
      (VP they/PRP were/VBD delivered/VBN promptly/RB)
      and/CC
      (VP a/DT very/RB))
    (NP (NP good/JJ value/NN) and/CC (NP excellent/NN)))
  (CLAUSE
    (VP all/DT)
    (NP (NP around/IN (NP slipper/NN)) (NP with/IN (NP traction/NN))))
  ./.)
*************************************************

From the above result, the clauses should be listed out, one per line, as follows:

they were delivered promptly and a very good value and excellent

all around slipper with traction.

I've tried using flatten and chomsky_normal_form but couldn't get the desired result. How can I list each clause on a single line, with the tags removed?

Since everything you want to extract from your string s appears to be lowercase, you can apply one of the following one-liners:

Python list comprehension

print(' '.join(''.join(c for c in s if 'a' <= c <= 'z' or c == ' ').split()))

It joins ( ''.join ) all characters that are either lowercase letters between "a" and "z" or a space. To collapse runs of adjacent spaces, it splits the result and joins it again with a single space as separator.
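The one-liner can be unpacked into its two steps. The following is only a sketch on a shortened copy of the question's output (the second clause); the variable names kept, words and result are introduced here for illustration and are not part of the original answer.

```python
# Shortened sample: the second CLAUSE subtree from the question's output.
s = """(CLAUSE
    (VP all/DT)
    (NP (NP around/IN (NP slipper/NN)) (NP with/IN (NP traction/NN))))"""

# Step 1: keep only lowercase ASCII letters and spaces.
# Tags, slashes, parentheses and newlines are dropped here.
kept = ''.join(c for c in s if 'a' <= c <= 'z' or c == ' ')

# Step 2: split on runs of whitespace, then rejoin with single spaces.
words = kept.split()
result = ' '.join(words)

print(result)  # all around slipper with traction
```

Because newlines are dropped rather than turned into spaces, this relies on every continuation line being indented, as it is in the printed tree.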

Regular expression

If you prefer regular expressions ( import re ), this even shorter statement yields the same result:

print(' '.join(re.findall('[a-z]+', s)))
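To check the claim that both one-liners agree, here is a small self-contained sketch run on the full output from the question. The names by_regex and by_filter are made up for this comparison.

```python
import re

# The parser output from the question, stored as one string.
s = """(S
  (CLAUSE
    (VP
      (VP they/PRP were/VBD delivered/VBN promptly/RB)
      and/CC
      (VP a/DT very/RB))
    (NP (NP good/JJ value/NN) and/CC (NP excellent/NN)))
  (CLAUSE
    (VP all/DT)
    (NP (NP around/IN (NP slipper/NN)) (NP with/IN (NP traction/NN))))
  ./.)"""

# Regular-expression version: every run of lowercase letters is one word.
by_regex = ' '.join(re.findall('[a-z]+', s))

# Character-filter version from the previous section.
by_filter = ' '.join(''.join(c for c in s if 'a' <= c <= 'z' or c == ' ').split())

print(by_regex)
print(by_regex == by_filter)  # True
```

At this point both clauses still come out as one line, since the whole string is processed at once.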

Edit

If you want to process each clause individually, you can split the whole string s on "CLAUSE" and then apply the same code to each part (except the first one, which is just the header):

for part in s.split("CLAUSE")[1:]:
    print(' '.join(re.findall('[a-z]+', part)))
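The loop can be wrapped in a function that returns one string per clause. Note that the '[a-z]+' pattern only works while every word is lowercase: a capitalized word such as "They" would be cut down to "hey". The second function below is a hypothetical variant, not part of the original answer, that instead captures the word in each word/TAG pair and assumes word tags start with an uppercase letter (so the final ./. is skipped). Both function names are invented for this sketch.

```python
import re

s = """(S
  (CLAUSE
    (VP
      (VP they/PRP were/VBD delivered/VBN promptly/RB)
      and/CC
      (VP a/DT very/RB))
    (NP (NP good/JJ value/NN) and/CC (NP excellent/NN)))
  (CLAUSE
    (VP all/DT)
    (NP (NP around/IN (NP slipper/NN)) (NP with/IN (NP traction/NN))))
  ./.)"""


def clauses_lowercase(tree_text):
    """Split on the CLAUSE label and keep lowercase runs, as in the answer."""
    return [' '.join(re.findall('[a-z]+', part))
            for part in tree_text.split('CLAUSE')[1:]]


def clauses_by_token(tree_text):
    """Assumed variant: capture the word before '/' when the tag starts uppercase."""
    return [' '.join(re.findall(r'([^\s()/]+)/[A-Z][A-Z$]*', part))
            for part in tree_text.split('CLAUSE')[1:]]


for clause in clauses_lowercase(s):
    print(clause)
# they were delivered promptly and a very good value and excellent
# all around slipper with traction

# With a capitalized first word, only the token-based variant keeps it intact.
capitalized = s.replace('they/PRP', 'They/PRP')
print(clauses_lowercase(capitalized)[0].split()[0])  # hey
print(clauses_by_token(capitalized)[0].split()[0])   # They
```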
