Extract clauses from a sentence in Python
I have to list the clauses in given sentences. I am implementing my own grammar rules to parse clauses out of sentences. The result I obtained is:
*************************************************
(S
(CLAUSE
(VP
(VP they/PRP were/VBD delivered/VBN promptly/RB)
and/CC
(VP a/DT very/RB))
(NP (NP good/JJ value/NN) and/CC (NP excellent/NN)))
(CLAUSE
(VP all/DT)
(NP (NP around/IN (NP slipper/NN)) (NP with/IN (NP traction/NN))))
./.)
*************************************************
From the result above, the clauses should be listed as the following statements:
they were delivered promptly and a very good value and excellent
all around slipper with traction.
I've tried using flatten and chomsky_normal_form but couldn't get the desired result. How can I list each clause on a single line, without the tags?
Since everything you want to extract from your string s seems to be lowercase, you can apply one of the following one-liners:
Python list comprehension
print(' '.join(''.join(c for c in s if 'a' <= c <= 'z' or c == ' ').split()))
It joins (''.join) all characters that are between "a" and "z" or are a space. To suppress multiple spaces next to each other, it splits the result and joins it again with a single space as separator.
Regular expression
If you prefer regular expressions (import re), this even shorter statement yields the same result:
print(' '.join(re.findall('[a-z]+', s)))
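To see both one-liners side by side, here is a small sketch that runs them on the parse output from the question. It assumes s holds that output as a plain, indented string (the indentation below is my own reconstruction), and the names by_chars and by_regex are just illustrative. One caveat: the character filter throws away newlines, so it only keeps words apart when each line is indented or otherwise separated by spaces; the regex version does not have that dependency.

```python
import re

# Assumption: s is the printed parse result from the question, kept as a string.
s = """(S
  (CLAUSE
    (VP
      (VP they/PRP were/VBD delivered/VBN promptly/RB)
      and/CC
      (VP a/DT very/RB))
    (NP (NP good/JJ value/NN) and/CC (NP excellent/NN)))
  (CLAUSE
    (VP all/DT)
    (NP (NP around/IN (NP slipper/NN)) (NP with/IN (NP traction/NN))))
  ./.)"""

# Variant 1: keep only lowercase letters and spaces, then collapse repeated spaces.
by_chars = ' '.join(''.join(c for c in s if 'a' <= c <= 'z' or c == ' ').split())

# Variant 2: collect every run of lowercase letters and join the runs with spaces.
by_regex = ' '.join(re.findall('[a-z]+', s))

print(by_chars)
print(by_regex)
```

Both variants drop everything that is not a lowercase letter, so capitalized words, digits and the final period would be lost as well.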
Edit
If you want to process each clause individually, you can split the whole string s on "CLAUSE" and then apply the same code to each part (except the first one, which is just the header):
for part in s.split("CLAUSE")[1:]:
    print(' '.join(re.findall('[a-z]+', part)))
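If you would rather get the clauses back as a list than print them, the same idea can be wrapped in a function. This is only a sketch: list_clauses is a hypothetical helper name, and it assumes the input is the parse output as a string in which the literal label CLAUSE marks the start of each clause and all wanted words are lowercase.

```python
import re

def list_clauses(parse_text):
    """Return one string of lowercase words per CLAUSE chunk (hypothetical helper)."""
    # Everything before the first "CLAUSE" is just the "(S" header, so skip it.
    parts = parse_text.split("CLAUSE")[1:]
    # Tags such as PRP or VBD are uppercase, so '[a-z]+' matches only the words.
    return [' '.join(re.findall('[a-z]+', part)) for part in parts]

# Assumption: the parse output from the question, written on one line for brevity.
sample = ("(S (CLAUSE (VP (VP they/PRP were/VBD delivered/VBN promptly/RB) and/CC "
          "(VP a/DT very/RB)) (NP (NP good/JJ value/NN) and/CC (NP excellent/NN))) "
          "(CLAUSE (VP all/DT) (NP (NP around/IN (NP slipper/NN)) "
          "(NP with/IN (NP traction/NN)))) ./.)")

for clause in list_clauses(sample):
    print(clause)
```

Because this works on the text rather than on the tree structure, a nested CLAUSE label would also start a new chunk, which may or may not be what you want.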