Find whether a sentence has the starting words of another sentence or the ending words of the same sentence
For example, I have a set of sentences like this:
New York is in New York State
D.C. is the capital of United States
The weather is cool in the south of that country.
Lets take a bus to get to point b from point a.
And an input sentence like this:
is cool in the south of that country
The output should be: The weather is cool in the south of that country.
If I have an input like:
of United States The weather is cool
the output should be:
D.C. is the capital of United States The weather is cool in the south of that country.
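So far I have tried difflib and obtained the overlap, but that does not fully solve the problem in all cases.
You can build two dictionaries from the sentences, one keyed by starting expressions and one keyed by ending expressions, and then look up the prefix and suffix of the sentence you want to extend in them. In both cases, you need to build (or check) a key for every run of words that begins at the start or finishes at the end of the sentence: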
sentences = """New York is in New York State
D.C. is the capital of United States
The weather is cool in the south of that country
Lets take a bus to get to point b from point a""".split("\n")

# Map every word suffix of each sentence to the words that precede it.
ends = { tuple(sWords[i:]): sWords[:i] for s in sentences
         for sWords in [s.split()] for i in range(len(sWords)) }

# Map every word prefix of each sentence to the words that follow it.
starts = { tuple(sWords[:i]): sWords[i:] for s in sentences
           for sWords in [s.split()] for i in range(1, len(sWords)+1) }

def extendSentence(sentence):
    sWords = sentence.split(" ")
    # Words to prepend: first prefix of the input that ends a known sentence.
    prefix = next( (ends[p] for i in range(1, len(sWords)+1)
                    for p in [tuple(sWords[:i])] if p in ends),
                   [])
    # Words to append: first suffix of the input that starts a known sentence.
    suffix = next( (starts[p] for i in range(len(sWords))
                    for p in [tuple(sWords[i:])] if p in starts),
                   [])
    return " ".join(prefix + [sentence] + suffix)
Output:
print(extendSentence("of United States The weather is cool"))
# D.C. is the capital of United States The weather is cool in the south of that country
print(extendSentence("is cool in the south of that country"))
# The weather is cool in the south of that country
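Note that I had to remove the periods at the end of the sentences, because they prevent the word tuples from matching. You will need to clean these up in the dictionary-building step.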
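One possible way to do that cleanup is sketched below. It is only an illustration, not part of the original answer: the helper names clean_words and build_indexes are hypothetical, and the sketch assumes that stripping trailing ".", "!" and "?" from each whole sentence is enough (a sentence that ends in an abbreviation such as "D.C." would lose its final period too).

```python
def clean_words(sentence):
    # Hypothetical helper: drop trailing sentence punctuation, then split on whitespace.
    return sentence.strip().rstrip(".!?").split()

def build_indexes(sentences):
    # Build the same two dictionaries as above, but from cleaned word lists.
    ends, starts = {}, {}
    for s in sentences:
        words = clean_words(s)
        for i in range(len(words)):
            ends[tuple(words[i:])] = words[:i]      # suffix -> preceding words
        for i in range(1, len(words) + 1):
            starts[tuple(words[:i])] = words[i:]    # prefix -> following words
    return ends, starts

raw = [
    "The weather is cool in the south of that country.",
    "Lets take a bus to get to point b from point a.",
]
ends, starts = build_indexes(raw)
```

The input passed to extendSentence would presumably need the same cleaning, otherwise a trailing period in the query would block the lookup in the same way.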