Python: How to find the best-matching sentence in a txt file

Question

I want to output any sentence in a txt file that is similar to another sentence in the same file.

Example:
If the .txt file contains

1. What is the biggest planet of our Solar system?
2. How to make tea?
3. Which our Solar system's biggest planet?

In this case the result should be:
3. Which our Solar system's biggest planet?

Basically, it should check whether any two lines of the file have more than 4 or 5 words in common.

Answer 1

I agree with John Coleman's suggestion. difflib can help you compute a similarity metric between two strings. Here's one possible approach:

from difflib import SequenceMatcher

sentences = []
with open('./bp.txt', 'r') as f:
    for line in f:
        # only consider lines that start with a number followed by a period
        if line.split('.')[0].strip().isdigit():
            sentences.append(line.rstrip('\n'))

# compare every pair of sentences and remember the later one of the closest pair
max_ratio = 0
similar_sentence = None
for i in range(len(sentences)):
    for j in range(i + 1, len(sentences)):
        # ratio() returns a character-based similarity score between 0 and 1
        match_ratio = SequenceMatcher(None, sentences[i], sentences[j]).ratio()
        if match_ratio > max_ratio:
            max_ratio = match_ratio
            similar_sentence = sentences[j]

if similar_sentence is not None:
    print(similar_sentence)
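
Note that SequenceMatcher compares characters, while the question asks about shared words. As a rough alternative sketch (not part of the original answer), you could count the words two lines have in common and report the later line of every pair that meets a threshold. The helper names `words` and `similar_pairs` and the `min_shared=4` default are assumptions made for illustration, and the sample sentences are hard-coded from the question instead of being read from the file:

```python
import re
from itertools import combinations


def words(sentence):
    """Return the set of lowercased words, ignoring a leading '3.' style number."""
    text = re.sub(r"^\s*\d+\s*\.\s*", "", sentence)
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similar_pairs(sentences, min_shared=4):
    """Return (first, second, shared_count) for pairs sharing at least min_shared words."""
    pairs = []
    for first, second in combinations(sentences, 2):
        shared = len(words(first) & words(second))
        if shared >= min_shared:
            pairs.append((first, second, shared))
    return pairs


# Sample lines taken from the question (assumed file contents).
sentences = [
    "1. What is the biggest planet of our Solar system?",
    "2. How to make tea?",
    "3. Which our Solar system's biggest planet?",
]

for first, second, shared in similar_pairs(sentences):
    print(second)
```

With the three sample lines, only lines 1 and 3 share enough words, so line 3 is printed. Raising `min_shared` makes the match stricter; word order is ignored, which may or may not be what you want.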

Solution 1
2 2019-03-08 03:01:19
