How can I compute the distance between each pair of sentences in a text?
I am computing the Levenshtein distance between sentences, and now I have a text with several sentences. I don't know how to write the for loop to generate the distance between each pair of sentences.
sent = ['motrin 400-600 mg every 8 hour as need for pai . ', 'the depression : continue escitalopram ; assess need to change medication as an outpatient . ', 'Blood cltures from 11-30 grow KLEBSIELLA PNEUMONIAE and 12-01 grow KLEBSIELLA PNEUMONIAE and PROTEUS MIRABILIS both sensitive to the Meropenam which she have already be receive . ']
def similarity(sent):
    feature_sim = []
    for a,b in sent:
        feature_sim[a,b] = pylev.levenshtein(a,b)
    print (feature_sim)
Use a pair of nested for-loops.
Simplest version:
for a in sent:
    for b in sent:
        ...
Skip identical pairs (the Levenshtein distance would trivially be 0):
for a in sent:
    for b in sent:
        if a != b:
            ...
Avoid processing duplicate pairs (a, b is the same as b, a):
for i in range(0, len(sent)):
    for j in range(i+1, len(sent)):
        # a = sent[i], b = sent[j]
        ...
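The index-based loop above can also be written with itertools.combinations from the standard library, which yields each unordered pair exactly once. A minimal sketch, with short placeholder strings standing in for the real sentences:

```python
from itertools import combinations

# Placeholder sentences (assumption); substitute the real `sent` list.
sent = ["first sentence", "second sentence", "third sentence"]

# combinations(sent, 2) yields (sent[i], sent[j]) for every i < j,
# so no pair is visited twice and no sentence is paired with itself.
pairs = list(combinations(sent, 2))
for a, b in pairs:
    print(repr(a), repr(b))
```

For n sentences this visits n*(n-1)/2 pairs, the same pairs as the nested range loops.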
Problem: feature_sim is a list, which can only be indexed by integers (or slices), not by strings or any other type.
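The failure can be reproduced in isolation with a small sketch like the one below; the key "a", "b" is a made-up pair used only for illustration. Note that in the question's code the line for a,b in sent would likely raise first, because each sentence string cannot be unpacked into two names.

```python
feature_sim = []
try:
    # "a", "b" forms a tuple key, which a list cannot accept as an index.
    feature_sim["a", "b"] = 1
except TypeError as err:
    error_message = str(err)
    print(error_message)
```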
Use a dictionary instead:
feature_sim = {}
for i in range(0, len(sent)):
    for j in range(i+1, len(sent)):
        feature_sim[(sent[i], sent[j])] = pylev.levenshtein(sent[i], sent[j])
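Putting the pieces together, a self-contained version might look like the sketch below. It assumes pylev may not be installed, so a plain dynamic-programming levenshtein function is defined inline as a stand-in (it is not pylev's implementation); replace it with pylev.levenshtein if that package is available. The three short strings are illustrative, not the sentences from the question.

```python
from itertools import combinations


def levenshtein(a, b):
    """Stand-in edit distance: insert, delete and substitute each cost 1."""
    # prev[j] is the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(sent):
    """Map each unordered pair of sentences to its edit distance."""
    return {(a, b): levenshtein(a, b) for a, b in combinations(sent, 2)}


# Illustrative input (assumption); pass the real `sent` list instead.
sent = ["kitten", "sitting", "mitten"]
feature_sim = similarity(sent)
for pair, dist in feature_sim.items():
    print(pair, dist)
```

Keying the dictionary by the sentence strings means two identical sentences in the text would share a key; if that can happen, keying by the index pair (i, j) instead avoids the collision.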