Determining the proximity between 2 words in a sentence in Python

Question

I need to determine the proximity between 2 words in a sentence in Python. For example, in the following sentence:

the foo and the bar is foo bar

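I want to determine the distance between the words foo and bar (that is, the number of words occurring between foo and bar).

Note that foo and bar each occur more than once in the sentence above, which produces several different distance combinations.

Also, the order of the words does not matter. What is the best way to determine the distance between these words?

Here is the code I am using: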

sentence = "the foo and the bar is foo bar"

first_word_to_look = 'foo'
second_word_to_look = 'bar'

first_word = 0
second_word = 0
dist = 0

if first_word_to_look in sentence and second_word_to_look in sentence:

    first_word = len(sentence.split(first_word_to_look)[0].split())
    second_word = len(sentence.split(second_word_to_look)[0].split())

    if first_word < second_word:
        dist = second_word-first_word
    else:
        dist = first_word-second_word

print(dist)  # distance

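The problem with the code above is that it only considers the first occurrence of each word. If the same sentence contains later occurrences that are even closer together than the first ones, they are not taken into account.

What is the best way to determine the distance? Is there a library in Python that does this better?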

Answer 1

You can split your sentence into a list of words and use the index method of list:

sentence = "the foo and the bar is foo bar"
words = sentence.split()

def get_distance(w1, w2):
    if w1 in words and w2 in words:
        return abs(words.index(w2) - words.index(w1))

Update to take all occurrences of the words into account:

import itertools

def get_distance(w1, w2):
    if w1 in words and w2 in words:
        w1_indexes = [index for index, value in enumerate(words) if value == w1]
        w2_indexes = [index for index, value in enumerate(words) if value == w2]
        distances = [abs(item[0] - item[1]) for item in itertools.product(w1_indexes, w2_indexes)]
        return {'min': min(distances), 'avg': sum(distances)/float(len(distances))}
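
If only the minimum is needed, it can also be found in a single pass without building every pair. The following is a minimal sketch, not part of the original answer; the function name min_word_distance and its signature are assumptions made for illustration. It remembers the most recent position of each word and keeps the smallest gap seen so far:

```python
def min_word_distance(words, w1, w2):
    """Return the smallest index gap between any w1 and any w2, or None.

    Hypothetical helper: `words` is a list of tokens, e.g. sentence.split().
    If w1 == w2 the result is 0 whenever the word occurs at all.
    """
    last_w1 = last_w2 = None  # most recent index of each word
    best = None
    for i, word in enumerate(words):
        if word == w1:
            last_w1 = i
        if word == w2:
            last_w2 = i
        # Once both words have been seen, compare their latest positions.
        if last_w1 is not None and last_w2 is not None:
            gap = abs(last_w1 - last_w2)
            if best is None or gap < best:
                best = gap
    return best


words = "the foo and the bar is foo bar".split()
print(min_word_distance(words, "foo", "bar"))  # 1
```

As with get_distance above, the result is a difference of indexes, so adjacent words give 1; subtract 1 to get the number of words in between.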

Answer 2

We can also use a regular expression. The following line returns a list with the number of words that occur between foo and bar:

import re
sentence = "the foo and the bar is foo bar"
first_word_to_look = 'foo'
second_word_to_look = 'bar'
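# Non-greedy match from each first word to the next second word;
# subtract 2 so the two endpoint words themselves are not counted.
pattern = re.escape(first_word_to_look) + r'.*?' + re.escape(second_word_to_look)
word_length = [len(i.split()) - 2 for i in re.findall(pattern, sentence)]
print(word_length)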
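
Note that the pattern above only finds foo followed by bar, while the question says the order of the words does not matter. A possible variation, offered as a sketch rather than as the answerer's method, runs the search in both orders and adds word boundaries so that a string such as "foobar" is not matched. The function name words_between is an assumption for illustration, and because re.findall returns non-overlapping matches, not every possible pair is enumerated:

```python
import re


def words_between(sentence, w1, w2):
    """Count the words between w1 and w2 for matches in either order.

    Hypothetical helper: returns one count per non-overlapping match,
    first for w1 ... w2, then for w2 ... w1.
    """
    a, b = re.escape(w1), re.escape(w2)
    counts = []
    for first, second in ((a, b), (b, a)):
        # The group captures only the text strictly between the two words.
        pattern = r'\b{0}\b(.*?)\b{1}\b'.format(first, second)
        counts.extend(len(gap.split()) for gap in re.findall(pattern, sentence))
    return counts


sentence = "the foo and the bar is foo bar"
print(words_between(sentence, "foo", "bar"))  # [2, 0, 1]
```

Here 2 and 0 come from the foo ... bar matches, and 1 comes from "bar is foo".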

Answer 1
4, accepted, 2015-10-28 10:54:59

Answer 2
0, 2016-12-07 09:38:48