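How can I find the nearest distance between two words in a text, or in a large number of text files?
For example, I want to find the nearest distance between two words (such as "is" and "are") in a text. Here is what I have: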
text = "is there a way to find the nearest distance of two words - like is and are - from each other."

def dis_words_text(text, word1, word2):
    # str.find returns the index of the first occurrence, or -1 if absent.
    # It matches substrings too, e.g. the "are" inside "nearest".
    ind1 = text.find(word1)
    ind2 = text.find(word2)
    if -1 in (ind1, ind2):
        return "at least one of the words not in text"
    return abs(ind1 - ind2)

dis_words_text(text, "is", "are")
Output: 29
dis_words_text(text, "why", "are")
Output: "at least one of the words not in text"
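It looks like the code above measures the distance between the first "is" and the first "are" rather than the nearest distance, which should be 7 characters. See also "Find the position of a word in a string" and "How to find the index of an exact word in a string in Python" for reference. My questions are: 1) if the words are repeated in the text, how can I find the nearest distance between the two words (the number of characters between them), and 2) speed also matters for large amounts of text.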
Here is a solution that finds the nearest distance between two words in a text, measured in characters:
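import re

def nearest_values_twolist(list1, list2):
    # Brute-force search over every pair of positions for the smallest gap.
    # Note: raises IndexError if either list is empty (word not found).
    r1 = list1[0]
    r2 = list2[0]
    min_val = float("inf")
    for row1 in list1:
        for row2 in list2:
            t = abs(row1 - row2)
            if t < min_val:
                min_val = t
                r1 = row1
                r2 = row2
    return r1, r2

def closest_distance_words(text, w1, w2):
    # Start offsets of whole-word matches; re.escape guards against regex metacharacters
    ind1 = [m.start(0) for m in re.finditer(r'\b' + re.escape(w1) + r'\b', text)]
    ind2 = [m.start(0) for m in re.finditer(r'\b' + re.escape(w2) + r'\b', text)]
    i1, i2 = nearest_values_twolist(ind1, ind2)
    return abs(i2 - i1)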
Test:
text = "is there a way to find the nearest distance of two words - like is and are - from each other."
closest_distance_words(text, "is", "are")
Output: 7
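Since the question also asks about speed on large texts, here is a possible variation, offered as a sketch rather than part of the answer above. The nested loop compares every pair of positions, but the match offsets from re.finditer already come out in ascending order, so a two-pointer walk over both lists can find the smallest gap in a single pass. The function name closest_distance_linear and the choice to return None when a word is missing are assumptions made for this illustration.

```python
import re

def closest_distance_linear(text, w1, w2):
    """Smallest gap between start offsets of whole-word matches of w1 and w2.

    Returns None if either word does not occur (an assumed convention).
    """
    starts1 = [m.start() for m in re.finditer(r"\b" + re.escape(w1) + r"\b", text)]
    starts2 = [m.start() for m in re.finditer(r"\b" + re.escape(w2) + r"\b", text)]
    if not starts1 or not starts2:
        return None

    i = j = 0
    best = abs(starts1[0] - starts2[0])
    # Both lists are sorted, so always advance the pointer at the smaller offset:
    # moving the larger one could only widen the current gap.
    while i < len(starts1) and j < len(starts2):
        best = min(best, abs(starts1[i] - starts2[j]))
        if starts1[i] < starts2[j]:
            i += 1
        else:
            j += 1
    return best

text = "is there a way to find the nearest distance of two words - like is and are - from each other."
print(closest_distance_linear(text, "is", "are"))  # 7
```

The walk visits each offset at most once, so the comparison step grows with the total number of matches rather than with their product. The regex scan over the text is still required either way.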