Python: find an n-sized window around a phrase in a string

Question

I have a string, for example 'i cant sleep what should i do', and a phrase that is contained in the string, 'cant sleep'. What I am trying to accomplish is to get an n-sized window around the phrase, even if there aren't n words on either side. So in this case, with a window size of 2 (2 words on either side of the phrase), I would want 'i cant sleep what should'.

This is my current solution, which attempts a window size of 2. It fails when the number of words to the left or right of the phrase is less than 2. I would also like to be able to use different window sizes.

import re
sentence = 'i cant sleep what should i do'
phrase = 'cant sleep'
sentence_words = re.findall(r'\w+', sentence)
phrase_words = re.findall(r'\w+', phrase)

# Index of the first and last word of the phrase within the sentence.
left = sentence_words.index(phrase_words[0])
right = sentence_words.index(phrase_words[-1])
# Fails when fewer than 2 words precede the phrase: left-2 goes negative.
print(sentence_words[left-2:right+3])
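
A minimal sketch of why the slice above misbehaves, assuming the phrase starts at word index 1 as in the example sentence. A negative start index counts from the end of the list, so the slice comes back empty instead of stopping at the first word. The names `broken` and `clamped` are illustrative only.

```python
words = ['i', 'cant', 'sleep', 'what', 'should', 'i', 'do']
left, right = 1, 2  # positions of 'cant' and 'sleep'

# left - 2 == -1, which Python reads as "last element", so the
# slice runs from index 6 to index 5 and is empty.
broken = words[left - 2:right + 3]

# Clamping the start at 0 keeps the slice inside the list.
clamped = words[max(0, left - 2):right + 3]

print(broken)
print(clamped)
```

The end index needs no clamping, because a slice end past the length of the list is simply truncated.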

Answer 1

You can use the partition method for a non-regex solution:

>>> s='i cant sleep what should i do'
>>> p='cant sleep'
>>> lh, _, rh = s.partition(p)

Then use a slice to get up to two words:

>>> n=2
>>> ' '.join(lh.split()[:n]), p, ' '.join(rh.split()[:n])
('i', 'cant sleep', 'what should')

Your exact output:

>>> ' '.join(lh.split()[:n]+[p]+rh.split()[:n])
'i cant sleep what should'

Of course, you would want to check that p is in s (that is, that the partition succeeds).

As pointed out in the comments, the slice on lh should use a negative index to take the last n words rather than the first n (thanks Mathias Ettinger):

>>> s='w1 w2 w3 w4 w5 w6 w7 w8 w9'
>>> p='w4 w5'
>>> n=2
>>> lh, _, rh = s.partition(p)
>>> ' '.join(lh.split()[-n:]+[p]+rh.split()[:n])
'w2 w3 w4 w5 w6 w7'
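
The pieces of this answer can be put together roughly as follows. This is only a sketch: the function name `window_by_partition` is hypothetical, it assumes `p` is a non-empty string, and returning `None` when the phrase is missing is one possible choice. It also computes the left start index explicitly, because `lh.split()[-n:]` returns the whole list when `n` is 0.

```python
def window_by_partition(s, p, n):
    """Return p with up to n words on each side, or None if p is not in s."""
    lh, sep, rh = s.partition(p)
    if not sep:
        # partition() returns an empty separator when p was not found.
        return None
    left = lh.split()
    right = rh.split()
    # left[-n:] would yield the full list for n == 0 (since -0 == 0),
    # so compute the start index explicitly.
    start = max(0, len(left) - n)
    return ' '.join(left[start:] + [p] + right[:n])


print(window_by_partition('i cant sleep what should i do', 'cant sleep', 2))
print(window_by_partition('w1 w2 w3 w4 w5 w6 w7 w8 w9', 'w4 w5', 2))
```

Note that partition matches substrings rather than whole words, so a phrase such as 'cant sleep' would also match inside 'cant sleeping'.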

Answer 2

If you define words as entities separated by spaces, you can split your sentence and use regular Python slicing:

def get_window(sentence, phrase, window_size):
    sentence = sentence.split()
    phrase = phrase.split()
    words = len(phrase)

    for i,word in enumerate(sentence):
        if word == phrase[0] and sentence[i:i+words] == phrase:
            start = max(0, i-window_size)
            return ' '.join(sentence[start:i+words+window_size])

sentence = 'i cant sleep what should i do'
phrase = 'cant sleep'
print(get_window(sentence, phrase, 2))

You can also turn it into a generator by changing return to yield, which lets you generate all windows if the phrase matches several times in the sentence:

>>> list(gen_window('I dont need it, I need to get rid of it', 'need', 2))
['I dont need it, I', 'it, I need to get']
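
The generator itself is not shown above. A sketch of what `gen_window` might look like, assuming it is `get_window` with return swapped for yield as described (the empty-phrase guard is an addition, not part of the original answer):

```python
def gen_window(sentence, phrase, window_size):
    """Yield one window for every occurrence of phrase in sentence."""
    words = sentence.split()
    target = phrase.split()
    size = len(target)
    if not target:
        # Nothing to search for; an empty phrase would match everywhere.
        return
    for i in range(len(words) - size + 1):
        if words[i:i + size] == target:
            # Clamp the start so a negative index never wraps around.
            start = max(0, i - window_size)
            yield ' '.join(words[start:i + size + window_size])


print(list(gen_window('I dont need it, I need to get rid of it', 'need', 2)))
```

Matching windows may overlap, as they do in this example, because each occurrence is handled independently.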

Answer 3

import re

def contains_sublist(lst, sublst):
    n = len(sublst)

    # Try every position where sublst could start.
    for i in range(len(lst)-n+1):
        if sublst == lst[i:i+n]:
            a = max(0, i-2)            # clamp at 0 so the slice never wraps around
            b = min(i+n+2, len(lst))   # clamp at the end of the list
            return ' '.join(lst[a:b])


sentence = 'i cant sleep what should i do'
phrase = 'cant sleep'
sentence_words = re.findall(r'\w+', sentence)
phrase_words = re.findall(r'\w+', phrase)

print(contains_sublist(sentence_words, phrase_words))

Answer 4

You can split words using built-in string methods, so re shouldn't be necessary. If you want to use varying window sizes, wrap it in a function like so:

def get_word_window(sentence, phrase, w_left=0, w_right=0):
    w_lst = sentence.split()
    p_lst = phrase.split()

    for i,word in enumerate(w_lst):
        if word == p_lst[0] and \
           w_lst[i:i+len(p_lst)] == p_lst:
            left = max(0, i-w_left)
            right = min(len(w_lst), i+w_right+len(p_lst))
            # Return the window around the first match.
            return w_lst[left:right]

    # The phrase was not found.
    return []

Then you can get the new phrase like so:

>>> sentence='i cant sleep what should i do'
>>> phrase='cant sleep'
>>> ' '.join(get_word_window(sentence,phrase,2,2))
'i cant sleep what should'
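
One caveat when dropping re in favour of split: the two approaches tokenize punctuation differently, which affects whether a phrase matches. A small illustration, reusing the example sentence from Answer 2 (the variable names are illustrative only):

```python
import re

text = 'I dont need it, I need to get rid of it'

# str.split() keeps punctuation attached to the neighbouring word.
split_tokens = text.split()

# \w+ keeps only runs of word characters, so the comma is dropped.
regex_tokens = re.findall(r'\w+', text)

print(split_tokens)
print(regex_tokens)
```

With split, the phrase 'need it' would not match here because the sentence token is 'it,' rather than 'it'. With the regex tokens it would match, but the comma is lost from the returned window.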

4 answers

Answer 1: 4 (accepted), 2015-10-22 18:19:08
Answer 2: 2, 2015-10-22 18:06:39
Answer 3: 1, 2015-10-22 17:59:54
Answer 4: 1, 2015-10-22 18:20:14