
Python find n-sized window around phrase within string

I have a string, for example 'i cant sleep what should i do', as well as a phrase that is contained in the string, 'cant sleep'. What I am trying to accomplish is to get an n-sized window around the phrase, even if there aren't n words on either side. So in this case, with a window size of 2 (2 words on either side of the phrase), I would want 'i cant sleep what should'.

This is my current solution, which attempts to find a window of size 2. However, it fails when the number of words to the left or right of the phrase is less than 2. I would also like to be able to use different window sizes.

import re

sentence = 'i cant sleep what should i do'
phrase = 'cant sleep'
sentence_words = re.findall(r'\w+', sentence)
phrase_words = re.findall(r'\w+', phrase)

# Positions of the first and last words of the phrase in the sentence
left = sentence_words.index(phrase_words[0])
right = sentence_words.index(phrase_words[-1])
# Fails when left < 2: left-2 goes negative and the slice counts from the end
print(sentence_words[left-2:right+3])

You can use the str.partition method for a non-regex solution:

>>> s='i cant sleep what should i do'
>>> p='cant sleep'
>>> lh, _, rh = s.partition(p)

Then use a slice to get up to two words:

>>> n=2
>>> ' '.join(lh.split()[:n]), p, ' '.join(rh.split()[:n])
('i', 'cant sleep', 'what should')

Your exact output:

>>> ' '.join(lh.split()[:n]+[p]+rh.split()[:n])
'i cant sleep what should'

You would want to check whether p is in s, or whether the partition succeeds, of course.


As pointed out in the comments, the slice on lh should use a negative index to take the last n words (thanks Mathias Ettinger):

>>> s='w1 w2 w3 w4 w5 w6 w7 w8 w9'
>>> p='w4 w5'
>>> n=2
>>> lh, _, rh = s.partition(p)
>>> ' '.join(lh.split()[-n:]+[p]+rh.split()[:n])
'w2 w3 w4 w5 w6 w7'

If you define words as entities separated by spaces, you can split your sentences and use regular Python slicing:

def get_window(sentence, phrase, window_size):
    sentence = sentence.split()
    phrase = phrase.split()
    words = len(phrase)

    for i,word in enumerate(sentence):
        if word == phrase[0] and sentence[i:i+words] == phrase:
            start = max(0, i-window_size)
            return ' '.join(sentence[start:i+words+window_size])

sentence = 'i cant sleep what should i do'
phrase = 'cant sleep'
print(get_window(sentence, phrase, 2))

You can also change it to a generator by changing return to yield, which lets you generate all windows if phrase matches several times in sentence:

>>> list(gen_window('I dont need it, I need to get rid of it', 'need', 2))
['I dont need it, I', 'it, I need to get']
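
The answer does not show `gen_window` itself. A minimal sketch of what it could look like, assuming it is simply `get_window` with `return` swapped for `yield` (the body below is a reconstruction, not the author's code):

```python
def gen_window(sentence, phrase, window_size):
    """Yield a window of words around every occurrence of phrase in sentence."""
    words = sentence.split()
    target = phrase.split()
    size = len(target)
    if not target:
        return  # nothing to search for
    for i in range(len(words) - size + 1):
        if words[i:i + size] == target:
            # Clamp the left edge at 0; slicing already clamps the right edge
            start = max(0, i - window_size)
            yield ' '.join(words[start:i + size + window_size])


print(list(gen_window('I dont need it, I need to get rid of it', 'need', 2)))
```

Because splitting is done on whitespace only, punctuation stays attached to words, which is why 'it,' appears in both windows.
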

import re

def contains_sublist(lst, sublst):
    n = len(sublst)

    # Slide a window of len(sublst) over lst and compare
    for i in range(len(lst)-n+1):
        if sublst == lst[i:i+n]:
            # Clamp both ends so the window stays inside the list
            a = max(0, i-2)
            b = min(i+n+2, len(lst))
            return ' '.join(lst[a:b])


sentence = 'i cant sleep what should i do'
phrase = 'cant sleep'
sentence_words = re.findall(r'\w+', sentence)
phrase_words = re.findall(r'\w+', phrase)

print(contains_sublist(sentence_words, phrase_words))

You can split words using built-in string methods, so re shouldn't be necessary. If you want to use varying window sizes, wrap it in a function like so:

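def get_word_window(sentence, phrase, w_left=0, w_right=0):
    w_lst = sentence.split()
    p_lst = phrase.split()
    # Default to an empty window when the phrase is not found
    left = right = 0

    for i,word in enumerate(w_lst):
        if word == p_lst[0] and \
           w_lst[i:i+len(p_lst)] == p_lst:
            # Clamp both ends so the window stays inside the sentence;
            # if the phrase occurs more than once, the last match wins
            left = max(0, i-w_left)
            right = min(len(w_lst), i+w_right+len(p_lst))

    return w_lst[left:right]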

Then you can get the new phrase like so:

>>> sentence='i cant sleep what should i do'
>>> phrase='cant sleep'
>>> ' '.join(get_word_window(sentence,phrase,2,2))
'i cant sleep what should'

