
How can I split a string on every kth occurrence of a space, but with overlap?

I have a string:

Your dog is running up the tree.

I want to split it on every kth space, but with overlap between consecutive pieces. For example, on every other space (groups of two words):

Your dog is running up the tree.
out = ['Your dog', 'dog is', 'is running', 'running up', 'up the', 'the tree']

On every second space (groups of three words):

Your dog is running up the tree.
out = ['Your dog is', 'dog is running', 'is running up', 'running up the', 'up the tree']

I know I can do something like

>>> i = iter(s.split('-'))
>>> map("-".join, zip(i, i))

But this does not produce the overlap I want. Any ideas?

I suggest splitting on whitespace first, then iterating over the resulting list and joining the desired number of consecutive words back together.

s = 'Your dog is running up the tree.'
lst = s.split()

def k_with_overlap(lst, k):
    return [' '.join(lst[i:i+k]) for i in range(len(lst) - k + 1)]

k_with_overlap(lst, 2)

['Your dog', 'dog is', 'is running', 'running up', 'up the', 'the tree.']
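As an alternative sketch (not part of the answer above): the `zip(i, i)` idiom from the question cannot overlap, because both arguments are the same iterator and each step consumes two items. Zipping shifted slices of the word list instead lets every word start a new group. The helper name `k_with_overlap_zip` is made up for illustration.

```python
def k_with_overlap_zip(words, k):
    # Hypothetical helper: zip k shifted views of the list so that
    # every word starts a new group; zip stops at the shortest slice,
    # so only full groups of k words are produced.
    shifted = [words[j:] for j in range(k)]
    return [' '.join(group) for group in zip(*shifted)]

words = 'Your dog is running up the tree.'.split()
pairs = k_with_overlap_zip(words, 2)
triples = k_with_overlap_zip(words, 3)
print(pairs)
print(triples)
```

This should give the same groups as the slicing version; which one reads better is mostly a matter of taste.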

This might be close to what you need; note that it steps through the words n at a time, so the groups do not overlap:

>>> s = 'Your dog is running up the tree.'
>>> n = 2
>>> [' '.join(s.split()[i:i+n]) for i in range(0,len(s.split()), n)]
['Your dog', 'is running', 'up the', 'tree.']
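To get overlap from the same idea, the step can be made a parameter. This is a minimal sketch, assuming a hypothetical helper `word_groups`: a step of 1 gives the overlapping groups from the question, and a step equal to n gives disjoint groups. Unlike the snippet above, it keeps only full groups, so a trailing partial group such as 'tree.' is dropped.

```python
def word_groups(s, n, step=1):
    # Hypothetical helper: split once, then take n-word slices,
    # moving the start index forward by `step` words each time.
    words = s.split()
    # Stop at the last index where a full group of n words still fits.
    return [' '.join(words[i:i + n]) for i in range(0, len(words) - n + 1, step)]

s = 'Your dog is running up the tree.'
overlapping = word_groups(s, 2, step=1)
disjoint = word_groups(s, 2, step=2)
print(overlapping)
print(disjoint)
```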

I tried the following, and the result seems to be what you are expecting.

def split(sentence, space_num):
    sent_array = sentence.split(' ')
    length = len(sent_array)
    output = []
    for i in range(length+1-space_num):
        list_comp = sent_array[i:i+space_num]
        output.append(' '.join(list_comp))
    return output

print(split('the quick brown fox jumped over the lazy dog', 5))

The output is shown below (change space_num to suit your requirement).

['the quick brown fox jumped', 'quick brown fox jumped over', 'brown fox jumped over the', 'fox jumped over the lazy', 'jumped over the lazy dog']
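A few edge cases may be worth handling. `sentence.split(' ')` yields empty strings when the input contains consecutive spaces, whereas `split()` with no argument collapses runs of whitespace. The sketch below is one possible variant under those assumptions; the name `split_with_overlap` and the `ValueError` check are illustrative additions, not part of the answer above.

```python
def split_with_overlap(sentence, k):
    # Illustrative guard: a non-positive k has no sensible meaning here.
    if k < 1:
        raise ValueError("k must be a positive integer")
    # split() with no argument ignores repeated, leading and trailing whitespace.
    words = sentence.split()
    # If there are fewer than k words, the range is empty and so is the result.
    return [' '.join(words[i:i + k]) for i in range(len(words) - k + 1)]

double_space = split_with_overlap('the  quick brown', 2)
too_short = split_with_overlap('too short', 5)
print(double_space)
print(too_short)
```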

