I have a string:
Your dog is running up the tree.
I want to be able to split it on every kth space but with overlap. For example, in groups of two words:
Your dog is running up the tree.
out = ['Your dog', 'dog is', 'is running', 'running up', 'up the', 'the tree']
In groups of three words:
Your dog is running up the tree.
out = ['Your dog is', 'dog is running', 'is running up', 'running up the', 'up the tree']
I know I can do something like
>>> i = iter(s.split())
>>> list(map(" ".join, zip(i, i)))
['Your dog', 'is running', 'up the']
But this gives non-overlapping pairs (and drops the leftover 'tree.'), not the overlapping groups I want. Any ideas?
I suggest splitting at every whitespace first and then joining the desired number of words back together while iterating over the list:
s = 'Your dog is running up the tree.'
lst = s.split()
def k_with_overlap(lst, k):
    return [' '.join(lst[i:i+k]) for i in range(len(lst) - k + 1)]
k_with_overlap(lst, 2)
['Your dog', 'dog is', 'is running', 'running up', 'up the', 'the tree.']
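The slicing approach above needs the words in a list. If the words come from an iterator instead (for example, read lazily from a large file), a minimal sketch using collections.deque with maxlen keeps only the current k words in memory. The function name sliding_windows is hypothetical, chosen for illustration.

```python
from collections import deque
from itertools import islice


def sliding_windows(words, k):
    """Yield each run of k consecutive words (hypothetical helper), joined by spaces."""
    it = iter(words)
    # Prime the window with the first k words.
    window = deque(islice(it, k), maxlen=k)
    if len(window) < k:
        return  # fewer than k words: no complete window
    yield ' '.join(window)
    for word in it:
        window.append(word)  # maxlen=k drops the oldest word automatically
        yield ' '.join(window)


print(list(sliding_windows('Your dog is running up the tree.'.split(), 3)))
```

Because it is a generator, it can also be consumed one window at a time without building the full list.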
This gives consecutive groups of n words (note that stepping the range by n means the groups do not overlap):
>>> s = 'Your dog is running up the tree.'
>>> n = 2
>>> [' '.join(s.split()[i:i+n]) for i in range(0,len(s.split()), n)]
['Your dog', 'is running', 'up the', 'tree.']
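The one-liner above steps the range by n, so its groups do not overlap; stepping by 1 gives the overlapping version from the question. A small sketch that exposes the step as a parameter covers both cases. The name word_windows and its step parameter are hypothetical. Unlike the one-liner, it stops at the last full group, so a trailing partial chunk such as 'tree.' is dropped.

```python
def word_windows(s, n, step=1):
    """Join groups of n consecutive words, starting a new group every `step` words.

    Hypothetical helper: step=1 gives overlapping windows, step=n gives
    non-overlapping chunks. Only full groups of n words are returned.
    """
    words = s.split()  # split once instead of on every iteration
    return [' '.join(words[i:i + n]) for i in range(0, len(words) - n + 1, step)]


s = 'Your dog is running up the tree.'
print(word_windows(s, 2))          # overlapping pairs
print(word_windows(s, 2, step=2))  # non-overlapping pairs
```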
I tried the following; the output seems to be what you are expecting.
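def split(sentence, space_num):
    # split() with no argument splits on runs of whitespace, so repeated spaces don't produce empty words
    sent_array = sentence.split()
    length = len(sent_array)
    output = []
    # one group of space_num words starting at each valid index
    for i in range(length + 1 - space_num):
        list_comp = sent_array[i:i + space_num]
        output.append(' '.join(list_comp))
    return output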
print(split('the quick brown fox jumped over the lazy dog', 5))
The output is below (change space_num to suit your requirement):
['the quick brown fox jumped', 'quick brown fox jumped over', 'brown fox jumped over the', 'fox jumped over the lazy', 'jumped over the lazy dog']
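The zip(i, i) idiom from the question pairs items without overlap because both arguments are the same iterator, so each tuple consumes fresh words. One way to adapt it for overlap, sketched here under the assumption that itertools.tee is acceptable, is to make k independent copies of the iterator and advance copy j by j words before zipping, so consecutive tuples share k - 1 words. The name overlapping_zip is hypothetical.

```python
from itertools import islice, tee


def overlapping_zip(words, k):
    """Overlapping variant of zip(i, i) (hypothetical helper name).

    tee() gives k independent iterators over the same words; copy j is
    shifted by j items, and zip() stops when the most-shifted copy runs out.
    """
    copies = tee(words, k)
    shifted = [islice(c, j, None) for j, c in enumerate(copies)]
    return list(map(' '.join, zip(*shifted)))


print(overlapping_zip('Your dog is running up the tree.'.split(), 3))
```

For k = 2 specifically, Python 3.10 and later also provide itertools.pairwise, which yields the same overlapping pairs.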