What is the most Pythonic way to split a string into a list of contiguous, overlapping groups of words?

Question

Say I have the sentence "The cat ate the mouse." and I want to split it into overlapping groups of size = 2.

So the result array becomes:

 ["the cat", "cat ate", "ate the", "the mouse"]

If my size was 3, it should become:

["the cat ate", "cat ate the", "ate the mouse"]

My current method uses lots of for loops, and I'm not sure whether there is a better way.

Answer 1

Using a list slice, you can get a sub-list of the words:

>>> words = "The cat ate the mouse.".rstrip('.').split()
>>> words[0:3]
['The', 'cat', 'ate']

Use str.join to convert the list into a single string joined by a delimiter:

>>> ' '.join(words[0:3])
'The cat ate'

A list comprehension provides a concise way to build the list of word groups:

>>> n = 2
>>> [' '.join(words[i:i+n]) for i in range(len(words) - n + 1)]
['The cat', 'cat ate', 'ate the', 'the mouse']

>>> n = 3
>>> [' '.join(words[i:i+n]) for i in range(len(words) - n + 1)]
['The cat ate', 'cat ate the', 'ate the mouse']
# [' '.join(words[0:3]), ' '.join(words[1:4]),...]
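The steps above can be wrapped into one reusable function. This is only a minimal sketch: the name word_ngrams and the lowercase flag are hypothetical, and lowercasing plus stripping punctuation from every word are assumptions made to match the lowercase output shown in the question.

```python
import string

def word_ngrams(sentence, n, lowercase=True):
    """Return overlapping groups of n consecutive words, each joined by a space."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Assumption: strip punctuation from both ends of each word,
    # then drop anything that was punctuation only.
    words = [w.strip(string.punctuation) for w in sentence.split()]
    words = [w for w in words if w]
    if lowercase:
        words = [w.lower() for w in words]
    # If n is larger than the number of words, range() is empty
    # and the result is an empty list.
    return [' '.join(words[i:i + n]) for i in range(len(words) - n + 1)]

print(word_ngrams("The cat ate the mouse.", 2))
# ['the cat', 'cat ate', 'ate the', 'the mouse']
```

Passing lowercase=False keeps the original capitalisation, as in the interactive session above.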

Answer 2

You can use the nltk library to do the whole job:

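import nltk
from nltk.util import ngrams

# word_tokenize needs NLTK's tokenizer data, downloaded once via nltk.download
text = "The cat ate the mouse."
tokens = nltk.word_tokenize(text)
trigrams = ngrams(tokens, 3)  # n = 3 gives groups of three tokens

for gram in trigrams:
    print(gram)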

This prints the following tuples, with the trailing '.' kept as a separate token:

('The', 'cat', 'ate')
('cat', 'ate', 'the')
('ate', 'the', 'mouse')
('the', 'mouse', '.')
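To get the strings the question asks for, the punctuation token has to be dropped and each tuple joined. The sketch below does that without nltk, so it runs with the standard library alone. The token list is copied from the output above, sliding_windows is a hypothetical helper built on zip (it is not nltk's implementation of ngrams), and treating any token without a letter or digit as punctuation is an assumption.

```python
def sliding_windows(tokens, n):
    """Return an iterator of tuples of n consecutive tokens."""
    # zip stops at the shortest slice, so it yields
    # len(tokens) - n + 1 tuples (none if tokens is too short).
    return zip(*(tokens[i:] for i in range(n)))

# Tokens as they appear in the tuples printed above.
tokens = ['The', 'cat', 'ate', 'the', 'mouse', '.']

# Assumption: a token with no letter or digit is punctuation and is dropped.
words = [t for t in tokens if any(ch.isalnum() for ch in t)]

trigrams = [' '.join(gram) for gram in sliding_windows(words, 3)]
print(trigrams)
# ['The cat ate', 'cat ate the', 'ate the mouse']
```

The same filter and join should work on the tuples produced by ngrams, since those are ordinary tuples of strings.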

2 answers

solution1
3 ACCEPTED 2017-05-14 08:26:22

solution2
0 2017-05-14 08:34:06
