What is the most Pythonic way to split a string into a list of consecutive, overlapping word groups?

Question

Say I have the sentence "The cat ate the mouse." and I want to split it using size = 2.

So the resulting array becomes:

 ["the cat", "cat ate", "ate the", "the mouse"]

If my size is 3, it should become:

["the cat ate", "cat ate the", "ate the mouse"]

My current approach uses a lot of for loops, and I'm not sure whether there is a better way.

Answer 1

With list slicing, you can get a sublist of the words.

>>> words = "The cat ate the mouse.".rstrip('.').split()
>>> words[0:3]
['The', 'cat', 'ate']

Use str.join to turn a list into a single string joined by a separator:

>>> ' '.join(words[0:3])
'The cat ate'

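A list comprehension provides a concise way to build the list of word groups. There are len(words) - n + 1 starting positions, and each one yields a slice of n words: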

>>> n = 2
>>> [' '.join(words[i:i+n]) for i in range(len(words) - n + 1)]
['The cat', 'cat ate', 'ate the', 'the mouse']

>>> n = 3
>>> [' '.join(words[i:i+n]) for i in range(len(words) - n + 1)]
['The cat ate', 'cat ate the', 'ate the mouse']
# [' '.join(words[0:3]), ' '.join(words[1:4]),...]
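The comprehension above can be wrapped in a small reusable function. The following is only a sketch: the name word_ngrams is hypothetical, and lowercasing plus stripping punctuation are assumptions made so the result matches the lowercase output shown in the question.

```python
import string


def word_ngrams(sentence, n):
    """Return overlapping groups of n consecutive words (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Assumption: lowercase each word and strip surrounding punctuation,
    # to match the lowercase output requested in the question.
    words = [w.strip(string.punctuation).lower() for w in sentence.split()]
    words = [w for w in words if w]  # drop tokens that were only punctuation
    # One slice per starting position; the range is empty when n > len(words).
    return [' '.join(words[i:i + n]) for i in range(len(words) - n + 1)]


print(word_ngrams("The cat ate the mouse.", 2))
print(word_ngrams("The cat ate the mouse.", 3))
```

If n is larger than the number of words, the range is empty and the function returns an empty list instead of raising an error.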

Answer 2

You can use the nltk library to do all the work:

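import nltk
from nltk.util import ngrams

# word_tokenize needs NLTK's tokenizer data, e.g. nltk.download('punkt')
# (the exact resource name can vary by NLTK version)
text = "The cat ate the mouse."
tokens = nltk.word_tokenize(text)
trigrams = ngrams(tokens, 3)  # n = 3, so these are trigrams, not bigrams

for gram in trigrams:
    print(gram)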

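Which gives us:

('The', 'cat', 'ate')
('cat', 'ate', 'the')
('ate', 'the', 'mouse')
('the', 'mouse', '.')

Note that the tokenizer keeps the final period as its own token, so it appears in the last tuple.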
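For comparison, the same tuple-style windows can be produced with only the standard library. This is a sketch, not how nltk implements ngrams: the names ngram_tuples and tokens are hypothetical, and the token list is written out by hand so the example runs without nltk or its tokenizer data.

```python
import string


def ngram_tuples(tokens, n):
    """Return overlapping n-tuples of tokens using zip (hypothetical helper)."""
    # zip stops at the shortest shifted copy, so exactly
    # len(tokens) - n + 1 tuples are produced.
    return list(zip(*(tokens[i:] for i in range(n))))


# Assumed token list, written by hand to mirror a tokenizer that
# keeps the final period as a separate token.
tokens = ['The', 'cat', 'ate', 'the', 'mouse', '.']

for gram in ngram_tuples(tokens, 3):
    print(gram)

# Drop punctuation-only tokens and join each tuple to get plain strings.
words = [t for t in tokens if t not in string.punctuation]
print([' '.join(gram) for gram in ngram_tuples(words, 2)])
```

Filtering punctuation before building the windows avoids groups such as "the mouse ." in the joined output.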
