What is the most Pythonic way to split a string into a contiguous, overlapping list of words?
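Say I have the sentence "The cat ate the mouse."
I want to split the sentence with size = 2.
The resulting array would then be:
["the cat", "cat ate", "ate the", "the mouse"]
If my size were 3, it should become:
["the cat ate", "cat ate the", "ate the mouse"]
My current approach uses a lot of for loops, and I'm not sure whether there is a better way.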
Using list slicing, you can get a sublist of the words.
>>> words = "The cat ate the mouse.".rstrip('.').split()
>>> words[0:3]
['The', 'cat', 'ate']
Use str.join to turn a list into a single string joined by a separator:
>>> ' '.join(words[0:3])
'The cat ate'
A list comprehension provides a concise way to build the list of word groups:
>>> n = 2
>>> [' '.join(words[i:i+n]) for i in range(len(words) - n + 1)]
['The cat', 'cat ate', 'ate the', 'the mouse']
>>> n = 3
>>> [' '.join(words[i:i+n]) for i in range(len(words) - n + 1)]
['The cat ate', 'cat ate the', 'ate the mouse']
# [' '.join(words[0:3]), ' '.join(words[1:4]),...]
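The same comprehension can be wrapped in a small helper so the window size becomes a parameter. This is only a sketch: the name word_ngrams is made up for illustration, and it assumes, as in the session above, that the only punctuation to drop is a trailing period.

```python
def word_ngrams(sentence, n):
    """Return overlapping groups of n consecutive words, each joined by a space."""
    # Strip a trailing period and split on whitespace, as in the session above.
    words = sentence.rstrip('.').split()
    # When n exceeds the word count, range() is empty and the result is [].
    return [' '.join(words[i:i + n]) for i in range(len(words) - n + 1)]


print(word_ngrams("The cat ate the mouse.", 2))
# ['The cat', 'cat ate', 'ate the', 'the mouse']
```

To get the all-lowercase output shown in the question, call sentence.lower() before splitting.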
You can use the nltk library to do all the work:
import nltk
from nltk.util import ngrams

text = "The cat ate the mouse."
# word_tokenize needs the NLTK tokenizer data; it splits the final period into its own token
tokens = nltk.word_tokenize(text)
trigrams = ngrams(tokens, 3)
for gram in trigrams:
    print(gram)
which gives us:
('The', 'cat', 'ate')
('cat', 'ate', 'the')
('ate', 'the', 'mouse')
('the', 'mouse', '.')
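If installing nltk is not an option, tuples of the same shape can be produced with the standard library alone by zipping shifted copies of the token list. This is a different technique from nltk's ngrams, offered only as a sketch: ngram_tuples is a hypothetical name, and the tokens come from a plain split(), so the period is stripped rather than kept as its own token.

```python
def ngram_tuples(tokens, n):
    """Return a list of tuples, each holding n consecutive tokens."""
    # zip stops at the shortest shifted copy, so only full-length groups are produced.
    return list(zip(*(tokens[i:] for i in range(n))))


tokens = "The cat ate the mouse.".rstrip('.').split()
print(ngram_tuples(tokens, 3))
# [('The', 'cat', 'ate'), ('cat', 'ate', 'the'), ('ate', 'the', 'mouse')]
```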