How do I get a specific range of words from a raw corpus?

Question

import nltk

y = nltk.corpus.brown.raw()
print(y)

When I call print(y), it shows all of the raw data in this corpus, but I want only 10,000 words from it. How can I achieve this?

Answer 1

You could do:

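import random
import nltk

# list() materializes the lazy corpus view so random.sample can index it
words = list(nltk.corpus.brown.words())
random_words = random.sample(words, 10000)  # 10,000 words drawn at random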
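
Note that random.sample picks words scattered across the whole corpus, so the result is not a contiguous passage. If what you want is a specific range (for example the first 10,000 words in reading order), slicing the word list may be closer to the goal. The following is only a sketch: take_words and load_brown_words are hypothetical helper names, not part of NLTK, and it assumes the Brown corpus has already been fetched with nltk.download("brown").

```python
def take_words(words, start=0, count=10000):
    """Return up to `count` consecutive tokens from `words`, starting at `start`.

    `words` can be any sliceable sequence, e.g. a plain list or the lazy
    view returned by nltk.corpus.brown.words().
    """
    if start < 0 or count < 0:
        raise ValueError("start and count must be non-negative")
    # Slicing keeps the original order; list() forces a lazy view into memory.
    return list(words[start:start + count])


def load_brown_words():
    """Hypothetical convenience wrapper; requires nltk and the Brown corpus."""
    import nltk  # imported here so take_words stays usable without nltk
    return nltk.corpus.brown.words()


# Example usage (needs the corpus to be installed):
#   first_10k = take_words(load_brown_words())
#   next_10k = take_words(load_brown_words(), start=10000)
```

If the range reaches past the end of the corpus, the slice simply returns fewer words rather than raising an error.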

Answer 1 (2, accepted) 2016-03-26 16:45:03
