How to get a specific range of words from a raw corpus?
import nltk
y = nltk.corpus.brown.raw()  # the whole corpus as one string
print(y)
When I run print(y), it shows all of the raw data in this corpus, but I want to get only 10,000 words from it. How can I achieve this?
You could draw a random sample of 10,000 words:
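import random
import nltk
# list() gives random.sample a plain sequence to draw from
words = list(nltk.corpus.brown.words())
# 10,000 positions chosen at random, not the first 10,000 words
random_words = random.sample(words, 10000)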
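If what you want is the first 10,000 words in corpus order rather than a random sample, slicing should be enough. Here is a minimal sketch; the helper names first_n_words and first_n_words_from_raw are made up for illustration, and a short stand-in string is used in place of the corpus so it runs without downloading anything.

```python
from itertools import islice


def first_n_words(words, n=10000):
    """Return the first n items of any iterable of words, in original order."""
    return list(islice(words, n))


def first_n_words_from_raw(raw_text, n=10000):
    """Split raw text on whitespace and keep the first n tokens."""
    return raw_text.split()[:n]


# Stand-in for the corpus text (an assumption for this sketch, not real corpus output).
sample_raw = "The Fulton County Grand Jury said Friday an investigation"

print(first_n_words(sample_raw.split(), 3))
print(first_n_words_from_raw(sample_raw, 3))
```

With the corpus installed (nltk.download('brown')), first_n_words(nltk.corpus.brown.words()) should give the first 10,000 tokens. I would prefer that over splitting brown.raw(), because as far as I know the raw Brown text carries word/tag pairs, so whitespace splitting would leave the tags attached.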