
How do I get a specific range of words from a raw corpus?

import nltk

# Load the entire Brown corpus as one raw string
y = nltk.corpus.brown.raw()
print(y)

When I run print(y), it shows all of the raw data in the corpus, but I only want 10,000 words from it. How can I achieve this?

You could do:

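import random
import nltk

# Copy the corpus view into a list so random.sample gets a plain sequence
words = list(nltk.corpus.brown.words())
random_words = random.sample(words, 10000)  # 10,000 words picked at random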
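Note that random.sample picks words from random positions. If "10,000 words" means the first 10,000 in corpus order, slicing may be closer to what you want. Here is a minimal sketch of both options. The helper names first_n_words and random_n_words are made up for illustration, and toy_words is a small stand-in list so the snippet runs without downloading the corpus; with NLTK installed you would pass nltk.corpus.brown.words() instead.

```python
import random
from itertools import islice


def first_n_words(words, n):
    """Return the first n words, in their original order, as a list."""
    return list(islice(words, n))


def random_n_words(words, n, seed=None):
    """Return n words drawn without replacement from random positions."""
    rng = random.Random(seed)  # pass a seed for a reproducible sample
    return rng.sample(list(words), n)


# Stand-in for nltk.corpus.brown.words(); any sequence of strings works.
toy_words = "the quick brown fox jumps over the lazy dog".split()

print(first_n_words(toy_words, 4))  # ['the', 'quick', 'brown', 'fox']
print(random_n_words(toy_words, 4, seed=0))
```

For the original question, that would be first_n_words(nltk.corpus.brown.words(), 10000), or " ".join(...) on the result if you need a single string rather than a list.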


 