
How to load a large dataset into a gensim word2vec model

I have multiple text files (around 40), and each file contains around 2000 articles (averaging 500 words each). Each article is a single line in its text file.

Because of memory limitations, I want to load these text files dynamically during training (perhaps with an iterator class?).

How do I proceed?

  • Train on each text file -> save the model -> load the model and continue training on the new data?
  • Or is there a way to do this automatically with an iterator class? (See the sketch after this list.)
  • Should I feed the model sentence by sentence, article by article, or text file by text file during training?
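For reference, here is a minimal sketch of the iterator-class approach, assuming gensim 4.x and a hypothetical directory `path/to/text/files` holding the 40 files, with one article per line. Word2Vec only needs an iterable that yields lists of tokens, and it iterates over the corpus several times (once to build the vocabulary, then once per epoch), so a restartable iterable class is used rather than a one-shot generator:

```python
import os
from gensim.models import Word2Vec
from gensim.utils import simple_preprocess


class ArticleCorpus:
    """Streams one tokenized article (one line of a file) at a time,
    so the full corpus is never held in memory."""

    def __init__(self, dirname):
        self.dirname = dirname

    def __iter__(self):
        for fname in sorted(os.listdir(self.dirname)):
            path = os.path.join(self.dirname, fname)
            with open(path, encoding="utf-8") as f:
                for line in f:               # each line is one article
                    yield simple_preprocess(line)


# 'path/to/text/files' is a placeholder; point it at your own directory.
sentences = ArticleCorpus("path/to/text/files")

# vector_size is the gensim 4.x name (it was 'size' in gensim 3.x).
model = Word2Vec(sentences=sentences, vector_size=100, window=5,
                 min_count=5, workers=4)
model.save("word2vec.model")
```

With this pattern there is no need to train file by file and reload the model; gensim streams the whole corpus through the iterator. gensim also ships the `LineSentence` and `PathLineSentences` helpers, which do essentially the same thing for one-sentence-per-line files, so they may fit this layout directly.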
