
train Gensim word2vec using large txt file

I have a large txt file (150MB) like this:

'intrepid', 'bumbling', 'duo', 'deliver', 'good', 'one', 'better', 'offering', 'considerable', 'cv', 'freshly', 'qualified', 'private', ...

I want to train a word2vec model using that file, but it gives me a RAM problem. I don't know how to feed the txt file to the word2vec model. This is my code. I know the code has a problem, but I don't know where it is.

import gensim

f = open('your_file1.txt')
for line in f:
    b = line
    model = gensim.models.Word2Vec([b], min_count=1, size=32)

w1 = "bad"
model.wv.most_similar(positive=w1)

You can make an iterable that reads your file one line at a time instead of loading everything into memory at once. Because the class can be iterated over repeatedly, Word2Vec can make the multiple passes it needs over the corpus (one to build the vocabulary, then one per training epoch). The following should work:

from gensim.models import Word2Vec

class SentenceIterator:
    """Yields one tokenized sentence per line; restartable across epochs."""
    def __init__(self, filepath):
        self.filepath = filepath

    def __iter__(self):
        with open(self.filepath) as f:
            for line in f:
                yield line.split()

sentences = SentenceIterator('datadir/textfile.txt')
model = Word2Vec(sentences)
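
As an alternative, gensim ships a built-in helper, gensim.models.word2vec.LineSentence, that does the same streaming for a whitespace-delimited, one-sentence-per-line file. A minimal sketch, assuming gensim 4.x, where the question's size parameter is named vector_size (on gensim 3.x use size instead):

from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

# LineSentence streams the file line by line and splits each line on
# whitespace, so the corpus never has to fit in RAM.
sentences = LineSentence('datadir/textfile.txt')

# vector_size (gensim 4.x) corresponds to size in the question's code.
model = Word2Vec(sentences, vector_size=32, min_count=1)

# Query the trained model the same way as in the question.
print(model.wv.most_similar(positive="bad"))

Like SentenceIterator above, LineSentence is restartable, which is what lets Word2Vec scan the corpus once for the vocabulary and again for each training epoch.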

