
train Gensim word2vec using large txt file

I have a large txt file (150MB) like this:

'intrepid', 'bumbling', 'duo', 'deliver', 'good', 'one', 'better', 'offering', 'considerable', 'cv', 'freshly', 'qualified', 'private', ...

I want to train a word2vec model using that file, but it gives me a RAM problem. I don't know how to feed the txt file to the word2vec model. This is my code. I know the code has a problem, but I don't know where it is.

import gensim

f = open('your_file1.txt')
for line in f:
    b = line
    model = gensim.models.Word2Vec([b], min_count=1, size=32)

w1 = "bad"
model.wv.most_similar(positive=w1)

You can make an iterable that reads your file one line at a time instead of loading everything into memory at once. Because the class can be iterated over repeatedly, Word2Vec can make the multiple passes it needs over the corpus (one to build the vocabulary, then one per training epoch). The following should work:

from gensim.models import Word2Vec

class SentenceIterator:
    """Yields one tokenized sentence per line; restartable across epochs."""
    def __init__(self, filepath):
        self.filepath = filepath

    def __iter__(self):
        with open(self.filepath) as f:
            for line in f:
                yield line.split()

sentences = SentenceIterator('datadir/textfile.txt')
model = Word2Vec(sentences)
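
As an alternative, gensim ships a built-in helper, gensim.models.word2vec.LineSentence, that does the same streaming for a whitespace-delimited, one-sentence-per-line file. A minimal sketch, assuming gensim 4.x, where the question's size parameter is named vector_size (on gensim 3.x use size instead):

from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

# LineSentence streams the file line by line and splits each line on
# whitespace, so the corpus never has to fit in RAM.
sentences = LineSentence('datadir/textfile.txt')

# vector_size (gensim 4.x) corresponds to size in the question's code.
model = Word2Vec(sentences, vector_size=32, min_count=1)

# Query the trained model the same way as in the question.
print(model.wv.most_similar(positive="bad"))

Like SentenceIterator above, LineSentence is restartable, which is what lets Word2Vec scan the corpus once for the vocabulary and again for each training epoch.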

