How to prepare data for word2vec in gensim and fasttext?

Question

I want to train word2vec and fasttext to get vectors for a specific dataset that I have.

What should my model take as input?

My file is like this:

Customer_4: I want to book a ticket to New York.
Agent_9: Okay, when do you want the tickets for
Customer_4: hmm, wait a sec
Agent_9: Sure
Customer_4: When is the least expensive to fly

Now, how should I prepare my data for word2vec to run? Does the word2vec model take inter-sentence similarity into account, i.e. should I not prepare the corpus sentence-wise?

Answer 1

One approach is to first split the document into lines, then split each line into tokens. You end up with a corpus that is a list of lists of tokens (one inner list per line), which you can feed directly into the gensim word2vec model.
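
A minimal sketch of that preparation, using the sample dialogue from the question. The speaker-prefix pattern, the lowercasing, the simple regex tokenizer, and the helper names `to_corpus` and `train` are assumptions made for illustration, not something the question specifies. The training call uses the gensim 4.x parameter name `vector_size` (older releases call it `size`), and the hyperparameter values are placeholders.

```python
import re

RAW = """Customer_4: I want to book a ticket to New York.
Agent_9: Okay, when do you want the tickets for
Customer_4: hmm, wait a sec
Agent_9: Sure
Customer_4: When is the least expensive to fly"""

# Assumption: every line starts with a "Speaker_N:" label that should not
# become part of the vocabulary.
SPEAKER = re.compile(r"^\w+:\s*")
# Assumption: a token is a run of lowercase letters, digits, or apostrophes.
TOKEN = re.compile(r"[a-z0-9']+")


def to_corpus(text):
    """Turn raw text into a list of token lists, one per non-empty line."""
    corpus = []
    for line in text.splitlines():
        utterance = SPEAKER.sub("", line.strip())
        tokens = TOKEN.findall(utterance.lower())
        if tokens:
            corpus.append(tokens)
    return corpus


corpus = to_corpus(RAW)


def train(corpus):
    """Hypothetical helper: train both models on the same tokenized corpus."""
    # Imported here so the preprocessing above runs without gensim installed.
    from gensim.models import FastText, Word2Vec

    # min_count=1 keeps every word; only sensible for a tiny demo corpus.
    w2v = Word2Vec(sentences=corpus, vector_size=100, window=5, min_count=1)
    ft = FastText(sentences=corpus, vector_size=100, window=5, min_count=1)
    return w2v, ft
```

Calling `w2v, ft = train(corpus)` and then `w2v.wv["ticket"]` should return the learned vector for that word. Note that gensim treats each inner list as a separate training unit, so context windows do not reach across lines; if neighbouring utterances should share context, you would need to merge them into one token list yourself.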

1 answers

solution1
0 2018-10-28 23:51:00
