Word2vec with Conv1D for text classification confusion

I am doing text classification and plan to use word2vec word embeddings, feeding them into Conv1D layers. I have a dataframe which contains the texts and the corresponding labels (sentiments). I used the gensim module's word2vec algorithm to generate the word-embedding model. The code I used:

import pandas as pd
from gensim.models import Word2Vec
from nltk.tokenize import word_tokenize

df = pd.read_csv('emotion_merged_dataset.csv')
texts = df['text']
labels = df['sentiment']

# tokenize each text into a list of words, as expected by gensim's Word2Vec
df_tokenized = df.apply(lambda row: word_tokenize(row['text']), axis=1)
model = Word2Vec(df_tokenized, min_count=1)

I plan to use a CNN with this word-embedding model. But how should I use the word-embedding model with my CNN? What should my input be?

I plan to use something like this (obviously not with the same hyper-parameters):

from tensorflow.keras import Sequential, layers

model = Sequential()
model.add(layers.Embedding(max_features, 128, input_length=max_len))
model.add(layers.Conv1D(32, 7, activation='relu'))
model.add(layers.MaxPooling1D(5))
model.add(layers.Conv1D(32, 7, activation='relu'))
model.add(layers.GlobalMaxPooling1D())
model.add(layers.Dense(1))

Can somebody help me out and point me in the right direction? Thanks in advance.

Sorry for the late response, I hope it is still useful for you. Depending on your application you may need to download a specific word-embedding file; for example, here you have the GloVe files:

import numpy as np

EMBEDDING_FILE = 'glove.6B.50d.txt'

embed_size = 50      # how big is each word vector
max_features = 20000 # how many unique words to use (i.e. number of rows in the embedding matrix)
maxlen = 100         # max number of words in a text to use

# parse the GloVe file into a dict: word -> embedding vector
def get_coefs(word, *arr):
    return word, np.asarray(arr, dtype='float32')
embeddings_index = dict(get_coefs(*o.strip().split()) for o in open(EMBEDDING_FILE, encoding='utf8'))

# initialise words that are missing from GloVe with the mean/std of the pre-trained vectors
all_embs = np.stack(list(embeddings_index.values()))
emb_mean, emb_std = all_embs.mean(), all_embs.std()

word_index = tokenizer.word_index  # tokenizer is a Keras Tokenizer already fit on the texts
nb_words = min(max_features, len(word_index))
# +1 because Keras word indices start at 1 (index 0 is reserved for padding)
embedding_matrix = np.random.normal(emb_mean, emb_std, (nb_words + 1, embed_size))
for word, i in word_index.items():
    if i > nb_words: continue
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None: embedding_matrix[i] = embedding_vector

from tensorflow.keras.layers import Input, Embedding

# this is how you load the pre-trained weights into the embedding layer
inp = Input(shape=(maxlen,))
x = Embedding(nb_words + 1, embed_size, weights=[embedding_matrix])(inp)
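Note that tokenizer above is assumed to be a Keras Tokenizer that has already been fit on your texts; the padded integer sequences it produces are what you actually feed into the network, which answers your "what should be my input" question. A minimal sketch of that preprocessing, assuming the texts column from your question and the maxlen / max_features values defined above:

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# build the vocabulary, keeping only the max_features most frequent words
tokenizer = Tokenizer(num_words=max_features)
tokenizer.fit_on_texts(texts)

# turn each text into a sequence of word indices, padded/truncated to maxlen
sequences = tokenizer.texts_to_sequences(texts)
X = pad_sequences(sequences, maxlen=maxlen)  # shape (num_texts, maxlen): this is the model input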

The embedding-matrix code above is taken from Jeremy Howard. I think this is all you need; if you want to load a different embedding file, the process is pretty similar, usually you only have to change the file you load.
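Since you already trained your own word2vec model with gensim, you do not even need a GloVe file: you can fill the embedding matrix from your model's vectors instead. A rough sketch, assuming a gensim 4.x Word2Vec model named model (trained as in your question) plus the fitted tokenizer and max_features from above:

import numpy as np

embed_size = model.wv.vector_size             # dimensionality of your trained vectors
word_index = tokenizer.word_index
nb_words = min(max_features, len(word_index))

# rows 1..nb_words hold your word2vec vectors; row 0 stays zero for padding,
# and words the word2vec model has never seen also stay zero
embedding_matrix = np.zeros((nb_words + 1, embed_size))
for word, i in word_index.items():
    if i > nb_words:
        continue
    if word in model.wv:                      # membership test on the trained vectors
        embedding_matrix[i] = model.wv[word]

x = Embedding(nb_words + 1, embed_size, weights=[embedding_matrix], trainable=False)(inp)

Setting trainable=False keeps the pre-trained vectors frozen, which often generalises better on small datasets; you can also leave the layer trainable and let the vectors be fine-tuned during training.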
