
Error in Keras while doing Multi-class classification

I am trying to do multi-class classification in Keras, using the CrowdFlower dataset. Here is my code:

import pandas as pd

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
import numpy as np
from sklearn.preprocessing import LabelEncoder

from keras.models import Sequential
from keras.layers import Embedding, Flatten, Dense



df=pd.read_csv('text_emotion.csv')

df.drop(['tweet_id','author'],axis=1,inplace=True)


df=df[~df['sentiment'].isin(['empty','enthusiasm','boredom','anger'])]


df = df.sample(frac=1).reset_index(drop=True)

labels = []
texts = []


for i,row in df.iterrows():
    texts.append(row['content'])
    labels.append(row['sentiment'])

tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)


sequences = tokenizer.texts_to_sequences(texts)

word_index = tokenizer.word_index


print('Found %s unique tokens.' % len(word_index))

data = pad_sequences(sequences)


encoder = LabelEncoder()
encoder.fit(labels)
encoded_Y = encoder.transform(labels)


labels = np.asarray(encoded_Y)


print('Shape of data tensor:', data.shape)
print('Shape of label tensor:', labels.shape)

indices = np.arange(data.shape[0])
np.random.shuffle(indices)
data = data[indices]
labels = labels[indices]
print(labels.shape)


model = Sequential()



model.add(Embedding(40000, 8,input_length=37))

model.add(Flatten())




model.add(Dense(100,activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(9, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])


model.fit(data,labels, validation_split=0.2, epochs=150, batch_size=100)

I am getting this error:

ValueError: Error when checking target: expected dense_3 to have shape (9,) but got array with shape (1,)

Can someone please point out the flaw in my logic? I understand my question is similar to Exception: Error when checking model target: expected dense_3 to have shape (None, 1000) but got array with shape (32, 2), but I have not managed to find the bug.

You are making several mistakes in that code; here are the fixes, plus some suggestions to make the code better:

  1. Remove the for i,row in df.iterrows(): loop; you can index the columns directly:

     labels = df['sentiment']
     texts = df['content']

  2. Set the maximum number of words (the vocabulary size) when creating the tokenizer: tokenizer = Tokenizer(num_words=5000).

  3. Provide a maximum length when padding: data = pad_sequences(sequences, maxlen=37).

  4. Don't leave the targets as a bare array of integer labels (labels = np.asarray(encoded_Y)); this is not regression. With categorical_crossentropy the targets have to be one-hot encoded:

     from keras.utils import np_utils
     labels = np_utils.to_categorical(encoded_Y)

  5. In the embedding layer, model.add(Embedding(40000, 8, input_length=37)) uses a vocabulary size of 40K with an embedding dimension of only 8. That doesn't make much sense: the dataset has close to 40K unique words, and they can't all be given a proper embedding. Change to a more sensible vocabulary size, e.g. model.add(Embedding(5000, 30, input_length=37)). NOTE: if you do want to use 40000, update Tokenizer(5000) to the same number.

  6. Use variables such as embedding_dim = 8 and vocab_size = 40000, whatever the values may be.

  7. Instead of hard-coding model.add(Dense(9, activation='softmax')) as the final layer, use the following; it keeps the code correct if the number of classes changes:

     model.add(Dense(labels.shape[1], activation='softmax'))
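The key fix is point 4, since it is what produces the reported ValueError. A minimal, self-contained sketch of the target shapes involved, using sklearn's LabelEncoder and a NumPy one-hot step equivalent to np_utils.to_categorical (the toy labels below are made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# toy stand-in for df['sentiment']
labels = ['happiness', 'sadness', 'neutral', 'happiness', 'worry']

encoder = LabelEncoder()
encoded_Y = encoder.fit_transform(labels)   # integer class ids, shape (5,)

# one-hot encode; equivalent to np_utils.to_categorical(encoded_Y)
one_hot = np.eye(len(encoder.classes_))[encoded_Y]

print(encoded_Y.shape)  # (5,)   - per-sample integers, what the question passed to fit()
print(one_hot.shape)    # (5, 4) - (samples, n_classes), what the softmax layer expects
```

With one-hot targets, the final Dense layer's width matches labels.shape[1] and categorical_crossentropy no longer sees a (1,)-shaped target. Alternatively, you can keep the integer labels and compile with loss='sparse_categorical_crossentropy'.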

Final working code is attached at this Link
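As a rough sanity check on point 5: an Embedding layer stores one trainable vector per vocabulary entry, so its parameter count is vocab_size * embedding_dim, and shrinking the vocabulary shrinks the model accordingly (the helper below is only for illustration):

```python
def embedding_params(vocab_size, embedding_dim):
    # an Embedding layer holds vocab_size vectors of length embedding_dim
    return vocab_size * embedding_dim

print(embedding_params(40000, 8))   # 320000 weights with the original settings
print(embedding_params(5000, 30))   # 150000 weights with the suggested settings
```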
