
Error in Keras while doing Multi-class classification

I am trying to do multi-class classification in Keras, using the CrowdFlower dataset. Here is my code:

import pandas as pd

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
import numpy as np
from sklearn.preprocessing import LabelEncoder

from keras.models import Sequential
from keras.layers import Embedding, Flatten, Dense



df=pd.read_csv('text_emotion.csv')

df.drop(['tweet_id','author'],axis=1,inplace=True)


df=df[~df['sentiment'].isin(['empty','enthusiasm','boredom','anger'])]


df = df.sample(frac=1).reset_index(drop=True)

labels = []
texts = []


for i,row in df.iterrows():
    texts.append(row['content'])
    labels.append(row['sentiment'])

tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)


sequences = tokenizer.texts_to_sequences(texts)

word_index = tokenizer.word_index


print('Found %s unique tokens.' % len(word_index))

data = pad_sequences(sequences)


encoder = LabelEncoder()
encoder.fit(labels)
encoded_Y = encoder.transform(labels)


labels = np.asarray(encoded_Y)


print('Shape of data tensor:', data.shape)
print('Shape of label tensor:', labels.shape)

indices = np.arange(data.shape[0])
np.random.shuffle(indices)
data = data[indices]
labels = labels[indices]
print(labels.shape)


model = Sequential()



model.add(Embedding(40000, 8,input_length=37))

model.add(Flatten())




model.add(Dense(100,activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(9, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])


model.fit(data,labels, validation_split=0.2, epochs=150, batch_size=100)

I am getting this error:

ValueError: Error when checking target: expected dense_3 to have shape (9,) but got array with shape (1,)

Can someone please point out the flaw in my logic? I understand my question is similar to Exception: Error when checking model target: expected dense_3 to have shape (None, 1000) but got array with shape (32, 2), but I have not managed to find the bug.

You are making several mistakes in that code; here are the fixes, plus some suggestions to make the code better:

  1. Remove the for i,row in df.iterrows(): loop; you can index the columns directly:

     labels = df['sentiment']
     texts = df['content']

  2. Set the maximum number of words (the vocabulary size) when creating the tokenizer: tokenizer = Tokenizer(num_words=5000).

  3. Provide a maximum length when padding: data = pad_sequences(sequences, maxlen=37).

  4. Don't leave the targets as a bare array of integer labels (labels = np.asarray(encoded_Y)); this is not regression. With categorical_crossentropy the targets have to be one-hot encoded:

     from keras.utils import np_utils
     labels = np_utils.to_categorical(encoded_Y)

  5. In the embedding layer, model.add(Embedding(40000, 8, input_length=37)) uses a vocabulary size of 40K with an embedding dimension of only 8. That doesn't make much sense: the dataset has close to 40K unique words, and they can't all be given a proper embedding. Change to a more sensible vocabulary size, e.g. model.add(Embedding(5000, 30, input_length=37)). NOTE: if you do want to use 40000, update Tokenizer(5000) to the same number.

  6. Use variables such as embedding_dim = 8 and vocab_size = 40000, whatever the values may be.

  7. Instead of hard-coding model.add(Dense(9, activation='softmax')) as the final layer, use the following; it keeps the code correct if the number of classes changes:

     model.add(Dense(labels.shape[1], activation='softmax'))
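The key fix is point 4, since it is what produces the reported ValueError. A minimal, self-contained sketch of the target shapes involved, using sklearn's LabelEncoder and a NumPy one-hot step equivalent to np_utils.to_categorical (the toy labels below are made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# toy stand-in for df['sentiment']
labels = ['happiness', 'sadness', 'neutral', 'happiness', 'worry']

encoder = LabelEncoder()
encoded_Y = encoder.fit_transform(labels)   # integer class ids, shape (5,)

# one-hot encode; equivalent to np_utils.to_categorical(encoded_Y)
one_hot = np.eye(len(encoder.classes_))[encoded_Y]

print(encoded_Y.shape)  # (5,)   - per-sample integers, what the question passed to fit()
print(one_hot.shape)    # (5, 4) - (samples, n_classes), what the softmax layer expects
```

With one-hot targets, the final Dense layer's width matches labels.shape[1] and categorical_crossentropy no longer sees a (1,)-shaped target. Alternatively, you can keep the integer labels and compile with loss='sparse_categorical_crossentropy'.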

Final working code is attached at this Link
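As a rough sanity check on point 5: an Embedding layer stores one trainable vector per vocabulary entry, so its parameter count is vocab_size * embedding_dim, and shrinking the vocabulary shrinks the model accordingly (the helper below is only for illustration):

```python
def embedding_params(vocab_size, embedding_dim):
    # an Embedding layer holds vocab_size vectors of length embedding_dim
    return vocab_size * embedding_dim

print(embedding_params(40000, 8))   # 320000 weights with the original settings
print(embedding_params(5000, 30))   # 150000 weights with the suggested settings
```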
