
TensorFlow: is it possible to build a neural network with categorical targets using one-hot encoding?

I have a dataset in which the targets are categories of products purchased by a customer. Each customer can buy one or more categories.

like this:

[image: sample of the dataset]

Previously, I built this model to work with a single target (buy or not buy for each client):

model = tf.keras.Sequential([
                            tf.keras.layers.Dense(hidden_layer_size, activation='tanh'), 
                            tf.keras.layers.Dense(hidden_layer_size, activation='tanh'), 
                            tf.keras.layers.Dense(output_size, activation='softmax') 
                            ])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

batch_size = 20000
max_epochs = 200
early_stopping = tf.keras.callbacks.EarlyStopping(patience=2) 
model.fit(train_inputs,
          train_targets,
          batch_size = batch_size,
          epochs = max_epochs,
          callbacks = [early_stopping],
          validation_data = (validation_inputs, validation_targets),
          verbose=2)

Is it necessary to generate the one-hot encoded targets before training, and then use them as an [n x 5] tensor? With one-hot encoding, client 1, for example, has the target [0,0,1,0,1]:

[image: one-hot encoded targets]
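For context, targets like the [0,0,1,0,1] row above can be built ahead of training with plain numpy. This is a minimal sketch with made-up purchase data (5 hypothetical categories, indexed 0 to 4):

```python
import numpy as np

# Hypothetical data: each entry lists the category indices one client bought.
purchases = [
    [2, 4],      # client 1 -> [0, 0, 1, 0, 1]
    [0],         # client 2 -> [1, 0, 0, 0, 0]
    [1, 2, 3],   # client 3 -> [0, 1, 1, 1, 0]
]

num_categories = 5

# One row per client, one column per category; set bought categories to 1.
targets = np.zeros((len(purchases), num_categories), dtype=np.float32)
for row, cats in enumerate(purchases):
    targets[row, cats] = 1.0

print(targets)
```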

How can I build the model to work with targets like this?

Or can the targets be encoded with TensorFlow inside the model?

Sorry if my question is a little basic, but I'm just starting to work with TensorFlow.

You are looking for multi-label classification.

In such a scenario, your targets are still 0/1 vectors, but several entries can be 1 at once (sometimes called multi-hot encoding). For example: [0, 0, 0, 0, 1, 0, 1].

Your network structure stays the same, except that the final activation must be sigmoid instead of softmax, and you train the model with the binary_crossentropy loss.
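A minimal sketch of the adjusted model, mirroring the code in the question (hidden_layer_size and the input width are made-up values here, and the toy data is random):

```python
import numpy as np
import tensorflow as tf

num_categories = 5     # one output per product category
hidden_layer_size = 32 # assumed value, as in the question's variable

model = tf.keras.Sequential([
    tf.keras.layers.Dense(hidden_layer_size, activation='tanh'),
    tf.keras.layers.Dense(hidden_layer_size, activation='tanh'),
    # sigmoid: each category gets its own independent probability in [0, 1]
    tf.keras.layers.Dense(num_categories, activation='sigmoid'),
])

# binary_crossentropy treats each of the 5 outputs as a separate
# yes/no problem, which is exactly what multi-label classification needs.
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['binary_accuracy'])

# Toy inputs and multi-hot targets, e.g. a client who bought categories 2 and 4.
x = np.random.rand(8, 10).astype('float32')
y = np.array([[0, 0, 1, 0, 1]] * 8, dtype='float32')

model.fit(x, y, epochs=1, verbose=0)
preds = model.predict(x, verbose=0)
print(preds.shape)  # one probability per category for each client
```

Note that the row of predicted probabilities no longer sums to 1, since each output is independent; you typically threshold each one (e.g. at 0.5) to decide which categories to predict.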

ref: https://www.kaggle.com/roccoli/multi-label-classification-with-keras

This is a case of multi-class, multi-label classification. Unlike softmax, where the outputs are mutually exclusive (if one output is true, the others must be false), here the outputs are independent of each other. The model should focus on whether the client bought a jacket or not, and whether the client bought a t-shirt or not, rather than assuming that buying a jacket rules out buying a t-shirt. To achieve this, we use a sigmoid activation so that each position of the multi-hot target vector is learned on its own. The matching loss is binary cross-entropy, which learns for each output (jacket, t-shirt, and so on) whether the item was bought or not, independently of the others.
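The independence described above is visible in the loss itself: binary cross-entropy is computed per label and then averaged, so each category's loss depends only on its own prediction. A small worked example with made-up numbers:

```python
import numpy as np

# Hypothetical values: one client's multi-hot target and the sigmoid
# outputs the network might produce for the 5 categories.
y_true = np.array([0.0, 0.0, 1.0, 0.0, 1.0])
y_pred = np.array([0.1, 0.2, 0.9, 0.3, 0.8])

# Per-label binary cross-entropy: -[y*log(p) + (1-y)*log(1-p)].
# Each entry uses only that label's own target and prediction.
per_label = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Keras's binary_crossentropy averages the per-label losses.
loss = per_label.mean()
print(per_label.round(3), loss.round(3))
```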

