I want to use BERT model to do multi-label classification with Tensorflow.
To do so, I want to adapt the example run_classifier.py
from BERT github repository , which is an example on how to use BERT to do simple classification, using the pre-trained weights given by Google Research . (For example with BERT-Base, Cased
)
I have X
different labels which have value of either 0 or 1, so I want to add to the original BERT model a new Dense layer of size X
and using the sigmoid_cross_entropy_with_logits
activation function.
So, for the theorical part I think I am OK.
The problem is that I don't know how I can append a new output layer and retrain only this new layer with my dataset, using the existing BertModel
class.
Here is the original create_model()
function from run_classifier.py
where I guess I have to do my modifications. But I am a bit lost on what to do.
def create_model(bert_config, is_training, input_ids, input_mask, segment_ids,
labels, num_labels, use_one_hot_embeddings):
"""Creates a classification model."""
model = modeling.BertModel(
config=bert_config,
is_training=is_training,
input_ids=input_ids,
input_mask=input_mask,
token_type_ids=segment_ids,
use_one_hot_embeddings=use_one_hot_embeddings)
output_layer = model.get_pooled_output()
hidden_size = output_layer.shape[-1].value
output_weights = tf.get_variable(
"output_weights", [num_labels, hidden_size],
initializer=tf.truncated_normal_initializer(stddev=0.02))
output_bias = tf.get_variable(
"output_bias", [num_labels], initializer=tf.zeros_initializer())
with tf.variable_scope("loss"):
if is_training:
# I.e., 0.1 dropout
output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)
logits = tf.matmul(output_layer, output_weights, transpose_b=True)
logits = tf.nn.bias_add(logits, output_bias)
probabilities = tf.nn.softmax(logits, axis=-1)
log_probs = tf.nn.log_softmax(logits, axis=-1)
one_hot_labels = tf.one_hot(labels, depth=num_labels, dtype=tf.float32)
per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1)
loss = tf.reduce_mean(per_example_loss)
return (loss, per_example_loss, logits, probabilities)
And here is the same function, with some of my modifications, but where there is things missing (and wrong things too? )
def create_model(bert_config, is_training, input_ids, input_mask, segment_ids, labels, num_labels):
"""Creates a classification model."""
model = modeling.BertModel(
config=bert_config,
is_training=is_training,
input_ids=input_ids,
input_mask=input_mask,
token_type_ids=segment_ids)
output_layer = model.get_pooled_output()
hidden_size = output_layer.shape[-1].value
output_weights = tf.get_variable("output_weights", [num_labels, hidden_size],initializer=tf.truncated_normal_initializer(stddev=0.02))
output_bias = tf.get_variable("output_bias", [num_labels], initializer=tf.zeros_initializer())
with tf.variable_scope("loss"):
if is_training:
# I.e., 0.1 dropout
output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)
logits = tf.matmul(output_layer, output_weights, transpose_b=True)
logits = tf.nn.bias_add(logits, output_bias)
probabilities = tf.nn.softmax(logits, axis=-1)
log_probs = tf.nn.log_softmax(logits, axis=-1)
per_example_loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=logits)
loss = tf.reduce_mean(per_example_loss)
return (loss, per_example_loss, logits, probabilities)
The other things I have adapted in the code and for which I had no problem :
So, if anyone knows what I should do to resolve my problem, or even point out some obvious mistake I may have done, I would be glad to hear it.
Notes :
You want to replace the softmax that models a single distribution over possible outputs (all scores sum up to one) with sigmoid which models an independent distribution for each class (there is yes/no distribution for each output).
So, you correctly change the loss function, but you also need to change how you compute the probabilities. It should be:
probabilities = tf.sigmoid(logits)
In this case, you don't need the log_probs
.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.