
ELMo Embedding layer with Keras

I have been using the default Keras Embedding layer with word embeddings in my architecture. The architecture looks like this:

from keras.layers import Input, Embedding, LSTM

left_input = Input(shape=(max_seq_length,), dtype='int32')
right_input = Input(shape=(max_seq_length,), dtype='int32')

embedding_layer = Embedding(len(embeddings), embedding_dim, weights=[embeddings], input_length=max_seq_length)

# Embedded version of the inputs
encoded_left = embedding_layer(left_input)
encoded_right = embedding_layer(right_input)

# Since this is a siamese network, both sides share the same LSTM
shared_lstm = LSTM(n_hidden, name="lstm")

left_output = shared_lstm(encoded_left)
right_output = shared_lstm(encoded_right)
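An Embedding layer is essentially a row lookup into its weight matrix, so integer ids of shape (batch_size, max_seq_length) come out as (batch_size, max_seq_length, embedding_dim), which is the 3-D input an LSTM expects. A minimal NumPy sketch of that shape contract; the sizes and the random matrix are made-up stand-ins, not the asker's data.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration
vocab_size, embedding_dim = 10, 4

# Stand-in for the pretrained `embeddings` matrix passed as weights=[embeddings]
embeddings = np.random.rand(vocab_size, embedding_dim)

# Integer token ids, as fed to Input(shape=(max_seq_length,), dtype='int32')
token_ids = np.array([[1, 2, 3, 0, 0],
                      [4, 5, 6, 7, 0]], dtype=np.int32)

# The lookup is plain fancy indexing: one row of `embeddings` per token id
encoded = embeddings[token_ids]

print(token_ids.shape)  # (2, 5)    -> (batch_size, max_seq_length)
print(encoded.shape)    # (2, 5, 4) -> (batch_size, max_seq_length, embedding_dim)
```

The time axis survives the lookup, which is why the LSTM accepted this version of the model.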

I want to replace the embedding layer with ELMo embeddings, so I used a custom embedding layer from this repo: https://github.com/strongio/keras-elmo/blob/master/Elmo%20Keras.ipynb. The embedding layer looks like this:

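import tensorflow as tf
import tensorflow_hub as hub
from keras import backend as K
from keras.layers import Layer

class ElmoEmbeddingLayer(Layer):
    def __init__(self, **kwargs):
        self.dimensions = 1024
        super(ElmoEmbeddingLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        # Load ELMo from TF Hub; the module name must match the variable scope used below
        self.elmo = hub.Module('https://tfhub.dev/google/elmo/2', trainable=self.trainable,
                               name="{}_module".format(self.name))

        self.trainable_weights += K.tf.trainable_variables(scope="^{}_module/.*".format(self.name))
        super(ElmoEmbeddingLayer, self).build(input_shape)

    def call(self, x, mask=None):
        # (batch_size, 1) strings -> (batch_size,) strings;
        # 'default' returns one 1024-d vector per sentence
        result = self.elmo(K.squeeze(K.cast(x, tf.string), axis=1),
                           as_dict=True,
                           signature='default',
                           )['default']
        return result

    def compute_mask(self, inputs, mask=None):
        return K.not_equal(inputs, '--PAD--')

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.dimensions)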
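To see what `call` hands to the hub module, here is a NumPy sketch of the same shape handling, with `np.squeeze` standing in for `K.squeeze`. The sentences are invented, and the module-level `compute_output_shape` simply mirrors the method above; it is an illustration of the shapes, not the layer itself.

```python
import numpy as np

# Hypothetical batch of raw sentences, shaped like Input(shape=(1,), dtype="string")
x = np.array([["the cat sat"], ["a dog barked loudly"]], dtype=object)
print(x.shape)  # (2, 1)

# Squeezing axis 1 drops the length-1 axis, leaving a 1-D batch of sentences
sentences = np.squeeze(x, axis=1)
print(sentences.shape)  # (2,)

# With the ['default'] lookup there is one 1024-d vector per sentence,
# which is exactly what compute_output_shape declares: no time axis
dimensions = 1024

def compute_output_shape(input_shape):
    return (input_shape[0], dimensions)

print(compute_output_shape(x.shape))  # (2, 1024)
```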

I changed the architecture to use the new embedding layer:

# The visible layer
left_input = Input(shape=(1,), dtype="string")
right_input = Input(shape=(1,), dtype="string")

embedding_layer = ElmoEmbeddingLayer()

# Embedded version of the inputs
encoded_left = embedding_layer(left_input)
encoded_right = embedding_layer(right_input)

# Since this is a siamese network, both sides share the same LSTM
shared_lstm = LSTM(n_hidden, name="lstm")

left_output = shared_lstm(encoded_left)
right_output = shared_lstm(encoded_right)

But I am getting this error:

ValueError: Input 0 is incompatible with layer lstm: expected ndim=3, found ndim=2

What am I doing wrong here?

The ELMo embedding layer outputs one embedding per input (so the output shape is (batch_size, dim)), whereas your LSTM expects a sequence (i.e. shape (batch_size, seq_length, dim)). I don't think it makes much sense to have an LSTM layer after an ELMo embedding layer, since ELMo already uses an LSTM to embed a sequence of words.
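The rule behind the error can be written down in a few lines. `check_lstm_input` below is a hypothetical helper, not a Keras function; it only reproduces the rank check and the message format seen in the question: recurrent layers need (batch_size, timesteps, features), and a per-sentence embedding has no timesteps axis.

```python
# Hypothetical helper mimicking the ndim=3 check a Keras LSTM applies to its input
def check_lstm_input(shape, layer_name="lstm"):
    if len(shape) != 3:
        raise ValueError(
            "Input 0 is incompatible with layer {}: expected ndim=3, found ndim={}".format(
                layer_name, len(shape)))
    batch_size, timesteps, features = shape
    return timesteps, features

# Word-level Embedding output: one vector per token, so it is accepted
print(check_lstm_input((32, 48, 300)))  # (48, 300)

# ELMo 'default' output: one vector per sentence, nothing to iterate over
try:
    check_lstm_input((32, 1024))
except ValueError as e:
    print(e)  # Input 0 is incompatible with layer lstm: expected ndim=3, found ndim=2
```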

I also used that repository as a guide to build a CustomELMo + BiLSTM + CRF model, and I needed to change the dict lookup to 'elmo' instead of 'default'. As Anna Krogager pointed out, when the dict lookup is 'default' the output has shape (batch_size, dim), which has too few dimensions for the LSTM. However, when the dict lookup is ['elmo'] the layer returns a tensor of the right rank, namely of shape (batch_size, max_length, 1024).
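The difference between the two lookups is the time axis. The TF Hub documentation for the ELMo module describes 'default' as a fixed mean-pooling of the contextualized word representations, while 'elmo' keeps one vector per token. The NumPy sketch below shows how masked mean-pooling over time collapses a 3-D per-token tensor into the 2-D shape the LSTM rejected; the random arrays and the mask are stand-ins, and this illustrates the shapes rather than reproducing the module's exact computation.

```python
import numpy as np

# Hypothetical sizes; 1024 is ELMo's output dimension
batch_size, max_length, dim = 2, 4, 1024

# Stand-in for the per-token ['elmo'] output: (batch_size, max_length, dim)
token_vectors = np.random.rand(batch_size, max_length, dim)

# 1 for real tokens, 0 for padding (sentence 1 has 3 tokens, sentence 2 has 2)
mask = np.array([[1, 1, 1, 0],
                 [1, 1, 0, 0]], dtype=float)

# Mean-pool over the time axis (axis 1), ignoring padded positions
summed = (token_vectors * mask[:, :, None]).sum(axis=1)
pooled = summed / mask.sum(axis=1, keepdims=True)

print(token_vectors.shape)  # (2, 4, 1024) -> something a BiLSTM/CRF can consume
print(pooled.shape)         # (2, 1024)    -> the 'default'-style shape
```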

Custom ELMo Layer:

import tensorflow as tf
import tensorflow_hub as hub
from keras import backend as K
from keras.layers import Layer

class ElmoEmbeddingLayer(Layer):
    def __init__(self, **kwargs):
        self.dimensions = 1024
        self.trainable = True
        super(ElmoEmbeddingLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.elmo = hub.Module('https://tfhub.dev/google/elmo/2', trainable=self.trainable,
                               name="{}_module".format(self.name))

        self.trainable_weights += K.tf.trainable_variables(scope="^{}_module/.*".format(self.name))
        super(ElmoEmbeddingLayer, self).build(input_shape)

    def call(self, x, mask=None):
        # 'elmo' keeps one 1024-d vector per token: (batch_size, max_length, 1024)
        result = self.elmo(K.squeeze(K.cast(x, tf.string), axis=1),
                           as_dict=True,
                           signature='default',
                           )['elmo']
        return result

    # def compute_mask(self, inputs, mask=None):
    #     return K.not_equal(inputs, '__PAD__')

    def compute_output_shape(self, input_shape):
        # 48 is my hard-coded max_length
        return input_shape[0], 48, self.dimensions

And the model is built as follows:

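import tensorflow as tf
from keras import layers
from keras.layers import Bidirectional, LSTM
from keras.models import Model
from keras.metrics import categorical_accuracy, mean_squared_error
from keras_contrib.layers import CRF
from keras_contrib.losses import crf_loss
from keras_contrib.metrics import crf_accuracy

def build_model():  # uses CRF from keras_contrib; num_tags is defined elsewhere
    input_text = layers.Input(shape=(1,), dtype=tf.string)
    x = ElmoEmbeddingLayer(name='ElmoEmbeddingLayer')(input_text)
    x = Bidirectional(LSTM(units=512, return_sequences=True))(x)
    crf = CRF(num_tags)
    out = crf(x)
    model = Model(input_text, out)
    model.compile(optimizer="rmsprop", loss=crf_loss, metrics=[crf_accuracy, categorical_accuracy, mean_squared_error])
    return model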
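Because `compute_output_shape` hard-codes 48 timesteps, while the module's default signature splits each input string on spaces and sizes the 'elmo' output by the longest sentence in the batch, the declared and actual shapes only agree if every sentence has exactly 48 space-separated tokens. One way to arrange that, assuming the `__PAD__` token from the commented-out `compute_mask`, is to pad or truncate each tokenized sentence before joining it back into a string. `pad_sentence`, `MAX_LENGTH` and `PAD` are hypothetical names; the resulting (n, 1) array mirrors `layers.Input(shape=(1,), dtype=tf.string)`.

```python
import numpy as np

MAX_LENGTH = 48   # matches the hard-coded 48 in compute_output_shape
PAD = "__PAD__"   # pad token assumed from the commented-out compute_mask

def pad_sentence(tokens, max_length=MAX_LENGTH, pad=PAD):
    """Truncate or right-pad a token list to exactly max_length tokens."""
    tokens = list(tokens)[:max_length]
    return tokens + [pad] * (max_length - len(tokens))

sentences = [["John", "lives", "in", "Paris"], ["Hello"]]

# Join with single spaces, since the module's default signature splits on spaces
padded = [" ".join(pad_sentence(s)) for s in sentences]

# Shape (n, 1) of strings, one sentence per row
X = np.array(padded, dtype=object).reshape(-1, 1)

print(X.shape)                   # (2, 1)
print(len(X[0, 0].split(" ")))   # 48
```

The tag sequences used as CRF targets would need the same 48-step padding.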

I hope my code is useful to you, even if it's not exactly the same model. Note that I had to comment out the compute_mask method, as it throws:

InvalidArgumentError: Incompatible shapes: [32,47] vs. [32,0]    [[{{node loss/crf_1_loss/mul_6}}]]

where 32 is the batch size and 47 is one less than my specified max_length (presumably because it's accounting for a pad token itself). I haven't worked out the cause of that error yet, so it might be fine for you and your model. However, if you're using GRUs, note that there's an unresolved issue on the repository about adding GRU support, so I'm curious whether you run into that issue too.
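One thing worth checking for the compute_mask error is the shape of the mask that method returns. The layer's input is one string per sentence, shape (batch_size, 1), so comparing it against '__PAD__' yields one flag per sentence rather than one per token, while the BiLSTM and CRF work over (batch_size, max_length) timesteps. A NumPy sketch, with `!=` standing in for `K.not_equal` and made-up padded sentences; it shows a shape mismatch, not a confirmed diagnosis of the error above.

```python
import numpy as np

PAD = "__PAD__"

# One padded sentence per row, shape (batch_size, 1), as the ELMo layer receives it
batch = np.array([["John lives in Paris " + " ".join([PAD] * 44)],
                  ["Hello " + " ".join([PAD] * 47)]])

# What the commented-out compute_mask computes: one flag per *sentence*
sentence_mask = batch != PAD
print(sentence_mask.shape)       # (2, 1)

# A per-token mask, which is what a sequence layer lines up with
tokens = np.array([row[0].split(" ") for row in batch])
token_mask = tokens != PAD
print(token_mask.shape)          # (2, 48)
print(token_mask.sum(axis=1))    # [4 1]
```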

