
Creating a CoreML LRCN model

Hello and thank you in advance for any help or guidance provided!

The question I have stems from an article posted on Apple's CoreML documentation site. The topic of this article was also covered during the WWDC 2017 lectures and I found it quite interesting. I posted a question recently that was related to part of this same project I'm working on and it was solved with ease; however, as I get further into this endeavor, I find myself not understanding how part of this model is being implemented.

To start off, I have a model I'm building in Keras with a TensorFlow backend that uses convolutional layers inside the TimeDistributed wrapper. Following the convolutional section, a single LSTM layer connects to a dense layer as the output. The goal is to create a many-to-many structure that classifies each item in a padded sequence of images. I'll post the code for the model below.

My plan for training and deploying this network may raise other questions down the road, but I will make a separate post if they cause trouble. The plan is to train with the TimeDistributed wrapper, then strip it off the model and load the weights into the unwrapped layers at CoreML conversion time, since the TimeDistributed wrapper doesn't play well with CoreML.

My question is this:

In the aforementioned article (and in a CoreML example project I found on GitHub), the implementation is quite clever. Since CoreML (or at least the stock converter) doesn't support image sequences as inputs, the images are fed in one at a time, and the LSTM states are passed out of the network as an output alongside the prediction for the current image. For the next image in the sequence, the user passes the image along with the previous time step's LSTM state, so the model can "pick up where it left off", so to speak, and handle the single inputs as a sequence. It effectively forms a loop for the LSTM state (this is covered in further detail in the Apple article). Now, for the actual question part...

How is this implemented in a library like Keras? So far I have been successful at outputting the LSTM state using the functional API and the "return_state" setting on the LSTM layer, and routing it to a secondary output. Pretty simple. Not so simple (at least for me) is how to pass that state back INTO the network for the next prediction. I've looked over the source code and documentation for the LSTM layer and I don't see anything that jumps out as an input for the state. The only thing I can think of is to make the LSTM layer its own model and use "initial_state" to set it, but based on a post I found on the Keras GitHub, it seems the model would then need a custom call function, and I'm not sure how to work that into CoreML. Just FYI, I am planning to loop both the hidden and cell states in and out of the model, unless that isn't necessary and only the hidden state should be used, as shown in Apple's model.
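To illustrate the initial_state idea, here is a minimal sketch of the kind of thing I've been experimenting with: passing the state tensors when the LSTM layer is called on its input. The input names and sizes here are placeholders of my own, and I'm not sure this pattern survives CoreML conversion:

from keras.layers import Input, LSTM, Dense
from keras.models import Model

features_in = Input(shape=(None, 4096))  # a sequence of feature vectors (placeholder size)
h_in = Input(shape=(256,))               # hidden state from the previous prediction
c_in = Input(shape=(256,))               # cell state from the previous prediction

# initial_state seeds the LSTM with the states returned by the last call
lstm_out, h_out, c_out = LSTM(256, return_sequences=True, return_state=True)(
    features_in, initial_state=[h_in, c_in])
predictions = Dense(10, activation='sigmoid')(lstm_out)

state_model = Model([features_in, h_in, c_in], [predictions, h_out, c_out])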

Thanks once again. Any help provided is always appreciated!

My current model looks like this:

from keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dropout, LSTM, Dense, TimeDistributed
from keras.models import Model

# max_sequence_length and num_classes are defined elsewhere in my script
image_input = Input(shape=(max_sequence_length, 224, 224, 3))
hidden_state_input = Input(shape=(None, 256))
cell_state_input = Input(shape=(None, 256))

convolutional_1 = TimeDistributed(Conv2D(64, (3, 3), activation='relu', data_format = 'channels_last'))(image_input)
pooling_1 = TimeDistributed(MaxPooling2D((2, 2), strides=(1, 1)))(convolutional_1)

convolutional_2 = TimeDistributed(Conv2D(128, (4,4), activation='relu'))(pooling_1)
pooling_2 = TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2)))(convolutional_2)

convolutional_3 = TimeDistributed(Conv2D(256, (4,4), activation='relu'))(pooling_2)
pooling_3 = TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2)))(convolutional_3)

flatten_1 = TimeDistributed(Flatten())(pooling_3)
dropout_1 = TimeDistributed(Dropout(0.5))(flatten_1)

lstm_1, state_h, state_c = LSTM(256, return_sequences=True, return_state=True, stateful=False, dropout=0.5)(dropout_1)

dense_1 = TimeDistributed(Dense(num_classes, activation='sigmoid'))(lstm_1)

# NOTE: hidden_state_input and cell_state_input are listed as model inputs but are
# not yet connected to the LSTM layer -- this is the part I can't figure out.
model = Model(inputs=[image_input, hidden_state_input, cell_state_input], outputs=[dense_1, state_h, state_c])
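For context, this is roughly how I picture driving a one-frame-at-a-time version of this model (the way the Apple article describes), looping the states from each prediction back in as inputs for the next frame. A rough sketch only; `frames`, the shapes, and the state wiring are my own placeholders:

import numpy as np

# start from zero states at the beginning of a sequence
h = np.zeros((1, 256))   # hidden state, batch size 1
c = np.zeros((1, 256))   # cell state, batch size 1

for frame in frames:                                # each frame: (224, 224, 3), preprocessed
    x = frame[np.newaxis, np.newaxis, ...]          # -> (1, 1, 224, 224, 3)
    probabilities, h, c = model.predict([x, h, c])  # feed the previous states back in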

Link to Apple article: https://developer.apple.com/documentation/coreml/core_ml_api/making_predictions_with_a_sequence_of_inputs

Link to GitHub repo with an example model that uses a similar method: https://github.com/akimach/GestureAI-CoreML-iOS

Link to the Keras GitHub post about the custom call function: https://github.com/keras-team/keras/issues/2995

Update: it turns out the coremltools converter will automatically add the state inputs and outputs during conversion.

See _topology.py, line 215, in the Keras converter for reference.
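For anyone who finds this later, the conversion call itself then stays simple. A rough sketch of what I mean, assuming a plain Keras sequence model with no explicit state tensors and placeholder input/output names; the converter adds the hidden/cell state features for the recurrent layer on its own (the exact feature names depend on the layer names):

import coremltools

# keras_model is the plain Keras model without explicit state inputs
coreml_model = coremltools.converters.keras.convert(
    keras_model,
    input_names=['image_sequence'],
    output_names=['label_probabilities'])

# the resulting .mlmodel exposes extra state inputs/outputs for the LSTM,
# which the app loops from one prediction into the next
coreml_model.save('LRCN.mlmodel')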
