How to reproduce a Keras model from the weights/biases?

I want to use the weights and biases from a Keras ML model to make a mathematical prediction function in another program that does not have Keras installed (and cannot).

I have a simple MLP model that I'm using to fit data. I'm running in Python with Keras and a TensorFlow backend; for now, I'm using an input layer, 1 hidden layer, and 1 output layer. All layers use ReLU activation, my optimizer is adam, and the loss function is mean_squared_error.

From what I understand, the weights I get for the layers should be used mathematically in the form:

(SUM (w*i)) + b

Where the sum is over all weights and inputs, and b is the bias on the neuron. For example, let's say I have an input layer of shape (33, 64). There are 33 inputs feeding 64 neurons. I'll have a vector input of dim 33 and a vector output of dim 64. This would make each SUM contain 33 terms (each of the 33 inputs times its weight), and the output would be all of the 64 SUMs plus the 64 biases (respectively).

The next layer, in my case it's 32 neurons, will do the same but with 64 inputs and 32 outputs. The output layer I have goes to a single value, so input 32 and output 1.
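The layer-by-layer shapes described above (33 inputs, a 64-neuron layer, a 32-neuron layer, one output) can be sketched in numpy with hypothetical random weights; note that an activation such as ReLU would normally be applied after each sum, which is omitted here just to check the shapes:

```python
import numpy as np

# Hypothetical weights/biases matching the shapes described above:
# 33 inputs -> 64 neurons -> 32 neurons -> 1 output.
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((33, 64)), rng.standard_normal(64)
W2, b2 = rng.standard_normal((64, 32)), rng.standard_normal(32)
W3, b3 = rng.standard_normal((32, 1)), rng.standard_normal(1)

x = rng.standard_normal(33)  # one input sample, dim 33

# Each neuron computes (SUM over inputs of w*i) + b:
h1 = x @ W1 + b1   # shape (64,)
h2 = h1 @ W2 + b2  # shape (32,)
out = h2 @ W3 + b3  # shape (1,)
print(h1.shape, h2.shape, out.shape)  # (64,) (32,) (1,)
```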

I have written code to try to mimic the model. Here is a snippet for making a single prediction:

    import numpy as np

    def modelR(weights, biases, data):
        # This is the input layer.
        y = []
        for i in range(len(weights[0][0])):
            x = np.zeros(len(weights[0][0]))
            for j in range(len(data)):
                x[i] += weights[0][j][i]*data[j]
            y.append(x[i]+biases[0][i])

        # This is the hidden layer.
        z = []
        for i in range(len(weights[1][0])):
            x = np.zeros(len(weights[1][0]))
            for j in range(len(y)):
                x[i] += weights[1][j][i]*y[j]
            z.append(x[i]+biases[1][i])

        # This is the output layer.
        p = 0.0
        for i in range(len(z)):
            p += weights[-1][i][0]*z[i]
        p = p+biases[-1][0]

        return p

To be clear, "weights" and "biases" are derived via:

    weights = []
    biases = []
    for i in range(len(model.layers)):
        weights.append(model.layers[i].get_weights()[0])
        biases.append(model.layers[i].get_weights()[1])

    weights = np.asarray(weights)
    biases = np.asarray(biases)

So the first weight on the first neuron for the first input is weights[0][0][0], the first weight on the first input for the second neuron is weights[0][1][0], etc. I could be wrong on this, which may be where I'm getting stuck. But this makes sense, as we're going from a (1 x 33) vector to a (1 x 64) vector, so we ought to have a (33 x 64) matrix.
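For what it's worth, a quick sketch can check the indexing convention. Assuming a Keras Dense kernel has shape (inputs, neurons), W[i][j] is the weight connecting input i to neuron j, and the per-neuron loop sum matches a single matrix multiplication:

```python
import numpy as np

# Assumed convention: kernel shape (inputs, neurons),
# so W[i][j] connects input i to neuron j.
rng = np.random.default_rng(1)
W = rng.standard_normal((33, 64))  # (inputs, neurons)
b = rng.standard_normal(64)
data = rng.standard_normal(33)

# Loop form: neuron j sums W[i][j] * data[i] over all inputs i.
out_loop = np.array([sum(W[i][j] * data[i] for i in range(33)) + b[j]
                     for j in range(64)])

# Matrix form: the same computation in one call.
out_mat = data @ W + b
print(np.allclose(out_loop, out_mat))  # True
```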

Any ideas of where I'm going wrong? Thanks!

EDIT: ANSWER FOUND. I'm marking jhso's answer as correct, even though it didn't work as-is in my code (I'm probably missing an import statement somewhere). The key was the activation function: I was using ReLU, so I shouldn't have been passing along any negative values. jhso also shows a nice way to avoid the loops and simply do the matrix multiplication (which I didn't know Python could do). Now I just have to figure out how to do it in C++!

I think it's good to familiarise yourself with linear algebra when working with machine learning. When we have an equation of the form sum(matrix elem times another matrix elem), it's often a simple matrix multiplication of the form matrix1 * matrix2.T. This simplifies your code quite a bit:

    import numpy as np

    def relu(x):
        # Elementwise max(0, x).
        return np.maximum(0.0, x)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def modelR(weights, biases, data):
        # This is the input layer.
        y = np.matmul(data, weights[0]) + biases[0][None, :]
        y_act = relu(y)  # also dropout or any other function you use here
        z = np.matmul(y_act, weights[1]) + biases[1][None, :]
        z_act = relu(z)  # also dropout or any other function you use here
        p = np.matmul(z_act, weights[2]) + biases[2][None, :]
        p_act = sigmoid(p)
        return p_act

I made a guess at which activation function you use. I'm also unsure of how your data is structured; just make sure that the features/weights are always the inner dimension of the multiplication, i.e. if your input is (Bx10) and your weights are (10x64), then input*weights is good enough and will produce an output of shape (Bx64).
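That shape rule is easy to check in numpy; here's a minimal sketch with a hypothetical batch of 5 samples and the (Bx10) times (10x64) example from above:

```python
import numpy as np

def relu(x):
    # Elementwise max(0, x), the activation used in the question.
    return np.maximum(0.0, x)

rng = np.random.default_rng(2)
batch = rng.standard_normal((5, 10))  # (B x 10) input, B = 5
W = rng.standard_normal((10, 64))     # (10 x 64) weights
b = rng.standard_normal(64)

# Inner dimensions (10 and 10) match, so the product is valid.
out = relu(batch @ W + b[None, :])
print(out.shape)  # (5, 64)
```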
