Theano sqrt returning NaN values

In my code I'm using Theano to calculate a Euclidean distance matrix (code from here):

import theano
import theano.tensor as T
MAT = T.fmatrix('MAT')
squared_euclidean_distances = (MAT ** 2).sum(1).reshape((MAT.shape[0], 1)) + (MAT ** 2).sum(1).reshape((1, MAT.shape[0])) - 2 * MAT.dot(MAT.T)
f_euclidean = theano.function([MAT], T.sqrt(squared_euclidean_distances))
def pdist_euclidean(mat):
    return f_euclidean(mat)

But this code causes some values of the matrix to be NaN. I've read that this can happen when calculating theano.tensor.sqrt(), and here it's suggested to

Add an eps inside the sqrt (or max(x,EPs))

So I've added an eps to my code:

import theano
import theano.tensor as T

eps = 1e-9

MAT = T.fmatrix('MAT')

squared_euclidean_distances = (MAT ** 2).sum(1).reshape((MAT.shape[0], 1)) + (MAT ** 2).sum(1).reshape((1, MAT.shape[0])) - 2 * MAT.dot(MAT.T)

f_euclidean = theano.function([MAT], T.sqrt(eps+squared_euclidean_distances))

def pdist_euclidean(mat):
    return f_euclidean(mat)

I'm adding it before performing sqrt. I'm getting fewer NaNs, but I'm still getting them. What is the proper solution to the problem? I've also noticed that if MAT is T.dmatrix() there are no NaNs.

There are two likely sources of NaNs when computing Euclidean distances.

  1. Floating point rounding can produce a slightly negative squared distance where the true value is exactly zero. The square root of a negative number is undefined (assuming you're not interested in the complex solution).

    Imagine MAT has the value

     [[ 1.62434536 -0.61175641 -0.52817175 -1.07296862  0.86540763]
      [-2.3015387   1.74481176 -0.7612069   0.3190391  -0.24937038]
      [ 1.46210794 -2.06014071 -0.3224172  -0.38405435  1.13376944]
      [-1.09989127 -0.17242821 -0.87785842  0.04221375  0.58281521]]

    Now, if we break down the computation we see that (MAT ** 2).sum(1).reshape((MAT.shape[0], 1)) + (MAT ** 2).sum(1).reshape((1, MAT.shape[0])) has value

     [[ 10.3838024   14.27675714  13.11072431   7.54348446]
      [ 14.27675714  18.16971188  17.00367905  11.4364392 ]
      [ 13.11072431  17.00367905  15.83764622  10.27040637]
      [  7.54348446  11.4364392   10.27040637   4.70316652]]

    and 2 * MAT.dot(MAT.T) has value

     [[ 10.3838024   -9.92394296  10.39763039  -1.51676099]
      [ -9.92394296  18.16971188 -14.23897281   5.53390084]
      [ 10.39763039 -14.23897281  15.83764622  -0.65066204]
      [ -1.51676099   5.53390084  -0.65066204   4.70316652]]

    The diagonals of these two matrices should be equal (the distance between a vector and itself is zero), and from this textual representation it looks like they are. In fact they differ slightly; the differences are too small to show up when the floating point values are printed like this.

    This becomes apparent when we print the value of the full expression (the second of the matrices above subtracted from the first):

     [[  0.00000000e+00   2.42007001e+01   2.71309392e+00   9.06024545e+00]
      [  2.42007001e+01  -7.10542736e-15   3.12426519e+01   5.90253836e+00]
      [  2.71309392e+00   3.12426519e+01   0.00000000e+00   1.09210684e+01]
      [  9.06024545e+00   5.90253836e+00   1.09210684e+01   0.00000000e+00]]

    The diagonal is almost entirely zeros, but the item in the second row, second column is a very small negative value. When you then compute the square root of all these values you get NaN in that position, because the square root of a negative number is undefined (for real numbers).

     [[ 0.          4.91942071  1.64714721  3.01002416]
      [ 4.91942071         nan  5.58951267  2.42951402]
      [ 1.64714721  5.58951267  0.          3.30470398]
      [ 3.01002416  2.42951402  3.30470398  0.        ]]
  2. Computing the gradient of a Euclidean distance expression with respect to a variable inside the input to the function. This can happen not only if a negative number is generated due to floating point approximations, as above, but also if any of the inputs are zero length.

    If y = sqrt(x) then dy/dx = 1/(2 * sqrt(x)). So if x=0 or, for your purposes, if squared_euclidean_distances=0, then the gradient will be NaN because 2 * sqrt(0) = 0 and dividing by zero is undefined.
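
Both failure modes can be reproduced outside Theano. The following is a minimal NumPy sketch, not Theano code; it assumes NumPy follows the same IEEE floating point rules for these operations. It reuses the squared-distance matrix printed above and evaluates the derivative formula at x = 0. The variable names are illustrative only.

```python
import numpy as np

# Squared-distance matrix copied from the walkthrough above; note the
# tiny negative entry at position [1, 1].
d2 = np.array([
    [0.00000000e+00, 2.42007001e+01, 2.71309392e+00, 9.06024545e+00],
    [2.42007001e+01, -7.10542736e-15, 3.12426519e+01, 5.90253836e+00],
    [2.71309392e+00, 3.12426519e+01, 0.00000000e+00, 1.09210684e+01],
    [9.06024545e+00, 5.90253836e+00, 1.09210684e+01, 0.00000000e+00],
])

# Silence the floating point warnings NumPy would otherwise emit.
with np.errstate(invalid="ignore", divide="ignore"):
    # Source 1: square root of a (slightly) negative number.
    dist = np.sqrt(d2)

    # Source 2: dy/dx = 1 / (2 * sqrt(x)) evaluated at x = 0.
    x = np.array([0.0])
    grad_at_zero = 1.0 / (2.0 * np.sqrt(x))

    # In a chain rule product, an infinite factor times a zero factor
    # is NaN under IEEE arithmetic.
    chained = grad_at_zero * 0.0

print(np.isnan(dist[1, 1]))
print(grad_at_zero, chained)
```

The division itself yields infinity in this sketch; the NaN appears once that infinity is multiplied by zero, which is one plausible route by which a gradient ends up as NaN.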

The first problem can be solved by clamping the squared distances so that they are never less than zero:

T.sqrt(T.maximum(squared_euclidean_distances, 0.))
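
For readers without a Theano installation, here is a rough NumPy analogue of the clamped expression. It is a sketch under the assumption that NumPy's rounding behaves like Theano's for this computation; `pdist_euclidean_np` is a hypothetical name, and the random input is illustrative only.

```python
import numpy as np

def pdist_euclidean_np(mat):
    """NumPy analogue of the Theano expression, with the clamp applied."""
    sq_norms = (mat ** 2).sum(axis=1)
    # Same structure as the question: column + row - 2 * Gram matrix.
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2 * mat.dot(mat.T)
    # Rounding can leave tiny negative values; clamp them to zero
    # before taking the square root.
    return np.sqrt(np.maximum(d2, 0.0))

# float32 input, matching T.fmatrix in the question.
rng = np.random.RandomState(1)
mat = rng.randn(200, 50).astype(np.float32)

dist = pdist_euclidean_np(mat)
print(np.isnan(dist).any())
```

Because every value passed to the square root is at least zero, the result cannot contain NaN, whatever the rounding errors in `d2` happen to be.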

To solve both problems (if you need gradients), make sure the squared distances are never negative or zero by bounding them below with a small positive epsilon:

T.sqrt(T.maximum(squared_euclidean_distances, eps))
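
To see why the bound keeps the gradient finite, consider the derivative of sqrt(max(x, eps)) with respect to x. The sketch below writes that derivative out by hand in NumPy; it is an illustration of the mathematics, not a reproduction of how Theano computes gradients. The helper names are hypothetical, and the choice of 0 exactly at x = eps (where the function is not differentiable) is an assumption.

```python
import numpy as np

eps = 1e-9  # same value as in the question; adjust to your dtype and data scale

def sqrt_clamped(x):
    """sqrt(max(x, eps)), the bounded expression from the answer."""
    return np.sqrt(np.maximum(x, eps))

def sqrt_clamped_grad(x):
    """Hand-written derivative of sqrt_clamped with respect to x."""
    clamped = np.maximum(x, eps)
    # Above the bound the derivative is 1 / (2 * sqrt(x)); at or below
    # it the output is constant in x, so the derivative is taken as 0.
    return np.where(x > eps, 0.5 / np.sqrt(clamped), 0.0)

# A tiny negative value, an exact zero, and an ordinary positive value.
x = np.array([-7.10542736e-15, 0.0, 4.0])
print(sqrt_clamped(x))
print(sqrt_clamped_grad(x))
```

The denominator never drops below 2 * sqrt(eps), so no entry of the gradient can be infinite or NaN.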

The first solution makes sense since the problem only arises from approximate representations. The second is a bit more questionable because the true distance is zero so, in a sense, the gradient should be undefined. Your specific use case may yield some alternative solution that maintains the semantics without an artificial bound (e.g. by ensuring that gradients are never computed or used for zero-length vectors). But NaN values can be pernicious: they can spread like weeds.

Just checking

In squared_euclidean_distances you're adding a column, a row, and a matrix. Are you sure this is what you want?

More precisely, if MAT is of shape (n, p), you're adding matrices of shapes (n, 1), (1, n) and (n, n).

Theano seems to broadcast them silently: the column is repeated across the columns and the row across the rows, so that both match the shape of the dot product.
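
The shape arithmetic can be checked with a small NumPy sketch. This assumes Theano's broadcasting matches NumPy's for these shapes; the input values are arbitrary and chosen only so the result is easy to verify by hand.

```python
import numpy as np

n, p = 4, 5
mat = np.arange(n * p, dtype=np.float64).reshape(n, p)

sq_norms = (mat ** 2).sum(axis=1)   # shape (n,)
col = sq_norms.reshape((n, 1))      # shape (n, 1)
row = sq_norms.reshape((1, n))      # shape (1, n)
gram = mat.dot(mat.T)               # shape (n, n)

# Broadcasting stretches col and row to (n, n) before the arithmetic.
total = col + row - 2 * gram
print(col.shape, row.shape, gram.shape, total.shape)

# Entry [i, j] is |x_i|^2 + |x_j|^2 - 2 * x_i . x_j, i.e. the squared
# distance between rows i and j.
direct = ((mat[0] - mat[1]) ** 2).sum()
print(total[0, 1], direct)
```

So the broadcasted sum is exactly what a pairwise squared-distance matrix needs: entry [i, j] combines the squared norm of row i, the squared norm of row j, and their dot product.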

If this is what you want

In reshape, you should probably specify ndim=2, according to the reshape entry in the basic tensor functionality documentation:

If the shape is a Variable argument, then you might need to use the optional ndim parameter to declare how many elements the shape has, and therefore how many dimensions the reshaped Variable will have.

Also, it seems that squared_euclidean_distances should always be non-negative, unless imprecision errors in the difference change zero values into small negative values. If this is true, and if negative values are responsible for the NaNs you're seeing, you could indeed get rid of them without corrupting your result by surrounding squared_euclidean_distances with abs(...).
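
As a quick comparison of the two workarounds, the NumPy sketch below applies abs(...) and a clamp at zero to the small negative value from the earlier example. It is an illustration only and assumes the Theano operations behave the same way on this input.

```python
import numpy as np

# The small negative squared distance from the earlier example.
tiny_negative = np.float64(-7.10542736e-15)

via_abs = np.sqrt(np.abs(tiny_negative))             # tiny positive distance
via_clamp = np.sqrt(np.maximum(tiny_negative, 0.0))  # exactly zero

print(via_abs, via_clamp)
```

Both avoid the NaN. With abs(...) the self-distance comes out as a very small positive number rather than exactly zero, which is harmless for most purposes but worth knowing if you test for exact zeros.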
