
softmax python calculation

I am new to machine learning and am learning how to implement softmax in Python. I was following the thread below:

Softmax function - python

I was doing some analysis, and suppose we have the arrays

import numpy as np

batch = np.asarray([[1000,2000,3000,6000],[2000,4000,5000,6000],[1000,2000,3000,6000]])
batch1 = np.asarray([[1,2,2,6000],[2,5,5,3],[3,5,2,1]])

and try to implement the softmax (as mentioned in the thread above) via:

1) Shared by Pab Torre:

np.exp(z) / np.sum(np.exp(z), axis=1, keepdims=True)

2) Asked in initial question:

e_x = np.exp(x - np.max(x))
return e_x / e_x.sum() 

With both of these I am getting errors (values out of bounds), so I instead tried normalizing the input and running it:

x= np.mean(batch1)
y = np.std(batch1)
e_x = np.exp((batch1 - x)/y)
j = e_x / e_x.sum(axis = 0)

So my question to everyone: is this a valid way to implement it? If not, how can I handle the above cases?

Thanks in advance

The method in 2) is numerically quite stable. Most likely, the error comes from some other line. See these examples (all run without error):

import numpy as np

def softmax(x):
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

print(softmax(np.array([0, 0, 0, 0])))
print(softmax(np.array([1000, 2000, 3000, 6000])))
print(softmax(np.array([2000, 4000, 5000, 6000])))
print(softmax(np.array([1000, 2000, 3000, 6000])))
print(softmax(np.array([2000, 2000, 2001, 2000])))
print(softmax(np.array([1, 2, 2, 600000])))
print(softmax(np.array([1, 2, 2, 60000000])))
print(softmax(np.array([1, 2, 2, -60000000])))

Your alternative implementation makes all values closer to 0, which squashes the probabilities. For example:

def alternative_softmax(x):
    mean = np.mean(x)
    std = np.std(x)
    norm = (x - mean) / std
    e_x = np.exp(norm)
    return e_x / e_x.sum(axis=0)


print(softmax(np.array([1, 2, 2, 6000])))
print(softmax(np.array([2, 5, 5, 3])))
print(softmax(np.array([3, 5, 2, 1])))
print()

batch = np.asarray([[1, 2, 2, 6000],
                    [2, 5, 5, 3],
                    [3, 5, 2, 1]])
print(alternative_softmax(batch))

The output is:

[ 0.  0.  0.  1.]
[ 0.02278457  0.45764028  0.45764028  0.06193488]
[ 0.11245721  0.83095266  0.0413707   0.01521943]

[[ 0.33313225  0.33293125  0.33313217  0.94909178]
 [ 0.33333329  0.33353437  0.33373566  0.02546947]
 [ 0.33353446  0.33353437  0.33313217  0.02543875]]

As you can see, the outputs are very different, and the rows don't even sum to one.
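For a 2D batch, a numerically stable row-wise version can combine the two snippets from the question: the max subtraction from 2) for stability, and the axis/keepdims normalization from 1). A sketch (not part of the original answer):

import numpy as np

def softmax_rows(z):
    # Subtract each row's max so np.exp never sees a large positive argument.
    z_shift = z - np.max(z, axis=1, keepdims=True)
    e_z = np.exp(z_shift)
    # Normalize each row; keepdims makes the division broadcast row-wise.
    return e_z / e_z.sum(axis=1, keepdims=True)

print(softmax_rows(batch))  # each row of the result sums to one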

np.exp(1000) is just way too big a number for a 64-bit float. Try using the Decimal library instead.
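A minimal sketch of that suggestion, using Python's standard decimal module (Decimal.exp() does not overflow at these magnitudes, at the cost of speed):

from decimal import Decimal, getcontext

getcontext().prec = 30                  # working precision for the ratios
z = [Decimal(v) for v in [1000, 2000, 3000, 6000]]
e_z = [v.exp() for v in z]              # huge values, but no overflow
total = sum(e_z)
print([float(v / total) for v in e_z])  # [0.0, 0.0, 0.0, 1.0]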

Here's a simple example: two small integers, 10 and 20.

>>> import math
>>> a = 10
>>> b = 20
>>> denom = math.exp(a) + math.exp(b)
>>> math.exp(a) / denom
4.5397868702434395e-05
>>> math.exp(b) / denom
0.9999546021312976
>>> # Now, let's perform batch-norm on this ...
>>> a = -1
>>> b = 1
>>> denom = math.exp(a) + math.exp(b)
>>> math.exp(a) / denom
0.11920292202211756
>>> math.exp(b) / denom
0.8807970779778824

The results are quite different, unacceptably so. Applying batch-norm doesn't work. Look at your equation again:

j = e_x / e_x.sum(axis = 0)

... and apply it to these simple values:

j = math.exp(10) / (math.exp(10) + math.exp(20))

ANALYSIS AND PROPOSED SOLUTION

What transformation can you apply that preserves the value of j?

The problem your actual data set hits is that it spans a value range of about e^5000, no matter what shift you make in the exponent values. Are you willing to drive all the very smallest values to 0? If so, you can build an effective algorithm by subtracting a constant from each exponent until all are, say, 300 or less. This will leave you with results mathematically similar to the original.
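A quick sanity check of that idea, assuming the same math module as in the session above: a shift by a constant c cancels out of both the numerator and the denominator, so j is unchanged.

import math

a, b, c = 10, 20, 20   # c is an arbitrary shift; here, the larger exponent
j = math.exp(a) / (math.exp(a) + math.exp(b))
j_shifted = math.exp(a - c) / (math.exp(a - c) + math.exp(b - c))
print(j, j_shifted)    # equal up to floating-point rounding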

Can you handle that code yourself? Find the max of the array; if it's more than 300, find the difference, diff. Subtract diff from every array element. Then do your customary softmax.
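A sketch of that recipe (the function name and cutoff parameter are illustrative; 300 is the value suggested above, safely below the float64 overflow point near e^709):

import numpy as np

def clipped_softmax(x, cutoff=300):
    # Shift all exponents down until the max is at most `cutoff`;
    # entries far below the max underflow to exp(...) == 0.0.
    x = np.asarray(x, dtype=np.float64)
    diff = x.max() - cutoff
    if diff > 0:
        x = x - diff
    e_x = np.exp(x)
    return e_x / e_x.sum()

print(clipped_softmax([1000, 2000, 3000, 6000]))  # [0. 0. 0. 1.]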
