
Text Classification Using Neural Network

I am new to machine learning and neural networks. I am trying to do text classification with a neural network from scratch. My dataset contains 7500 documents, each labeled with one of seven classes, and there are about 5800 unique words. I am using one hidden layer with 4000 neurons, sigmoid as the activation function, a learning rate of 0.1, and no dropout.

After about 2 to 3 epochs of training, a warning is displayed:

RuntimeWarning: overflow encountered in exp

and the resulting output vector looks like this for every input (all zeros except one tiny value):

[  0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00
   0.00000000e+00   0.00000000e+00   4.11701866e-10]

sigmoid function:

import numpy as np

def sigmoid(x):
    # logistic sigmoid: maps any real value into (0, 1)
    output = 1 / (1 + np.exp(-x))
    return output

def sigmoid_output_to_derivative(output):
    # derivative of the sigmoid, expressed in terms of its output
    return output * (1 - output)

How can I fix this? Can I use a different activation function?

Here is my full code: https://gist.github.com/coding37/a5705142fe1943b93a8cef4988b3ba5f

It is not easy to give a precise answer, since the problems can be manifold and are very hard to reconstruct, but I'll give it a try:

So it seems your sigmoid units are saturating: the weights scale your input vector x to large negative pre-activations, so np.exp(-x) overflows and the sigmoid output underflows to zero. A naive suggestion would be to increase the precision from float32 to float64, but I guess you are already at that precision.
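One simple, partial workaround is to make the sigmoid itself numerically safe. A minimal sketch, assuming plain NumPy as in your gist; the clipping bounds are an arbitrary choice, not something from your code:

import numpy as np

def stable_sigmoid(x):
    # Clip the pre-activation so np.exp never sees extreme values;
    # +/-500 is an arbitrary safe bound for float64 (exp overflows near 710).
    x = np.clip(x, -500, 500)
    return 1.0 / (1.0 + np.exp(-x))

This silences the overflow warning, but if the pre-activations really are that large, the underlying training problem (learning rate, initialization, saturation) is still there.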

Have you played around with the learning rate and/or tried an adaptive learning rate? (See https://towardsdatascience.com/learning-rate-schedules-and-adaptive-learning-rate-methods-for-deep-learning-2c8f433990d1 for some examples.) Try more iterations with a lower learning rate for a start, maybe.
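For illustration, here is a minimal step-decay schedule; the starting value, drop factor, and interval are made-up numbers, not a recommendation tuned to your data:

def step_decay_lr(initial_lr, epoch, drop=0.5, epochs_per_drop=5):
    # Halve the learning rate every `epochs_per_drop` epochs.
    return initial_lr * (drop ** (epoch // epochs_per_drop))

# e.g. start lower than 0.1 and let it decay as training progresses
learning_rate = step_decay_lr(0.01, epoch=7)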

Also: are you using sigmoid functions in your output layer? The added non-linearity could drive your neurons into saturation, i.e. your problem.
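For a single-label, seven-class problem, a common alternative for the output layer (my suggestion, not something from your gist) is a softmax. A minimal, numerically stable sketch:

def softmax(z):
    # Subtract the row max before exponentiating so np.exp cannot overflow.
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)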

Have you checked your gradients? That can also help in tracking down errors (http://ufldl.stanford.edu/wiki/index.php/Gradient_checking_and_advanced_optimization).
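A rough sketch of such a check; loss_fn and params here are placeholders for your own loss evaluation and weight matrix, not names from your code:

import numpy as np

def numerical_gradient(loss_fn, params, eps=1e-5):
    # Central-difference estimate of d(loss)/d(params), one element at a time.
    grad = np.zeros_like(params)
    it = np.nditer(params, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        orig = params[idx]
        params[idx] = orig + eps
        loss_plus = loss_fn(params)
        params[idx] = orig - eps
        loss_minus = loss_fn(params)
        params[idx] = orig
        grad[idx] = (loss_plus - loss_minus) / (2 * eps)
        it.iternext()
    return grad

# Compare this element-wise against your backprop gradient; large relative
# differences usually point at a bug in the derivative code.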

Alternatively, you could check whether your training improves with other activation functions, e.g. a linear one for a start.
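ReLU, for instance, is another common hidden-layer choice that does not saturate for positive inputs; a minimal sketch of the function and the derivative you would plug into backprop:

def relu(x):
    # Keep positive values, zero out negative ones.
    return np.maximum(0, x)

def relu_derivative(x):
    # Derivative w.r.t. the pre-activation: 1 where x > 0, else 0.
    return (x > 0).astype(float)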

Since probabilities in machine learning tend to be very small and computations on them lead to even smaller values (causing underflow errors), it is good practice to do your computations with logarithmic values.

Using float64 types isn't bad, but it will also fail eventually.

So instead of, e.g., multiplying two small probabilities, you should add their logarithms. The same goes for other operations like exp().
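A tiny NumPy illustration of the idea (the numbers are just examples):

import numpy as np

p, q = 1e-200, 5e-180
print(p * q)                    # 0.0 -- the product underflows in float64

log_p, log_q = np.log(p), np.log(q)
log_pq = log_p + log_q          # log(p * q) stays perfectly representable

# Adding probabilities in log space is done with logaddexp:
log_p_plus_q = np.logaddexp(log_p, log_q)   # log(p + q)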

Every machine learning framework I know either returns logarithmic model parameters by default or has a method for that; otherwise you can just use the built-in math functions.
