简体   繁体   中英

loss function as min of several points, custom loss function and gradient

I am trying to predict quality of metal coil. I have the metal coil with width 10 meters and length from 1 to 6 kilometers. As training data I have ~600 parameters measured each 10 meters, and final quality control mark - good/bad (for whole coil). Bad means there is at least 1 place there is coil is bad, there is no data where is exactly. I have data for approx 10000 coils.

Lets imagine we want to train logistic regression for this data(with 2 factors).

X = [[0, 0],
      ...
     [0, 0],
     [1, 1], # coil is actually broken here, but we don't know it yet.
     [0, 0],
      ...
     [0, 0]]

Y = ?????

I can't just put all "bad" in Y and run classifier, because I will be confusing for classifier. I can't put all "good" and one "bad" in Y becuase I don't know where is the bad position.

The solution I have in mind is the following, I could define loss function as sum( (Y-min(F(x1,x2)))^2 ) (min calculated by all F belonging to one coil) not sum( (YF(x1,x2))^2 ) . In this case probably I get F trained correctly to point bad place. I need gradient for that, it there is impossible to calculate it in all points, the min is not differentiable in all places, but I could use weak gradient instead(using values of functions which is minimal in coil in every place).

I more or less know how to implement it myself, the question is what is the simplest way to do it in python with scikit-learn. Ideally it should be same (or easily adaptable) with several learning method(a lot of methods based on loss function and gradient), is where possible to make some wrapper for learning methods which works this way?

update : looking at gradient_boosting.py - there is internal abstract class LossFunction with ability to calculate loss and gradient, looks perspective. Looks like there is no common solution.

What you are considering here is known in machine learning community as superset learning , meaning, that instead of typical supervised setting where you have training set in the form of {(x_i, y_i)} you have {({x_1, ..., x_N}, y_1)} such that you know that at least one element from the set has property y_1. This is not a very common setting, but existing, with some research available, google for papers in the domain.

In terms of your own loss functions - scikit-learn is a no-go. Scikit-learn is about simplicity, it provides you with a small set of ready to use tools with very little flexibility. It is not a research tool, and your problem is researchy. What can you use instead? I suggest you go for any symbolic-differentiation solution, for example autograd which gives you ability to differentiate through python code , simply apply scipy.optimize.minimize on top of it and you are done! Any custom loss function will work just fine.

As a side note - minimum operator is not differentiable, thus the model might have hard time figuring out what is going on. You could instead try to do sum((Y - prod_x F(x_1, x_2) )^2) since multiplication is nicely differentiable, and you will still get the similar effect - if at least one element is predicted to be 0 it will remove any "1" answer from the remaining ones. You can even go one step further to make it more numerically stable and do:

if Y==0 then loss = sum_x log(F(x_1, x_2 ) )
if Y==1 then loss = sum_x log(1-F(x_1, x_2))

which translates to

Y * sum_x log(1-F(x_1, x_2)) + (1-Y) * sum_x log( F(x_1, x_2) )

you can notice similarity with cross entropy cost which makes perfect sense since your problem is indeed a classification . And now you have perfect probabilistic loss - you are attaching such probabilities of each segment to be "bad" or "good" so the probability of the whole object being bad is either high (if Y==0) or low (if Y==1).

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM