
How to optimize SciKit one-class training time?

Essentially my question is the same as "SciKit One-class SVM classifier training time increases exponentially with size of training data", but no one there has figured out the problem.

It seems to run fine with tens of thousands of samples, but hundreds of thousands take a very long time. I want to run it on tens of millions, but I don't want to wait a day and a half (maybe even more) for nothing to come of it. Is there a faster way to do this, or should I use something else?

I'm very junior in this field, so take this with a grain of salt.

Isolation Forests appear to be an efficient solution for outlier detection. They have been shown to perform well against other popular algorithms [Liu, 2008]. Also, according to the scikit-learn documentation, One-class SVMs are sensitive to outliers in the training data. The anomalies in your Class 1 could overlap with Class 2 and cause data to be mislabeled. Perhaps taking subsets of your samples and using them to build an ensemble of SVMs could avoid this (and still save you time, depending on the size of the subsets), but Isolation Forests do this kind of subsampling naturally.
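To make the two options above concrete, here is a minimal sketch using scikit-learn's `IsolationForest` and an ensemble of `OneClassSVM` models fitted on random subsets. The helper names, the subset sizes, the `nu` value, and the majority-vote rule are my own assumptions for illustration, not something from the question or from the scikit-learn docs, so treat them as starting points to tune.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM


def fit_isolation_forest(X, seed=0):
    # Each tree is built on at most `max_samples` rows, so the fitting cost
    # depends mainly on n_estimators and max_samples, not on len(X).
    model = IsolationForest(n_estimators=100, max_samples=256, random_state=seed)
    return model.fit(X)


def fit_svm_ensemble(X, n_models=5, subset_size=2000, seed=0):
    # Hypothetical helper: fit several One-class SVMs, each on a small random
    # subset, instead of one SVM on the full data set.
    rng = np.random.default_rng(seed)
    size = min(subset_size, len(X))
    models = []
    for _ in range(n_models):
        idx = rng.choice(len(X), size=size, replace=False)
        models.append(OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X[idx]))
    return models


def predict_svm_ensemble(models, X):
    # Each model votes +1 (inlier) or -1 (outlier); the majority wins and a
    # tie counts as an inlier. This voting rule is an assumption.
    votes = np.sum([m.predict(X) for m in models], axis=0)
    return np.where(votes >= 0, 1, -1)


# Synthetic stand-in for the real data: one Gaussian blob.
X_train = np.random.default_rng(42).normal(size=(20000, 2))
X_new = np.array([[0.0, 0.0], [10.0, 10.0]])  # a typical point and a far-away point

forest = fit_isolation_forest(X_train)
svms = fit_svm_ensemble(X_train)

forest_labels = forest.predict(X_new)              # +1 = inlier, -1 = outlier
svm_labels = predict_svm_ensemble(svms, X_new)     # same convention
```

Both `predict` calls return +1 for inliers and -1 for outliers. Scoring still has to touch every row, so for tens of millions of samples it is probably worth predicting in chunks.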

For further reading, this seems like a good reference paper on the topic: http://www.robots.ox.ac.uk/~davidc/pubs/NDreview2014.pdf

It mentions clustering and distance methods which may be applicable in your case. I think it's best to do a lot of reading and make sure you understand the different strengths and weaknesses of the algorithms. I'm still in the process of doing that myself, so I couldn't give solid advice even if I knew the specifics of your problem.

A note on distance-based algorithms: I know some are optimized, but I think the general complaint is that they have high computational complexity. Many clustering-, distance-, and probability-based algorithms also have weaknesses when dealing with high-dimensional data.
