
How is logistic regression parallelized in Spark?

I'd like to understand how logistic regression in the ML library is parallelized. I've tried reading the source code, but I don't understand the process.

Spark uses so-called mini-batch gradient descent to optimize logistic regression:

http://ruder.io/optimizing-gradient-descent/index.html#minibatchgradientdescent

In a nutshell, it works like this:

  1. Select a sample of the data
  2. Compute the gradient on each row of the sample
  3. Aggregate the gradients and update the model weights
  4. Back to step 1
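The loop above can be sketched in plain Python (no Spark) to make the mechanics concrete. This is a minimal illustration of mini-batch gradient descent for logistic regression, not Spark's actual implementation; in Spark, step 2 runs in parallel across partitions and step 3 is a distributed aggregation. All names and parameters here are illustrative.

```python
import math
import random

def minibatch_sgd_logistic(data, labels, lr=0.1, batch_frac=0.5,
                           iters=200, seed=0):
    """Sketch of the mini-batch loop described above (single machine)."""
    rng = random.Random(seed)
    n = len(data)
    n_features = len(data[0])
    w = [0.0] * n_features
    batch_size = max(1, int(batch_frac * n))
    for _ in range(iters):
        # Step 1: select a sample of the data
        batch = rng.sample(range(n), batch_size)
        # Step 2: compute the gradient on each row of the sample
        # Step 3: aggregate the gradients (here: a simple sum)
        grad = [0.0] * n_features
        for i in batch:
            z = sum(wj * xj for wj, xj in zip(w, data[i]))
            p = 1.0 / (1.0 + math.exp(-z))       # sigmoid
            for j in range(n_features):
                grad[j] += (p - labels[i]) * data[i][j]
        # Update the weights with the averaged mini-batch gradient
        for j in range(n_features):
            w[j] -= lr * grad[j] / batch_size
        # Step 4: back to step 1
    return w

def predict_proba(w, x):
    z = sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))
```

In Spark the per-row gradient computations are embarrassingly parallel, which is why only the aggregation in step 3 requires communication between workers.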

The actual optimisation loop in Spark starts at this line: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala#L234
