
How is logistic regression parallelized in Spark?

I would like some insight into the method the ML library uses to parallelize logistic regression. I have tried reading the source code, but I don't understand the process.

Spark uses so-called mini-batch gradient descent for regression:

http://ruder.io/optimizing-gradient-descent/index.html#minibatchgradientdescent

In a nutshell, it works like this:

  1. Select a sample of the data
  2. Compute the gradient on each row of the sample
  3. Aggregate the gradients
  4. Go back to step 1
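The four steps above can be sketched in plain NumPy for logistic regression. This is a simplified single-machine sketch, not Spark's actual Scala implementation: in Spark, step 1 is a `sample`-style selection controlled by a mini-batch fraction, and steps 2–3 run the per-row gradients in parallel across partitions and combine them with `treeAggregate`. The function and parameter names below are mine, chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, mapping scores to probabilities."""
    return 1.0 / (1.0 + np.exp(-z))

def minibatch_gradient_descent(X, y, step=0.5, batch_fraction=0.5,
                               iterations=200, seed=0):
    """Mini-batch gradient descent for logistic regression.

    Mirrors the loop described above: sample a mini-batch, compute
    per-row gradients, aggregate them, update, and repeat.
    (In Spark the aggregation step is distributed via treeAggregate.)
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iterations):
        # Step 1: select a random sample (mini-batch) of the rows.
        mask = rng.random(n) < batch_fraction
        Xb, yb = X[mask], y[mask]
        if Xb.shape[0] == 0:
            continue
        # Step 2: per-row gradient of the logistic loss is
        # (sigmoid(w.x) - y) * x for each sampled row.
        errors = sigmoid(Xb @ w) - yb
        # Step 3: aggregate the per-row gradients (here, a mean).
        grad = Xb.T @ errors / Xb.shape[0]
        # Update the weights, then go back to step 1.
        w -= step * grad
    return w
```

On a linearly separable toy problem the sketch recovers a good separating direction, e.g. labels defined by the sign of `x0 + x1` are classified with high accuracy after a few hundred iterations.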

The actual optimization code in Spark starts at this line: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala#L234
