How to take the logarithm of an RDD in Spark (Scala)

How do I take the logarithm of an RDD? I have a val rdd: RDD[Double] and I simply want to take the logarithm of each element.

This is essentially the same question as this one, but the solution proposed there does not work. I run:

val rdd: RDD[Double] = <something>
val log_y = rdd.map(x => org.apache.commons.math3.analysis.function.Log(2.0, x))

and I get the error:

error: object org.apache.commons.math3.analysis.function.Log is not a value

As far as I can see, this class from math3 computes the natural logarithm, and it is used like this:

new org.apache.commons.math3.analysis.function.Log().value(3)
res1: Double = 1.0986122886681098

This is the version that ships with Spark 3.1.2. If you are using the same version, you can write:

val log_y = rdd.map(x => new org.apache.commons.math3.analysis.function.Log().value(x))
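The Log(2.0, x) call in the question suggests that a base-2 logarithm was wanted, whereas the class used here computes only the natural logarithm. As an alternative that is not part of the original answer, here is a minimal sketch that relies on scala.math.log and the change-of-base identity log_b(x) = ln(x) / ln(b). The names LogHelpers, naturalLog and logBase are hypothetical, chosen only for illustration.

```scala
import org.apache.spark.rdd.RDD

// Hypothetical helper object; not part of Spark or commons-math3.
object LogHelpers {
  // Natural logarithm of every element, using the Scala standard library.
  def naturalLog(rdd: RDD[Double]): RDD[Double] =
    rdd.map(x => math.log(x))

  // Logarithm in an arbitrary base via log_b(x) = ln(x) / ln(b).
  def logBase(rdd: RDD[Double], base: Double): RDD[Double] = {
    val lnBase = math.log(base) // computed once on the driver; a plain Double is captured by the closure
    rdd.map(x => math.log(x) / lnBase)
  }
}
```

With these helpers, LogHelpers.logBase(rdd, 2.0) would give the base-2 logarithm of each element. No object has to be instantiated per element, because math.log is a plain function.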

This is the source of the class:

public class Log implements UnivariateDifferentiableFunction, DifferentiableUnivariateFunction {
    /** {@inheritDoc} */
    public double value(double x) {
        return FastMath.log(x);
    }

    /** {@inheritDoc}
     * @deprecated as of 3.1, replaced by {@link #value(DerivativeStructure)}
     */
    @Deprecated
    public UnivariateFunction derivative() {
        return FunctionUtils.toDifferentiableUnivariateFunction(this).derivative();
    }

    /** {@inheritDoc}
     * @since 3.1
     */
    public DerivativeStructure value(final DerivativeStructure t) {
        return t.log();
    }

}

As you can see, you need to create an instance, and its value method is what actually computes the logarithm. The code above creates a new instance for every RDD element; to avoid that, you could create one instance outside the map and mark it as transient (in Scala, typically a @transient lazy val, so that it is re-created on each executor rather than serialized). Alternatively, use the class inside mapPartitions so that only one instance is created per partition.
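The per-partition approach can be sketched as follows. This is only an illustration under stated assumptions: the object name LogPerPartition, the method name logRdd, the sample values, and the local SparkSession settings are all hypothetical and not from the original answer.

```scala
import org.apache.commons.math3.analysis.function.Log
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper object for the sketch.
object LogPerPartition {
  // Applies the natural logarithm to every element, building one Log per partition.
  def logRdd(rdd: RDD[Double]): RDD[Double] =
    rdd.mapPartitions { iter =>
      val log = new Log()         // created once per partition, on the executor
      iter.map(x => log.value(x)) // reused for every element of that partition
    }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; app name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("log-rdd-sketch")
      .master("local[*]")
      .getOrCreate()

    val rdd: RDD[Double] = spark.sparkContext.parallelize(Seq(1.0, math.E, 10.0))
    logRdd(rdd).collect().foreach(println)

    spark.stop()
  }
}
```

Because the Log instance is created inside the function passed to mapPartitions, it never has to be serialized and shipped from the driver, which sidesteps the need for a transient field altogether.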
