lazy val function vs def method

When calling a function from an external class many times, which will give me better performance: a lazy val function or a def method? So far, my understanding is:

def method:

  1. Defined and tied to a class; it needs to be declared inside an "object" to be called in a Java static style.
  2. Call-by-name: evaluated only when accessed, and on every access.

lazy val lambda expression:

  1. Tied to a Function1/2...22 object.
  2. Call-by-value: evaluated the first time it is accessed, and only once.
  3. Is actually a def apply method tied to a class.

So it may seem that using a lazy val avoids evaluating the function on every call. Should it be preferred?

I ran into this while writing UDFs for Spark code, and I'm trying to understand which approach is better.

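import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{col, udf}

object sql {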
  def emptyStringToNull(str: String): Option[String] = {
    Option(str).getOrElse("").trim match {
      case "" => None
      case "[]" => None
      case "null" => None
      case _ => Some(str.trim)
    }
  }

  def udfEmptyStringToNull: UserDefinedFunction = udf(emptyStringToNull _)

  def repairColumn_method(dataFrame: DataFrame, colName: String): DataFrame = {
    dataFrame.withColumn(colName, udfEmptyStringToNull(col(colName)))
  }

  lazy val repairColumn_fun: (DataFrame, String) => DataFrame = { (df,colName) =>
    df.withColumn(colName, udfEmptyStringToNull(col(colName)))
  }
}

There's no need for you to use a lazy val in this specific case. When you assign a function to a lazy val, its results are not memoized, as you seem to think they are. Since the function itself is a plain function literal and not the result of an expensive computation (regardless of what goes on inside it), making it lazy is not useful. All it does is add overhead when accessing and calling it. A simple val would be better, but making it a proper method would be best.
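To illustrate the point about results not being memoized, here is a minimal sketch. The names `LazyValFunctionDemo`, `square`, and `bodyRuns` are made up for this example and are not from the question. The lazy val caches the function object itself, so the body still runs on every call.

```scala
object LazyValFunctionDemo {
  // Counts how many times the function body executes (hypothetical counter for this demo).
  var bodyRuns = 0

  // The lazy val defers and caches creation of the function *object*,
  // not the values the function returns.
  lazy val square: Int => Int = { x =>
    bodyRuns += 1
    x * x
  }

  def main(args: Array[String]): Unit = {
    square(3)
    square(3)
    square(3)
    // The body ran once per call, even with identical arguments.
    println(s"body ran $bodyRuns times") // prints "body ran 3 times"
  }
}
```

Replacing `lazy val` with `val` or `def` in this sketch would give the same count; only the cost of accessing `square` differs.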

If you want memoization, see "Is there a generic way to memoize in Scala?" instead.
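For a rough idea of what memoization looks like, here is a minimal sketch. It is not taken from the linked question: `memoize`, `slowSquare`, and `evaluations` are hypothetical names, and the cache is a plain mutable map that is not thread-safe and never evicts entries.

```scala
import scala.collection.mutable

object MemoizeDemo {
  // Hypothetical helper: wraps a one-argument function with a per-argument cache.
  // Assumption: single-threaded use; the cache grows without bound.
  def memoize[A, B](f: A => B): A => B = {
    val cache = mutable.Map.empty[A, B]
    a => cache.getOrElseUpdate(a, f(a))
  }

  // Counts how many times the underlying function body executes.
  var evaluations = 0

  val slowSquare: Int => Int = { x =>
    evaluations += 1
    x * x
  }

  def main(args: Array[String]): Unit = {
    val fastSquare = memoize(slowSquare)
    fastSquare(5)
    fastSquare(5)
    fastSquare(6)
    // Two distinct arguments were seen, so the body ran twice.
    println(s"evaluations = $evaluations") // prints "evaluations = 2"
  }
}
```

Whether this is worthwhile depends on how expensive the function is compared with a map lookup, and it only makes sense for functions without side effects.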

Ignoring your specific example, if the def in question didn't take any arguments and both it and the lazy val were simple values that were expensive to compute, I would go with the lazy val if you're going to call it many times, to avoid computing it over and over again.
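The difference for a no-argument value can be sketched as follows. All names here are illustrative assumptions, and `expensive()` is just a stand-in for real work.

```scala
object DefVsLazyValDemo {
  // Counters showing how often each body executes (hypothetical, for this demo).
  var defRuns = 0
  var lazyRuns = 0

  // Stand-in for an expensive computation.
  private def expensive(): Int = (1 to 1000).sum

  // Re-evaluated on every access.
  def viaDef: Int = { defRuns += 1; expensive() }

  // Evaluated on first access only; the result is stored in a field.
  lazy val viaLazyVal: Int = { lazyRuns += 1; expensive() }

  def main(args: Array[String]): Unit = {
    (1 to 3).foreach(_ => viaDef)
    (1 to 3).foreach(_ => viaLazyVal)
    println(s"def ran $defRuns times, lazy val ran $lazyRuns time")
    // prints "def ran 3 times, lazy val ran 1 time"
  }
}
```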

If they were values that are very cheap to compute and you're not going to access them many times, or if they're expensive to compute but you're only going to access them once, I would go with a def instead. There wouldn't be much difference if you used a lazy val, but a def avoids creating a couple of fields.

If they're somewhat cheap to compute but are accessed many times, it may be better to use a lazy val simply because the value will be cached. However, you might want to look at your overall design before reaching for such micro-optimizations.
