
lazy val function vs def method

When a function is called many times from another class, which gives better performance: a lazy val holding a function, or a def method? Here is what I understand so far:

def method:

  1. Defined on and tied to a class; it must be declared inside an "object" to be called like a Java static method.
  2. Call-by-name: evaluated only when accessed, and on every access.

lazy val holding a lambda expression:

  1. Tied to a Function1/2...22 object.
  2. Call-by-value: evaluated the first time it is accessed, and only once.
  3. Is actually a def apply method tied to a class.

So it seems that using a lazy val would avoid evaluating the function every time. Should it be preferred?

I ran into this while writing UDFs for Spark, and I'm trying to understand which approach is better.

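import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{col, udf}

object sql {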
  def emptyStringToNull(str: String): Option[String] = {
    Option(str).getOrElse("").trim match {
      case "" => None
      case "[]" => None
      case "null" => None
      case _ => Some(str.trim)
    }
  }

  def udfEmptyStringToNull: UserDefinedFunction = udf(emptyStringToNull _)

  def repairColumn_method(dataFrame: DataFrame, colName: String): DataFrame = {
    dataFrame.withColumn(colName, udfEmptyStringToNull(col(colName)))
  }

  lazy val repairColumn_fun: (DataFrame, String) => DataFrame = { (df,colName) =>
    df.withColumn(colName, udfEmptyStringToNull(col(colName)))
  }
}

You don't need a lazy val in this specific case. Assigning a function to a lazy val does not memoize the function's results, as you seem to think it does. The function itself is a plain function literal, not the result of an expensive computation (regardless of what goes on inside it), so making it lazy is not useful. All it does is add overhead when accessing and calling it. A plain val would be better, but making it a proper method would be best.
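To illustrate the point about memoization, here is a minimal sketch. The names LazyFunctionDemo, shout, and bodyRuns are made up for this example and do not come from the question. The lazy val delays only the creation of the function object; the body still runs on every call.

```scala
object LazyFunctionDemo {
  // Hypothetical counter: how many times the function body has run.
  var bodyRuns = 0

  // `lazy` defers building the Function1 object until first access.
  // It does not cache the results of calling that function.
  lazy val shout: String => String = { s =>
    bodyRuns += 1
    s.toUpperCase
  }

  def main(args: Array[String]): Unit = {
    shout("a")
    shout("a") // same argument, but the body runs again
    println(bodyRuns) // prints 2
  }
}
```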

If you want memoization, see "Is there a generic way to memoize in Scala?" instead.
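For orientation, a very small memoizing wrapper might look like the sketch below. This is only one possible approach, not necessarily the one the linked question recommends; memoize, slowLength, fastLength, and calls are hypothetical names, and the cache is not thread-safe.

```scala
import scala.collection.mutable

object MemoizeSketch {
  // Wraps a one-argument function with a cache keyed by the argument.
  // Illustration only: unbounded cache, not safe for concurrent use.
  def memoize[A, B](f: A => B): A => B = {
    val cache = mutable.Map.empty[A, B]
    a => cache.getOrElseUpdate(a, f(a))
  }

  // Hypothetical counter: how many times the wrapped function ran.
  var calls = 0
  val slowLength: String => Int = { s => calls += 1; s.length }
  val fastLength: String => Int = memoize(slowLength)
}
```

With this wrapper, repeated calls to fastLength with the same string reach slowLength only once; a new argument triggers one more call.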

Setting your specific example aside: if the def took no arguments, and both it and the lazy val were simple values that are expensive to compute, I would go with the lazy val when the value is read many times, so that it is computed only once instead of over and over again.
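The difference between a parameterless def and a lazy val can be sketched as follows. The object, the counters, and the expensive() helper are stand-ins invented for this example.

```scala
object DefVsLazyVal {
  // Hypothetical counters: how often each body is evaluated.
  var defRuns = 0
  var lazyRuns = 0

  // Stand-in for an expensive computation.
  private def expensive(): Int = (1 to 1000).sum

  // Re-evaluated on every access.
  def viaDef: Int = { defRuns += 1; expensive() }

  // Evaluated on first access, then served from the cached field.
  lazy val viaLazyVal: Int = { lazyRuns += 1; expensive() }
}
```

Reading viaDef three times runs its body three times, while reading viaLazyVal three times runs its body once.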

If the values are very cheap to compute and read only a few times, or expensive to compute but read only once, I would go with a def instead. A lazy val would not perform much differently, but a def avoids the extra fields a lazy val needs to hold the cached value and its initialization state.

If the values are somewhat cheap to compute but read many times, a lazy val may be better simply because the result is cached. However, you might want to look at your overall design before looking at such micro-optimizations.
