
In Apache Spark's Scala API, is there a difference between the single-quote notation and the $"" notation?

I have noticed two different notation styles when referencing columns (in this case in a select statement). Is there a functional difference between the two?

val df = spark.read.table("mytable").select('column1,'column2)

vs.

val df = spark.read.table("mytable").select($"column1",$"column2")

I haven't been able to find anything that really explains the difference or if there is a standard.

Whether you use 'column1, 'column2 or $"column1", $"column2", the returned values are ColumnName(column1) and ColumnName(column2). ColumnName is a subclass of Column, which is one of the argument types that select accepts. However, the two notations are implemented differently.

To use either notation as a column, the application must include import spark.implicits._, where spark is a SparkSession object. The import brings the following implicits into scope.

From the Spark source code, where implicits is an object nested inside the SparkSession class:

@Experimental
object implicits extends SQLImplicits with Serializable {
  protected override def _sqlContext: SQLContext = SparkSession.this.sqlContext
}
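Because implicits is an object nested inside the SparkSession class, each session instance carries its own implicits member, which is why the import goes through a value (spark.implicits._) rather than a package. The following is a minimal plain-Scala sketch of that pattern, assuming Scala 2.12/2.13; Session, Tag and SessionDemo are hypothetical stand-ins, not Spark's API, and the $ method here simply prefixes the session id so you can see which instance supplied the implicit.

```scala
// Hypothetical stand-in for SparkSession: each instance owns its own implicits object.
class Session(val id: String) {
  object implicits {
    // Any implicit defined here is tied to this particular Session instance.
    implicit class Tag(val sc: StringContext) {
      def $(args: Any*): String = id + ":" + sc.s(args: _*)
    }
  }
}

object SessionDemo {
  // An import path must start from a stable identifier, so spark is a val (not a var).
  val spark = new Session("s1")
  import spark.implicits._

  // Resolved through the Tag class imported from spark.implicits.
  val tagged: String = $"column1"
}
```

With two sessions, importing from each would bring in implicits bound to that specific instance, which mirrors how Spark ties the conversions to a given SparkSession's sqlContext.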

spark.implicits extends SQLImplicits, which defines both conversions:

package org.apache.spark.sql

abstract class SQLImplicits extends LowPrioritySQLImplicits {

  ...

  /**
   * Converts $"col name" into a [[Column]].
   *
   * @since 2.0.0
   */
  implicit class StringToColumn(val sc: StringContext) {
    def $(args: Any*): ColumnName = {
      new ColumnName(sc.s(args: _*))
    }
  }

  ...

  /**
   * An implicit conversion that turns a Scala `Symbol` into a [[Column]].
   * @since 1.3.0
   */
  implicit def symbolToColumn(s: Symbol): ColumnName = new ColumnName(s.name)

}  

When you write $"column1", the $ method of the implicit class StringToColumn is invoked, which converts the string into a ColumnName instance.
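The $"..." form is ordinary Scala string-interpolation syntax: the compiler rewrites $"column1" into StringContext("column1").$(), and since StringContext has no $ method of its own, the implicit class StringToColumn supplies it. Because that method delegates to sc.s(args: _*), values can also be spliced into the column name, which a symbol literal cannot do. A minimal sketch assuming Scala 2.12/2.13, with a hypothetical ColumnName case class standing in for Spark's:

```scala
// Hypothetical stand-in for Spark's ColumnName; only the name matters here.
case class ColumnName(name: String)

object DollarDemo {
  // Same shape as SQLImplicits.StringToColumn shown above.
  implicit class StringToColumn(val sc: StringContext) {
    def $(args: Any*): ColumnName = ColumnName(sc.s(args: _*))
  }

  // Sugared form, as written in user code.
  val sugared: ColumnName = $"column1"

  // Roughly what the compiler produces for $"column1", written out by hand.
  val desugared: ColumnName = new StringToColumn(StringContext("column1")).$()

  // Interpolation works because $ delegates to the standard s interpolator.
  val n = 2
  val spliced: ColumnName = $"column$n"
}
```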

' introduces a Scala symbol. Using a symbol by itself does not require import spark.implicits._; however, converting a Scala symbol to a Column does. When a symbol appears where a Column is expected, the implicit method symbolToColumn is applied and returns a ColumnName instance. Note that 'column1 is the same as Symbol("column1") in Scala.
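The sketch below shows that a Symbol is a plain Scala value that needs no import, and that the conversion only kicks in where a column type is expected. It writes Symbol("column1") instead of 'column1 because the quote literal syntax is deprecated since Scala 2.13 and not supported in Scala 3. ColumnName and SymbolDemo are hypothetical stand-ins, not Spark classes.

```scala
import scala.language.implicitConversions

// Hypothetical stand-in for Spark's ColumnName.
case class ColumnName(name: String)

object SymbolDemo {
  // A Symbol on its own: no Spark (or any other) import is needed.
  val sym: Symbol = Symbol("column1")

  // Symbols are interned, so two Symbols with the same name are the same object.
  val interned: Boolean = sym eq Symbol("column1")

  // Same shape as SQLImplicits.symbolToColumn shown above.
  implicit def symbolToColumn(s: Symbol): ColumnName = ColumnName(s.name)

  // The conversion is applied only because a ColumnName is expected here.
  val col: ColumnName = sym
}
```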

As defined in Column.scala (package org.apache.spark.sql), ColumnName is a subclass of Column, so the objects returned by both $ and ' can be passed to the DataFrame/Dataset select methods.

/**
 * A convenient class used for constructing schema.
 *
 * @since 1.3.0
 */
@InterfaceStability.Stable
class ColumnName(name: String) extends Column(name) {
    ...
}
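Since both conversions produce a ColumnName, and ColumnName extends Column, a method declared to take Column* accepts either notation, even mixed in one call. A minimal sketch assuming Scala 2.12/2.13; Column, ColumnName, select and SubtypeDemo are hypothetical stand-ins that only mirror the hierarchy shown above, not Spark's real signatures.

```scala
import scala.language.implicitConversions

// Hypothetical stand-ins mirroring the class hierarchy shown above.
class Column(val name: String)
class ColumnName(name: String) extends Column(name)

object SubtypeDemo {
  // Same shapes as the two implicits in SQLImplicits.
  implicit class StringToColumn(val sc: StringContext) {
    def $(args: Any*): ColumnName = new ColumnName(sc.s(args: _*))
  }
  implicit def symbolToColumn(s: Symbol): ColumnName = new ColumnName(s.name)

  // Hypothetical stand-in for Dataset.select(cols: Column*).
  def select(cols: Column*): Seq[String] = cols.map(_.name)

  // Both arguments become ColumnName, which is a Column, so this compiles.
  val mixed: Seq[String] = select($"column1", Symbol("column2"))
}
```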
