
Importing spark.implicits._ in Scala

I am trying to import spark.implicits._. Apparently, this is an object inside a class in Scala. When I import it inside a method, like so:

def f() = {
  val spark = SparkSession.builder()...
  import spark.implicits._
}

It works fine. However, I am writing a test class and I want to make this import available for all tests. I have tried:

class SomeSpec extends FlatSpec with BeforeAndAfter {
  var spark:SparkSession = _

  //This won't compile
  import spark.implicits._

  before {
    spark = SparkSession.builder()...
    //This won't either
    import spark.implicits._
  }

  "a test" should "run" in {
    //Even this won't compile (although it already looks bad here)
    import spark.implicits._

    //This was the only way I could make it work
    val spark = this.spark
    import spark.implicits._
  }
}

Not only does this look bad, I don't want to do it in every test. What is the "correct" way of doing it?

You can do something similar to what is done in the Spark testing suites. For example, this would work (inspired by SQLTestData):

import org.apache.spark.sql.{SQLContext, SQLImplicits, SparkSession}
import org.scalatest.{BeforeAndAfter, FlatSpec}

class SomeSpec extends FlatSpec with BeforeAndAfter { self =>

  var spark: SparkSession = _

  private object testImplicits extends SQLImplicits {
    protected override def _sqlContext: SQLContext = self.spark.sqlContext
  }
  import testImplicits._

  before {
    spark = SparkSession.builder().master("local").getOrCreate()
  }

  "a test" should "run" in {
    // implicits are working
    val df = spark.sparkContext.parallelize(List(1,2,3)).toDF()
  }
}
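Note the design choice here: _sqlContext is a def, so it is evaluated lazily, each time an implicit conversion is actually resolved. That is why the import can appear at class level even though the spark var is only assigned later, in the before block.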

Alternatively, you may use something like SharedSQLContext directly, which provides a testImplicits: SQLImplicits, i.e.:

class SomeSpec extends FlatSpec with SharedSQLContext {
  import testImplicits._

  // ...

}
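Note that SharedSQLContext lives in Spark's own test sources (package org.apache.spark.sql.test), so, as far as I know, you would need the spark-sql test-jar on your test classpath to use it.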

I think the code in the SparkSession.scala file on GitHub can give you a good hint:

/**
 * :: Experimental ::
 * (Scala-specific) Implicit methods available in Scala for converting
 * common Scala objects into [[DataFrame]]s.
 *
 * {{{
 *   val sparkSession = SparkSession.builder.getOrCreate()
 *   import sparkSession.implicits._
 * }}}
 *
 * @since 2.0.0
 */
@Experimental
object implicits extends SQLImplicits with Serializable {
  protected override def _sqlContext: SQLContext = SparkSession.this.sqlContext
}

here "spark" in "spark.implicits._" is just the sparkSession object we created.

Here is another reference!

I just instantiate a SparkSession and, before using it, "import implicits".

@transient lazy val spark = SparkSession
  .builder()
  .master("spark://master:7777")
  .getOrCreate()

import spark.implicits._
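With the import in scope, the usual conversions are available in the rest of the class, e.g. (a minimal usage sketch):

val ds = Seq(1, 2, 3).toDS()
val df = spark.sparkContext.parallelize(Seq("a", "b")).toDF("letter")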

Thanks to @bluenote10 for the helpful answer. We can simplify it further, replacing the helper object testImplicits:

private object testImplicits extends SQLImplicits {
  protected override def _sqlContext: SQLContext = self.spark.sqlContext
}

with the following approach:

import org.apache.spark.sql.{SQLImplicits, SparkSession}
import org.scalatest.{BeforeAndAfterAll, Suite}

trait SharedSparkSession extends BeforeAndAfterAll { self: Suite =>

  /**
   * The SparkSession instance to use for all tests in one suite.
   */
  private var spark: SparkSession = _

  /**
   * Returns local running SparkSession instance.
   * @return SparkSession instance `spark`
   */
  protected def sparkSession: SparkSession = spark

  /**
   * A helper value that allows us to import the session's SQL implicits.
   */
  protected lazy val sqlImplicits: SQLImplicits = self.sparkSession.implicits

  /**
   * Starts a new local spark session for tests.
   */
  protected def startSparkSession(): Unit = {
    if (spark == null) {
      spark = SparkSession
        .builder()
        .master("local[2]")
        .appName("Testing Spark Session")
        .getOrCreate()
    }
  }

  /**
   * Stops existing local spark session.
   */
  protected def stopSparkSession(): Unit = {
    if (spark != null) {
      spark.stop()
      spark = null
    }
  }

  /**
   * Runs before all tests and starts spark session.
   */
  override def beforeAll(): Unit = {
    startSparkSession()
    super.beforeAll()
  }

  /**
   * Runs after all tests and stops existing spark session.
   */
  override def afterAll(): Unit = {
    super.afterAll()
    stopSparkSession()
  }
}

and finally, we can use SharedSparkSession in unit tests and import sqlImplicits:

class SomeSuite extends FunSuite with SharedSparkSession {
  // We can import sql implicits 
  import sqlImplicits._

  // We can use method sparkSession which returns locally running spark session
  test("some test") {
    val df = sparkSession.sparkContext.parallelize(List(1,2,3)).toDF()
    //...
  }
}

Create a SparkSession object and import spark.implicits._ just before you want to convert an RDD to a Dataset.

Like this:

val spark = SparkSession
      .builder
      .appName("SparkSQL")
      .master("local[*]")
      .getOrCreate()

import spark.implicits._
// someRdd is an existing RDD; with the implicits in scope, toDS() is available
val someDataset = someRdd.toDS()

It has something to do with using val vs. var in Scala.

E.g., the following does not work:

var sparkSession = new SparkSession.Builder().appName("my-app").config(sparkConf).getOrCreate
import sparkSession.implicits._

But the following does:

var sparkSession = new SparkSession.Builder().appName("my-app").config(sparkConf).getOrCreate
val sparkSessionConst = sparkSession
import sparkSessionConst.implicits._

I am not very familiar with Scala, so I can only guess that the reasoning is the same as why we can only use outer variables declared final inside a closure in Java.
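That guess points in the right direction: Scala only allows an import from a stable identifier (a val or an object, not a var). A minimal sketch, independent of Spark, that reproduces the compiler error (all names here are illustrative):

class Holder {
  object inner {
    val x = 1
  }
}

val stable = new Holder
import stable.inner._ // compiles: `stable` is a val, so the path is stable

var mutable = new Holder
// import mutable.inner._ // error: stable identifier required, but mutable.inner found

This is also why the val sparkSessionConst alias above makes the import compile.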

Well, I've been re-using the existing SparkSession in each called method, by creating a local val inside the method:

val spark: org.apache.spark.sql.SparkSession = org.apache.spark.sql.SparkSession.active

And then

import spark.implicits._
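Put together, a hypothetical helper could look like this (the method name and column name are illustrative; SparkSession.active requires Spark 2.4+):

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

def toIntDF(rdd: RDD[Int]): DataFrame = {
  // A local val is a stable identifier, so the import compiles inside the method
  val spark = SparkSession.active
  import spark.implicits._
  rdd.toDF("value")
}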

I know this is an old post, but I just want to share my suggestion on this: I think the problem is related to the way you declare sparkSession. When you declare sparkSession as a var, it is not immutable and can be changed at a later point in time. Importing the implicits from it is therefore not allowed, because it could lead to ambiguity later on; that is not the case with a val.

The issue is naming the variable "spark", which clashes with the name of the spark namespace.

Instead, name the variable something else like sparkSession:

  final private val sparkSession = SparkSession.builder().getOrCreate()
  import sparkSession.implicits._
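With the renamed val in place, the implicits resolve as usual, e.g. (a minimal usage sketch):

  val df = Seq((1, "a"), (2, "b")).toDF("id", "letter")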
