
How to make use of Delta Lake in a regular Scala project in an IDE

I've added the Delta dependencies to my build.sbt:

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-sql" % sparkVersion,
  "org.apache.spark" %% "spark-hive" % sparkVersion,
  // logging
  "org.apache.logging.log4j" % "log4j-api" % "2.4.1",
  "org.apache.logging.log4j" % "log4j-core" % "2.4.1",
  // postgres for DB connectivity
  "org.postgresql" % "postgresql" % postgresVersion,
  "io.delta" %% "delta-core" % "0.7.0"
)
However, I cannot figure out what configuration the Spark session must contain. The code below fails.

val spark = SparkSession.builder()
    .appName("Spark SQL Practice")
    .config("spark.master", "local")
    .config("spark.network.timeout"  , "10000000s")//to avoid Heartbeat exception
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()

Exception -

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/plans/logical/MergeIntoTable

Here's an example project I made that'll help you.

The build.sbt file should include these dependencies:

libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.0.0" % "provided"
libraryDependencies += "io.delta" %% "delta-core" % "0.7.0" % "provided"

I think you need to be using Spark 3 for Delta Lake 0.7.0.

You shouldn't need any special SparkSession config options; something like this should be fine:

lazy val spark: SparkSession = {
  SparkSession
    .builder()
    .master("local")
    .appName("spark session")
    .config("spark.databricks.delta.retentionDurationCheck.enabled", "false")
    .getOrCreate()
}
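
Once the session comes up, a quick way to confirm Delta is wired in correctly is to write a small Delta table and read it back with the DataFrame API. A minimal sketch, using the spark value defined above; the output path is just a placeholder:

// Hypothetical smoke test: write a tiny Delta table and read it back.
val df = spark.range(0, 5).toDF("id")
df.write.format("delta").mode("overwrite").save("/tmp/delta-demo") // placeholder path

val readBack = spark.read.format("delta").load("/tmp/delta-demo")
readBack.show()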

A NoClassDefFoundError is thrown when a class your code depends on was present on the compile-time classpath but cannot be found at runtime. Look for differences between your build-time and runtime classpaths.

More specific to your scenario:

If you get a java.lang.NoClassDefFoundError on org/apache/spark/sql/catalyst/plans/logical/MergeIntoTable, the Spark JAR on your runtime classpath does not contain the MergeIntoTable class. The fix is to upgrade to a recent Apache Spark release, which ships org/apache/spark/sql/catalyst/plans/logical/MergeIntoTable.scala.

More info in the Spark 3.x upgrade and release notes: https://github.com/apache/spark/pull/26167.
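
In terms of the build.sbt from the question, that amounts to bumping sparkVersion to a 3.x release. A sketch of the relevant change, assuming sparkVersion is the value referenced in the question and was previously a 2.x version:

// MergeIntoTable only exists in Spark 3.0.0 and later
val sparkVersion = "3.0.0"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % sparkVersion,
  "io.delta"         %% "delta-core" % "0.7.0"
)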

You need to upgrade Apache Spark. The MergeIntoTable feature was introduced in v3.0.0. Links to sources: AstBuilder.scala, Analyzer.scala, the GitHub pull request, and the release notes (look at the Feature Enhancements section).
