
Spark ignoring package jars included in the configuration of my Spark Session

I keep running into this error: java.lang.ClassNotFoundException: Failed to find data source: iceberg. Please find packages at https://spark.apache.org/third-party-projects.html

I am trying to include the org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:1.1.0 package as part of my Spark code. The reason is that I want to write unit tests locally. I have tried several things:

  1. Include the package as part of my SparkSession builder:
   import org.apache.spark.SparkConf
   import org.apache.spark.sql.SparkSession

   val conf = new SparkConf()
   conf.set("spark.jars.packages", "org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:1.1.0")

   val sparkSession: SparkSession =
     SparkSession
       .builder()
       .appName(getClass.getSimpleName)
       .config(conf = conf)
       // ... the rest of my config
       .master("local[*]")
       .getOrCreate()

This does not work; I get the same error. I also tried setting the configuration string directly in the SparkSession builder, and that didn't work either.

  2. Downloading the jar myself. I really don't want to do this; I want the process to be automated. But even then, when I set "spark.jars" to point to the downloaded jar, Spark cannot find it for some reason (a sketch of that attempt is shown below).
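For reference, a minimal sketch of what that second attempt looks like; the local path below is hypothetical:

   // spark.jars takes a comma-separated list of local jar paths or URLs;
   // the path here is a placeholder for wherever the jar was downloaded.
   conf.set("spark.jars", "/path/to/iceberg-spark-runtime-3.2_2.12-1.1.0.jar")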

Can anybody help me figure this out?

You can create an uber/fat jar and put all your dependencies in that jar.

Let's say you want to use Iceberg in your Spark application.

Create a pom.xml file and add the dependency to the dependencies section.

<dependencies>
    <dependency>
      <groupId>org.apache.iceberg</groupId>
      <artifactId>iceberg-spark-runtime-3.2_2.12</artifactId>
      <version>1.1.0</version>
    </dependency>
</dependencies>

Building the project with a fat-jar plugin such as maven-shade-plugin or maven-assembly-plugin will then produce a single jar with that dependency baked into it. You can deploy that jar via spark-submit, and the dependent libraries will be picked up automatically.
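If the project builds with sbt rather than Maven, the sbt-assembly plugin gives you the same kind of fat jar. A minimal sketch (the plugin and Spark versions, and the "provided" scope, are assumptions):

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "1.2.0")

// build.sbt -- Spark itself is marked "provided" so it is not bundled into the jar;
// the Iceberg Spark runtime is the artifact from the question.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % "3.2.3" % "provided",
  "org.apache.iceberg" % "iceberg-spark-runtime-3.2_2.12" % "1.1.0"
)

Running "sbt assembly" then writes the fat jar under target/scala-2.12/, which can be deployed with spark-submit.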

It seems spark.jars.packages is only read when spark-shell (or spark-submit) starts up. That means it can still be changed inside a session via SparkSession or SparkConf, but the new value will not be processed and the packages will not be loaded.

For a self-contained Scala application, you can instead add the required dependencies to build.sbt, for example:

libraryDependencies ++= Seq(
  "org.mongodb.spark" %% "mongo-spark-connector" % "10.0.5",
  "org.apache.spark" %% "spark-core" % "3.0.2",
  "org.apache.spark" %% "spark-sql" % "3.0.2"
)
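For the Iceberg case in the question, the same idea applies: put the runtime on the (test) classpath through the build tool instead of spark.jars.packages. A minimal sketch, assuming sbt with Spark 3.2 and Scala 2.12 (the versions and the Test scope are assumptions):

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % "3.2.3",
  // iceberg-spark-runtime-3.2_2.12 already encodes the Scala version, so use % rather than %%
  "org.apache.iceberg" % "iceberg-spark-runtime-3.2_2.12" % "1.1.0" % Test
)

With the runtime on the classpath, a SparkSession started with master("local[*]") in a unit test can resolve the "iceberg" data source without setting spark.jars.packages at all.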
