I am compiling my JAR file with sbt assembly so that all dependencies are included, but when I try to run the JAR, the error in the title occurs.

Scala 2.11.12, Spark 2.4.2
package com.foo.bar

import org.apache.spark.sql.SparkSession

object DebugApp {
  def main(args: Array[String]): Unit = {
    // Exit early when the required arguments are missing.
    if (args.length < 5) {
      println("Must pass in args: sparkMaster, bucket, dataPath, parsedDestinationPath, rawDestinationPath")
      sys.exit(1)
    }

    val sparkMaster = args(0)
    val bucket = args(1)
    val dataPath = args(2)
    val parsedDestinationPath = args(3)
    val rawDestinationPath = args(4)

    val spark = SparkSession
      .builder()
      .config("spark.driver.extraJavaOptions", "-Dlog4jspark.root.logger=WARN,console")
      .appName("Parser")
      .master(sparkMaster)
      .getOrCreate()
  }
}
The first two lines of the error show it is coming from Spark:

Exception in thread "main" java.lang.NoSuchMethodError: scala.collection.mutable.Buffer$.empty()Lscala/collection/GenTraversable;
    at org.apache.spark.sql.SparkSessionExtensions.<init>(SparkSessionExtensions.scala:72)
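This particular signature (a `GenTraversable` return type) usually points at a Scala binary-version mismatch between the assembly JAR and the scala-library on the runtime classpath. As a diagnostic sketch (the object name is hypothetical, not part of the original post), you can print which Scala version is actually loaded at run time:

```scala
// Prints the version of the scala-library actually on the runtime classpath.
// If this differs from the binary version the JAR was compiled against
// (here 2.11), a NoSuchMethodError like the one above is expected.
object ScalaVersionCheck {
  def main(args: Array[String]): Unit = {
    println(scala.util.Properties.versionNumberString)
  }
}
```

Running this through the same launcher that fails (e.g. spark-submit) shows whether the runtime Scala matches the 2.11.12 used to build the assembly.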
For further context, these are the dependencies in my build.sbt:
scalaVersion in ThisBuild := "2.11.12"
fork in run := true
...
val sparkV = "2.4.2"
val spark = "org.apache.spark" %% "spark-core" % sparkV
val sparkSql = "org.apache.spark" %% "spark-sql" % sparkV
val sparkHive = "org.apache.spark" %% "spark-hive" % sparkV
Simply put: if you want to run Spark 2.4.x locally, you must pin Hadoop to 2.6.5. You can use any version of the AWS Java SDK, but Hadoop is specifically locked to that version. If you want to work around this, upload files to S3 in one of two ways:

- TransferManager, which pulls in org.apache.httpcomponents:httpclient 4.5.8 and can cause logs to flood
- aws s3 sync <-- recommended
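A minimal build.sbt sketch of the pinning described above (the hadoop-client coordinates are what the 2.4.x line typically uses; treat the exact module list as an assumption to adapt to your build):

```scala
// build.sbt sketch: pin Hadoop to 2.6.5 for local Spark 2.4.x runs.
// httpclient 4.5.8 is only needed if you go the TransferManager route.
val sparkV  = "2.4.2"
val hadoopV = "2.6.5"

libraryDependencies ++= Seq(
  "org.apache.spark"          %% "spark-core"    % sparkV,
  "org.apache.spark"          %% "spark-sql"     % sparkV,
  "org.apache.spark"          %% "spark-hive"    % sparkV,
  "org.apache.hadoop"          % "hadoop-client" % hadoopV,
  "org.apache.httpcomponents"  % "httpclient"    % "4.5.8"
)
```

Note that hadoop-client is a Java artifact, so it uses a single `%` rather than the cross-versioned `%%` used for the Spark modules.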