Spark SQL toDF method fails with java.lang.NoSuchMethodError
I'd like to understand the cause of and the solution to this problem. The problem happens when using spark-submit. Appreciate the help.
spark-submit --class "AuctionDataFrame" --master spark://<hostname>:7077 auction-project_2.11-1.0.jar
It does not cause an error when running line by line in a spark-shell.
...
scala> val auctionsDF = auctionsRDD.toDF()
auctionsDF: org.apache.spark.sql.DataFrame = [aucid: string, bid: float, bidtime: float, bidder: string, bidrate: int, openbid: float, price: float, itemtype: string, dtl: int]
scala> auctionsDF.printSchema()
root
|-- aucid: string (nullable = true)
|-- bid: float (nullable = false)
|-- bidtime: float (nullable = false)
|-- bidder: string (nullable = true)
|-- bidrate: integer (nullable = false)
|-- openbid: float (nullable = false)
|-- price: float (nullable = false)
|-- itemtype: string (nullable = true)
|-- dtl: integer (nullable = false)
Calling the toDF method to convert the RDD into a DataFrame causes the error:
Exception in thread "main" java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;
at AuctionDataFrame$.main(AuctionDataFrame.scala:52)
at AuctionDataFrame.main(AuctionDataFrame.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
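The failing method named in the stack trace, `scala.reflect.api.JavaUniverse.runtimeMirror`, is the reflection entry point Spark's `toDF` implicits use to derive a schema from the case-class fields, and its signature differs between Scala 2.10 and 2.11. A minimal sketch of that mechanism using only standard Scala runtime reflection (the trimmed `Auctions` case class and the `SchemaSketch` object are illustrative, not Spark code):

```scala
import scala.reflect.runtime.{universe => ru}

// A trimmed-down stand-in for the question's Auctions case class.
case class Auctions(aucid: String, bid: Float, bidrate: Int)

object SchemaSketch {
  def main(args: Array[String]): Unit = {
    // The same entry point named in the stack trace; its return type changed
    // between Scala 2.10 and 2.11, so mixing the two versions at runtime
    // produces NoSuchMethodError.
    val mirror = ru.runtimeMirror(getClass.getClassLoader)
    println(mirror.runtimeClass(ru.typeOf[Auctions]))

    // Enumerate the case-class accessors, roughly how toDF infers columns.
    val columns = ru.typeOf[Auctions].members.collect {
      case m: ru.MethodSymbol if m.isCaseAccessor =>
        s"${m.name}: ${m.returnType}"
    }
    columns.foreach(println)
  }
}
```

Because the class file calls this method with the 2.11 signature, running it against a Scala 2.10 library on the classpath fails even though the code itself is correct.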
import org.apache.spark.{SparkConf, SparkContext}

case class Auctions(
  aucid: String,
  bid: Float,
  bidtime: Float,
  bidder: String,
  bidrate: Int,
  openbid: Float,
  price: Float,
  itemtype: String,
  dtl: Int)

object AuctionDataFrame {
  val AUCID = 0
  val BID = 1
  val BIDTIME = 2
  val BIDDER = 3
  val BIDRATE = 4
  val OPENBID = 5
  val PRICE = 6
  val ITEMTYPE = 7
  val DTL = 8

  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("AuctionDataFrame")
    val sc = new SparkContext(conf)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.implicits._

    val inputRDD = sc.textFile("/user/wynadmin/auctiondata.csv").map(_.split(","))
    val auctionsRDD = inputRDD.map(a =>
      Auctions(
        a(AUCID),
        a(BID).toFloat,
        a(BIDTIME).toFloat,
        a(BIDDER),
        a(BIDRATE).toInt,
        a(OPENBID).toFloat,
        a(PRICE).toFloat,
        a(ITEMTYPE),
        a(DTL).toInt))
    val auctionsDF = auctionsRDD.toDF() // <--- line 52 causing the error.
  }
}
build.sbt
name := "Auction Project"
version := "1.0"
scalaVersion := "2.11.8"
//scalaVersion := "2.10.6"
/*
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "1.6.2",
"org.apache.spark" %% "spark-sql" % "1.6.2",
"org.apache.spark" %% "spark-mllib" % "1.6.2"
)
*/
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "1.6.2" % "provided",
"org.apache.spark" %% "spark-sql" % "1.6.2" % "provided",
"org.apache.spark" %% "spark-mllib" % "1.6.2" % "provided"
)
Spark on Ubuntu 14.04:
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.6.2
/_/
Using Scala version 2.11.7 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_92)
sbt on Windows:
D:\>sbt sbtVersion
[info] Set current project to root (in build file:/D:/)
[info] 0.13.12
Looked into similar issues, which suggest an incompatibility between the Scala version used to compile Spark and the one used to build the jar. Hence I changed the Scala version in build.sbt to 2.10, which produced a 2.10 jar, but the error persisted. Using % "provided" or not does not change the error.
scalaVersion := "2.10.6"
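One quick sanity check (a diagnostic sketch, not part of the original post; the `ScalaVersionCheck` object name is illustrative) is to print the Scala library version actually on the runtime classpath from inside the submitted job. On a cluster this comes from the Spark distribution rather than the application jar, so a mismatch with the `scalaVersion` the jar was built with confirms the incompatibility:

```scala
object ScalaVersionCheck {
  def main(args: Array[String]): Unit = {
    // Reports the scala-library picked up at runtime, e.g. "version 2.11.7".
    println(scala.util.Properties.versionString)
    println(scala.util.Properties.versionNumberString)
  }
}
```

Running this via spark-submit would show which Scala library the cluster's Spark actually ships.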
Spark 1.6.2 on the cluster was compiled from source with Scala 2.11. However, spark-1.6.2-bin-without-hadoop.tgz had been downloaded and placed in the lib/ directory. I believe that because spark-1.6.2-bin-without-hadoop.tgz was compiled with Scala 2.10, it causes the compatibility issue. Remove spark-1.6.2-bin-without-hadoop.tgz from the lib directory and run "sbt package" with the library dependencies below.
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "1.6.2" % "provided",
"org.apache.spark" %% "spark-sql" % "1.6.2" % "provided",
"org.apache.spark" %% "spark-mllib" % "1.6.2" % "provided"
)
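Note that `%%` appends the project's Scala binary suffix to the artifact name, which is why `scalaVersion` must match the Scala version of the Spark runtime. A sketch of the equivalence (not an additional dependency to add):

```scala
// With scalaVersion := "2.11.8", the %% form:
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.2" % "provided"
// resolves to the same artifact as the explicit form:
// libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "1.6.2" % "provided"
// A _2.11 jar run against a Scala 2.10 Spark runtime is binary-incompatible
// and fails at runtime with NoSuchMethodError, as seen above.
```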