![](/img/trans.png)
[英]got java.lang.NoClassDefFoundError: scala/Serializable with scala jars in classpath
[英]Test Spark Scala with Maven Got Error: java.lang.NoClassDefFoundError
我嘗試使用Maven在Scala IDE(eclipse)上測試Spark Scala,但始終出現錯誤:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
at org.apache.spark.SparkConf.loadFromSystemProperties(SparkConf.scala:73)
at org.apache.spark.SparkConf.<init>(SparkConf.scala:68)
at org.apache.spark.SparkConf.<init>(SparkConf.scala:55)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:904)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:901)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:901)
at com.SimpleApp$.main(SimpleApp.scala:7)
at com.SimpleApp.main(SimpleApp.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 9 more
我嘗試的程序是Spark文檔中的快速入門代碼:
import org.apache.spark.sql.SparkSession
object SimpleApp {
def main(args: Array[String]) {
val logFile = "YOUR_SPARK_HOME/README.md" // Should be some file on your system
val spark = SparkSession.builder.appName("Simple Application").getOrCreate()
val logData = spark.read.textFile(logFile).cache()
val numAs = logData.filter(line => line.contains("a")).count()
val numBs = logData.filter(line => line.contains("b")).count()
println(s"Lines with a: $numAs, Lines with b: $numBs")
spark.stop()
}
}
我使用Spark 2.2.0和Scala 2.11.7。 pom.xml文件是:
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.2.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>2.2.0</version>
</dependency>
我遵循了另一個線程的解決方案: 執行spark-shell時出現NoClassDefFoundError com.apache.hadoop.fs.FSDataInputStream
但這對我不起作用。 我的spark-env.sh文件中的內容是:
# If 'hadoop' binary is on your PATH
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
# With explicit path to 'hadoop' binary
export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath)
# Passing a Hadoop configuration directory
export SPARK_DIST_CLASSPATH=$(hadoop --config /usr/local/hadoop/etc/hadoop classpath)
有人可以幫我嗎? 感謝您的幫助。
Devesh的答案解決了部分問題。 但是,我還有其他問題 :
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/08/17 10:34:03 INFO SparkContext: Running Spark version 2.2.0
18/08/17 10:34:03 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/08/17 10:34:03 WARN Utils: Your hostname, toshiba0 resolves to a loopback address: 127.0.1.1; using 192.168.1.217 instead (on interface wlp2s0)
18/08/17 10:34:03 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
18/08/17 10:34:03 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: A master URL must be set in your configuration
at org.apache.spark.SparkContext.<init>(SparkContext.scala:376)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2509)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:909)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:901)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:901)
at com.SimpleApp$.main(SimpleApp.scala:11)
at com.SimpleApp.main(SimpleApp.scala)
18/08/17 10:34:03 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" org.apache.spark.SparkException: A master URL must be set in your configuration
at org.apache.spark.SparkContext.<init>(SparkContext.scala:376)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2509)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:909)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:901)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:901)
at com.SimpleApp$.main(SimpleApp.scala:11)
at com.SimpleApp.main(SimpleApp.scala)
我不知道為什么Spark會說我的回送地址是127.0.1.1,我檢查了配置:/ etc / network / interfaces,它是自動回送,並且我ping 127.0.0.1。 有用。
我通過此鏈接關注解決方案初始化SparkContext錯誤:必須在配置中設置主URL
並輸入以下代碼,因為我使用筆記本電腦。 它仍然不起作用。
val conf = new SparkConf().setMaster("local[2]")
不知道我的設定會怎樣。 謝謝!
只需在maven pom.xml文件中添加以下內容
<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.7.0</version>
</dependency>
在以前的Spark版本中,您必須創建一個SparkConf和SparkContext才能與Spark交互,而在Spark 2.0及更高版本中,可以通過SparkSession實現相同的效果,而無需顯式創建SparkConf,SparkContext或SQLContext,因為它們被封裝在SparkSession中
**示例代碼段:-**
import org.apache.spark.sql.SparkSession
object SimpleApp {
def main(args: Array[String]) {
val logFile = "YOUR_SPARK_HOME/README.md" // some file on system
val spark = SparkSession
.builder
.appName("Simple Application")
.master("local[2]")
.getOrCreate()
val logData = spark.read.textFile(logFile).cache()
val numAs = logData.filter(line => line.contains("a")).count()
val numBs = logData.filter(line => line.contains("b")).count()
println(s"Lines with a: $numAs, Lines with b: $numBs")
}
}
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.