
Test Spark Scala with Maven Got Error: java.lang.NoClassDefFoundError

I tried to test Spark Scala in Scala IDE (Eclipse) with Maven, but I keep getting this error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
    at org.apache.spark.SparkConf.loadFromSystemProperties(SparkConf.scala:73)
    at org.apache.spark.SparkConf.<init>(SparkConf.scala:68)
    at org.apache.spark.SparkConf.<init>(SparkConf.scala:55)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:904)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:901)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:901)
    at com.SimpleApp$.main(SimpleApp.scala:7)
    at com.SimpleApp.main(SimpleApp.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 9 more

The program I am trying is the quick start code from the Spark documentation:

import org.apache.spark.sql.SparkSession

object SimpleApp {

  def main(args: Array[String]) {
    val logFile = "YOUR_SPARK_HOME/README.md" // Should be some file on your system
    val spark = SparkSession.builder.appName("Simple Application").getOrCreate()
    val logData = spark.read.textFile(logFile).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println(s"Lines with a: $numAs, Lines with b: $numBs")
    spark.stop()
  }
}

I am using Spark 2.2.0 and Scala 2.11.7. The pom.xml file is:

<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.2.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.2.0</version>
</dependency>

I followed the solution from another thread: NoClassDefFoundError com.apache.hadoop.fs.FSDataInputStream when executing spark-shell

But it doesn't work for me. The contents of my spark-env.sh file are:

# If 'hadoop' binary is on your PATH
export SPARK_DIST_CLASSPATH=$(hadoop classpath)

# With explicit path to 'hadoop' binary
export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath)

# Passing a Hadoop configuration directory
export SPARK_DIST_CLASSPATH=$(hadoop --config /usr/local/hadoop/etc/hadoop classpath)
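
As a side check, one quick way to verify whether the Hadoop classes are visible on the application's runtime classpath at all (independently of spark-env.sh, which is only read by Spark's launch scripts such as spark-submit and spark-shell, not by a program started directly from the IDE) is a plain class lookup. A minimal diagnostic sketch, not part of the original program; the object name is illustrative:

object ClasspathCheck {
  def main(args: Array[String]): Unit = {
    // Throws ClassNotFoundException if the Hadoop client jars are missing
    // from the runtime classpath (the root cause of the NoClassDefFoundError above).
    Class.forName("org.apache.hadoop.fs.FSDataInputStream")
    println("Hadoop filesystem classes are on the classpath")
  }
}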

Could someone please help me? Thanks for any help.

Devesh's answer solves part of my problem. However, I now have another problem:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/08/17 10:34:03 INFO SparkContext: Running Spark version 2.2.0
18/08/17 10:34:03 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/08/17 10:34:03 WARN Utils: Your hostname, toshiba0 resolves to a loopback address: 127.0.1.1; using 192.168.1.217 instead (on interface wlp2s0)
18/08/17 10:34:03 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
18/08/17 10:34:03 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: A master URL must be set in your configuration
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:376)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2509)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:909)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:901)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:901)
    at com.SimpleApp$.main(SimpleApp.scala:11)
    at com.SimpleApp.main(SimpleApp.scala)
18/08/17 10:34:03 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" org.apache.spark.SparkException: A master URL must be set in your configuration
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:376)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2509)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:909)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:901)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:901)
    at com.SimpleApp$.main(SimpleApp.scala:11)
    at com.SimpleApp.main(SimpleApp.scala)

I don't know why Spark says my loopback address is 127.0.1.1. I checked the configuration in /etc/network/interfaces: it is set to auto loopback, and pinging 127.0.0.1 works.
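
The loopback warning itself appears to be harmless here (Spark simply falls back to 192.168.1.217). If binding to a specific address is really wanted, the log's own suggestion can be followed by exporting SPARK_LOCAL_IP before launching; a sketch, where the value is illustrative:

# e.g. in spark-env.sh, or in the environment of the IDE run configuration
export SPARK_LOCAL_IP=127.0.0.1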

I followed the solution from this link: Error initializing SparkContext: A master URL must be set in your configuration

and added the following code, since I am running on my laptop. It still doesn't work:

val conf = new SparkConf().setMaster("local[2]")

I don't know what is wrong with my settings. Thanks!
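
One thing worth noting about the line above: a SparkConf created on its own has no effect unless it is actually passed to the session builder, because getOrCreate() otherwise builds its own configuration from system properties (as the first stack trace shows). A minimal sketch of wiring the two together, assuming the same Spark 2.2 API used above:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object SimpleApp {
  def main(args: Array[String]): Unit = {
    // The master set here is only picked up because the conf is handed to the builder.
    val conf = new SparkConf().setMaster("local[2]").setAppName("Simple Application")
    val spark = SparkSession.builder.config(conf).getOrCreate()
    // ... use spark ...
    spark.stop()
  }
}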

Just add the following in your Maven pom.xml file:

<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client -->
<dependency>
     <groupId>org.apache.hadoop</groupId>
     <artifactId>hadoop-client</artifactId>
     <version>2.7.0</version>
</dependency>
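
For reference only, combined with the spark-core and spark-sql entries already in the question's pom.xml, the dependency section would then contain (versions exactly as quoted above, nothing else changed):

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.2.0</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.2.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.7.0</version>
</dependency>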

In previous Spark versions you had to create a SparkConf and SparkContext to interact with Spark, whereas in Spark 2.0 onwards the same effect can be achieved through SparkSession, without explicitly creating SparkConf, SparkContext or SQLContext, as they are encapsulated within SparkSession.

**Sample code snippet:**

import org.apache.spark.sql.SparkSession

object SimpleApp {

  def main(args: Array[String]) {
    val logFile = "YOUR_SPARK_HOME/README.md" // some file on system
    val spark = SparkSession
      .builder
      .appName("Simple Application")
      .master("local[2]")
      .getOrCreate()
    val logData = spark.read.textFile(logFile).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println(s"Lines with a: $numAs, Lines with b: $numBs")
  }
}
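
If the project is packaged with Maven, one way to run it outside the IDE is through spark-submit; a sketch, where the jar name is illustrative and depends on the project's artifactId and version:

# build the jar, then submit it locally; com.SimpleApp is the fully
# qualified name seen in the stack traces above
mvn package
$SPARK_HOME/bin/spark-submit \
  --class "com.SimpleApp" \
  --master "local[2]" \
  target/simple-app-1.0.jar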
