Spark job works when running locally but fails when running on a standalone cluster
I have a simple Spark job that works fine when run locally, but it fails strangely when I run it against a Spark standalone cluster with Docker.
I can confirm that the integration with the master and workers is working.
In the code below I show where the error happens.
JavaRDD<Row> rddwithoutMap = dataFrame.javaRDD();
JavaRDD<Row> rddwithMap = dataFrame.javaRDD()
        .map((Function<Row, Row>) row -> row);

long count = rddwithoutMap.count();        // here it is fine
long countBeforeMap = rddwithMap.count();  // here I get the error
After the map, I cannot call any Spark action. The error is:
Caused by: java.lang.ClassNotFoundException: com.apssouza.lambda.MyApp$1
Note: I use a lambda in the map to keep the code more readable, but on the standalone cluster I cannot use lambdas either; there I get:

Caused by: java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.fun$1 of type org.apache.spark.api.java.function.Function in instance of org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1
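Both errors boil down to the same mechanism: Spark serializes the closure on the driver and deserializes it on the executor, which must be able to load the application's classes (the anonymous class MyApp$1, or the class that captured the lambda) from its own classpath. The round trip can be illustrated with plain JDK serialization; this is a standalone sketch with made-up names (SerFunction stands in for Spark's org.apache.spark.api.java.function.Function):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class LambdaSerializationDemo {

    // A Serializable functional interface, analogous to Spark's Function.
    public interface SerFunction<T, R> extends Serializable {
        R apply(T t);
    }

    // Serialize a lambda and deserialize it again in the same JVM.
    public static Integer roundTrip() throws Exception {
        SerFunction<Integer, Integer> f = x -> x + 1;

        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            // Written as a java.lang.invoke.SerializedLambda that records
            // the capturing class (this class) by name.
            oos.writeObject(f);
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            @SuppressWarnings("unchecked")
            SerFunction<Integer, Integer> g =
                    (SerFunction<Integer, Integer>) ois.readObject();
            return g.apply(41);
        }
    }

    public static void main(String[] args) throws Exception {
        // Works here because LambdaSerializationDemo is on the classpath.
        // On a Spark executor, the same readObject step fails when the
        // application jar was never shipped to (or is stale on) the worker.
        System.out.println(roundTrip()); // prints 42
    }
}
```

This is why the job runs with local[*] (driver and "executor" share one classpath) but breaks against the cluster when the workers see an old or missing application jar.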
Docker image: bde2020/spark-master:2.3.2-hadoop2.7
Local Spark version: 2.4.0
Spark dependency version: spark-core_2.11, version 2.3.2
import java.io.IOException;
import java.net.URISyntaxException;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;

public class MyApp {
    public static void main(String[] args) throws IOException, URISyntaxException {
        // String sparkMasterUrl = "local[*]";
        // String csvFile = "/Users/apssouza/Projetos/java/lambda-arch/data/spark/input/localhost.csv";
        String sparkMasterUrl = "spark://spark-master:7077";
        String csvFile = "hdfs://namenode:8020/user/lambda/localhost.csv";

        SparkConf sparkConf = new SparkConf()
                .setAppName("Lambda-demo")
                .setMaster(sparkMasterUrl);
        // .setJars(/path/to/my/jar); I even tried to set the jar

        JavaSparkContext sparkContext = new JavaSparkContext(sparkConf);
        SQLContext sqlContext = new SQLContext(sparkContext);

        Dataset<Row> dataFrame = sqlContext.read()
                .format("csv")
                .option("header", "true")
                .load(csvFile);

        JavaRDD<Row> rddwithoutMap = dataFrame.javaRDD();
        JavaRDD<Row> rddwithMap = dataFrame.javaRDD()
                .map((Function<Row, Row>) row -> row);

        long count = rddwithoutMap.count();        // fine
        long countBeforeMap = rddwithMap.count();  // fails on the cluster
    }
}
<?xml version="1.0" encoding="UTF-8"?>
<project>
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.apssouza.lambda</groupId>
    <artifactId>lambda-arch</artifactId>
    <version>1.0-SNAPSHOT</version>
    <name>lambda-arch</name>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
    </properties>

    <dependencies>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>2.9.7</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.3.2</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>2.3.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-lang3</artifactId>
            <version>3.6</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.module</groupId>
            <artifactId>jackson-module-scala_2.11</artifactId>
            <version>2.9.7</version>
        </dependency>
    </dependencies>
</project>
Note: if I uncomment the first two lines (the local[*] master and the local CSV path), everything works fine.
The problem was that I was not packaging the program before running it, so the Spark cluster was getting a stale version of the application jar. This is odd, because I run it through my IDE (IntelliJ), which should package the jar before running it. In any case, running mvn package before hitting the run button solved the problem.
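To make this less dependent on the IDE's build step, the freshly packaged jar can also be registered explicitly on the SparkConf, as the commented-out setJars line in the question hints at. A minimal configuration sketch, assuming the artifact from mvn package lands at target/lambda-arch-1.0-SNAPSHOT.jar (path derived from the pom's artifactId and version; adjust to your build output):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class MyAppWithJars {
    public static void main(String[] args) {
        // Assumed path of the jar produced by `mvn package`.
        String appJar = "target/lambda-arch-1.0-SNAPSHOT.jar";

        SparkConf sparkConf = new SparkConf()
                .setAppName("Lambda-demo")
                .setMaster("spark://spark-master:7077")
                // Ship the application jar to the executors so classes like
                // MyApp$1 (and lambda-capturing classes) can be loaded there.
                .setJars(new String[]{appJar});

        JavaSparkContext sparkContext = new JavaSparkContext(sparkConf);
        // ... build the Dataset and run the actions as in the question ...
        sparkContext.close();
    }
}
```

Equivalently, submitting the packaged jar with spark-submit distributes it to the workers automatically; either way, the key is that the cluster always sees the current build.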