
Submitting Spark application on standalone cluster

I am rather new to using Spark and I am having issues running a simple word count application on a standalone cluster. I have a cluster consisting of one master node and one worker, launched on AWS using the spark-ec2 script. Everything works fine when I run the code locally using

./bin/spark-submit --class com.spark.SparkDataAnalysis --master local[*] ./uber-ingestion-0.0.1-SNAPSHOT.jar file:///root/textfile.txt s3n://bucket/wordcount

This saves the output into the specified directory as it should.

When I try to run the application on the cluster using

./bin/spark-submit --class com.spark.SparkDataAnalysis --master spark://server-ip:7077 ./uber-ingestion-0.0.1-SNAPSHOT.jar file:///root/textfile.txt s3n://bucket/wordcount

it just keeps on running and never produces a final result. The directory gets created, but only a temporary file of 0 bytes is present.

According to the Spark UI, it keeps on running the mapToPair function indefinitely. Here is a picture of the Spark UI:

[Spark UI screenshot]

Does anyone know why this is happening and how to solve it?

Here is the code:

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class SparkDataAnalysis {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("SparkDataAnalysis");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // args[0] is the input path, e.g. file:///root/textfile.txt
        JavaRDD<String> input = sc.textFile(args[0]);

        // Split each line into words (Spark 1.x flatMap returns an Iterable)
        JavaRDD<String> words = input.flatMap(s -> Arrays.asList(s.split(" ")));

        // Pair each word with a count of 1, then sum the counts per word
        JavaPairRDD<String, Integer> counts =
                words.mapToPair(t -> new Tuple2<>(t, 1)).reduceByKey((x, y) -> x + y);

        // args[1] is the output directory, e.g. s3n://bucket/wordcount
        counts.saveAsTextFile(args[1]);
        sc.stop();
    }
}
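One thing worth double-checking here (an assumption on my part, not something confirmed above): with --master spark://server-ip:7077 the tasks run on the worker machine, so both the file:///root/textfile.txt input and the S3 credentials for the s3n:// output have to be visible on the worker, not just on the master where spark-submit is invoked. A minimal sketch of passing credentials through the Hadoop configuration, assuming the standard AWS environment variables are set where the driver runs and using the property names of the legacy s3n connector, would go right after the JavaSparkContext is created:

        // Hypothetical sketch: ship S3 credentials with the job so executors can
        // write to s3n:// paths. Assumes AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
        // are set in the environment where the driver runs.
        sc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", System.getenv("AWS_ACCESS_KEY_ID"));
        sc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", System.getenv("AWS_SECRET_ACCESS_KEY"));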

I skipped using a standalone cluster via the spark-ec2 script and used Amazon EMR instead, where everything worked perfectly.
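For reference, EMR runs Spark on YARN rather than a standalone master, so the submission there looks roughly like this (a sketch only; the jar and bucket names are carried over from above, and on EMR the input would normally live in S3 rather than on a local path):

spark-submit --class com.spark.SparkDataAnalysis --master yarn --deploy-mode cluster ./uber-ingestion-0.0.1-SNAPSHOT.jar s3://bucket/textfile.txt s3://bucket/wordcount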
