Checkpoint with Spark file streaming in Java
I want to implement checkpointing in my Spark file-streaming application so that, if the application stops or terminates for any reason, it processes all unprocessed files from Hadoop when it restarts. I am following the streaming programming guide, but I cannot find JavaStreamingContextFactory. Please help me: what should I do?
My code is:
public class StartAppWithCheckPoint {

    public static void main(String[] args) {
        try {
            String filePath = "hdfs://Master:9000/mmi_traffic/listenerTransaction/2020/*/*/*/";
            String checkpointDirectory = "hdfs://Mongo1:9000/probeAnalysis/checkpoint";
            SparkSession sparkSession = JavaSparkSessionSingleton.getInstance();

            JavaStreamingContextFactory contextFactory = new JavaStreamingContextFactory() {
                @Override
                public JavaStreamingContext create() {
                    SparkConf sparkConf = new SparkConf().setAppName("ProbeAnalysis");
                    JavaSparkContext sc = new JavaSparkContext(sparkConf);
                    JavaStreamingContext jssc = new JavaStreamingContext(sc, Durations.seconds(300));
                    JavaDStream<String> lines = jssc.textFileStream(filePath).cache();
                    jssc.checkpoint(checkpointDirectory);
                    return jssc;
                }
            };

            JavaStreamingContext context = JavaStreamingContext.getOrCreate(checkpointDirectory, contextFactory);
            context.start();
            context.awaitTermination();
            context.close();
            sparkSession.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
You must use checkpointing. For checkpointing, use stateful transformations: either updateStateByKey or reduceByKeyAndWindow.
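As a minimal sketch (not from the original post), a checkpointed stateful transformation with updateStateByKey looks roughly like this. The class name, method name, and the assumption that `words` is a `JavaDStream<String>` of tokens from your input stream are illustrative; `jssc.checkpoint(...)` must already have been called on the streaming context:

```java
import java.util.List;

import org.apache.spark.api.java.Optional;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;

import scala.Tuple2;

public class StatefulCountSketch {

    // Attaches a running per-word count to an existing DStream. Spark persists
    // the accumulated state in the checkpoint directory, so the counts survive
    // a restart of the driver.
    static JavaPairDStream<String, Integer> runningCounts(JavaDStream<String> words) {
        return words
            .mapToPair(w -> new Tuple2<>(w, 1))
            .updateStateByKey((List<Integer> newValues, Optional<Integer> state) -> {
                int sum = state.orElse(0);      // previous count for this key, if any
                for (Integer v : newValues) {   // counts arriving in this batch
                    sum += v;
                }
                return Optional.of(sum);        // new state carried to the next batch
            });
    }
}
```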
There are plenty of examples in the spark-examples module shipped with the prebuilt Spark distribution and in the Spark source on GitHub. For your specific case, see JavaStatefulNetworkWordCount.java.
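Regarding the missing class: JavaStreamingContextFactory was removed in Spark 2.x; the current API takes a `Function0<JavaStreamingContext>` instead. A sketch of your application against that API (paths and app name taken from your post; the `lines.print()` output action is an illustrative placeholder for your real processing):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function0;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class StartAppWithCheckPoint {

    public static void main(String[] args) throws InterruptedException {
        String filePath = "hdfs://Master:9000/mmi_traffic/listenerTransaction/2020/*/*/*/";
        String checkpointDirectory = "hdfs://Mongo1:9000/probeAnalysis/checkpoint";

        // Invoked only when no checkpoint exists yet; on restart, the context
        // and the whole DStream graph are rebuilt from the checkpoint instead.
        Function0<JavaStreamingContext> createContext = () -> {
            SparkConf sparkConf = new SparkConf().setAppName("ProbeAnalysis");
            JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, Durations.seconds(300));
            jssc.checkpoint(checkpointDirectory);

            // Define ALL transformations and output operations inside this
            // factory, so they are captured in the checkpoint.
            JavaDStream<String> lines = jssc.textFileStream(filePath).cache();
            lines.print();  // placeholder: replace with your processing

            return jssc;
        };

        JavaStreamingContext context =
            JavaStreamingContext.getOrCreate(checkpointDirectory, createContext);
        context.start();
        context.awaitTermination();
    }
}
```

Note that the stream definition must live inside the creating function: if you define it outside, a restart from the checkpoint will ignore it.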