
How to checkpoint many sources in Spark Streaming

I have many CSV spark.readStream sources in different locations, and I have to checkpoint all of them in Scala. I specified a query for every stream, but when I run the job I get this message:

java.lang.IllegalArgumentException: Cannot start query with name "query1" as a query with that name is already active

I solved my problem by creating many streaming queries, like this:

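import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder
  .appName("test")
  .master("local[*]") // run locally, using all available cores
  .getOrCreate()

// Directory used by RDD/Dataset.checkpoint(); a streaming query takes its
// own location from the "checkpointLocation" option on writeStream.
spark.sparkContext.setCheckpointDir(path_checkpoint)

// One streaming DataFrame per CSV location
val event1 = spark
  .readStream
  .schema(schema_a)
  .option("header", "true")
  .option("sep", ",")
  .csv(path_a)

// One streaming query per stream
val query = event1.writeStream
  .outputMode("append")
  .format("console")
  .start()

// Block until any active query terminates
spark.streams.awaitAnyTermination()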
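
The error message says that two queries were started under the same name ("query1"). A minimal sketch of one way to avoid that, assuming each source gets a unique query name and its own checkpoint directory, is shown here. The names CsvSource, checkpointDirFor and startQuery, the /tmp paths, and the two-column schema are hypothetical and not part of the original post; queryName and the "checkpointLocation" option are standard DataStreamWriter settings.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.streaming.StreamingQuery
import org.apache.spark.sql.types.{StringType, StructType}

object MultiSourceCheckpoint {
  // Hypothetical description of one CSV source: a unique name, its input
  // directory, and the schema of its files.
  final case class CsvSource(name: String, inputPath: String, schema: StructType)

  // Derive a per-query checkpoint directory under a common root, so that no
  // two queries share checkpoint state.
  def checkpointDirFor(root: String, queryName: String): String =
    s"${root.stripSuffix("/")}/$queryName"

  // Start one streaming query for one source, with a unique query name and
  // its own checkpoint location.
  def startQuery(spark: SparkSession, source: CsvSource, checkpointRoot: String): StreamingQuery = {
    val events: DataFrame = spark.readStream
      .schema(source.schema)
      .option("header", "true")
      .option("sep", ",")
      .csv(source.inputPath)

    events.writeStream
      .queryName(source.name) // must be unique among active queries
      .outputMode("append")
      .format("console")
      .option("checkpointLocation", checkpointDirFor(checkpointRoot, source.name))
      .start()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("test")
      .master("local[*]")
      .getOrCreate()

    // Assumed schema and paths, for illustration only.
    val schema = new StructType().add("id", StringType).add("value", StringType)
    val sources = Seq(
      CsvSource("query1", "/tmp/input/a", schema),
      CsvSource("query2", "/tmp/input/b", schema)
    )

    // Start every query first, then wait once for any of them to terminate.
    sources.foreach(source => startQuery(spark, source, "/tmp/checkpoints"))
    spark.streams.awaitAnyTermination()
  }
}
```

All queries are started before the single awaitAnyTermination() call because that call blocks, so a query started after it would never run.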


 