
How to register a custom Spark Structured Streaming source

I need to create a custom streaming source by extending FileStreamSource. The idea is to override commit, so that processed files (S3 objects in this case) are renamed to have a certain prefix. However, I don't know how to use this custom source. Obviously I don't want to compile Spark -- the application will be running on Amazon EMR clusters.

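Once the custom source is in your project, you do not need to recompile Spark: package it in your application jar and pass a fully qualified class name to format on the DataStreamReader. Spark instantiates the class named there as a provider, so it should be a class that implements StreamSourceProvider and whose createSource returns your custom source, rather than the FileStreamSource subclass itself. MyCustomFileStreamSourceProvider below is a placeholder name for that provider:

val input = spark
  .readStream
  .format("path.to.MyCustomFileStreamSourceProvider")
  .load()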
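For illustration, here is a minimal sketch of what the two classes might look like. It rests on several assumptions: FileStreamSource is an internal Spark class, and the seven-argument constructor used here matches the Spark 2.x/3.x releases I am aware of but may differ in yours. The class names, the "fileFormat" option, and the default of "json" are hypothetical choices made for this example, and the renaming logic itself is left as a comment.

```scala
package path.to

import org.apache.spark.sql.{SQLContext, SparkSession}
import org.apache.spark.sql.execution.streaming.{FileStreamSource, Offset, Source}
import org.apache.spark.sql.sources.StreamSourceProvider
import org.apache.spark.sql.types.StructType

// Hypothetical custom source: a FileStreamSource with an extra step in commit.
// Assumes the internal constructor
// (sparkSession, path, fileFormatClassName, schema, partitionColumns, metadataPath, options).
class MyCustomFileStreamSource(
    spark: SparkSession,
    inputPath: String,
    fileFormat: String,
    dataSchema: StructType,
    metadataDir: String,
    sourceOptions: Map[String, String])
  extends FileStreamSource(
    spark, inputPath, fileFormat, dataSchema, Seq.empty, metadataDir, sourceOptions) {

  override def commit(end: Offset): Unit = {
    super.commit(end)
    // Placeholder: rename the processed S3 objects here (for example through the
    // Hadoop FileSystem API). How to determine which files belong to the
    // committed batch is left to the implementation.
  }
}

// Hypothetical provider: this is the class whose name is passed to format(...).
class MyCustomFileStreamSourceProvider extends StreamSourceProvider {

  // This sketch does no schema inference, so a user-specified schema is required.
  private def requireSchema(schema: Option[StructType]): StructType =
    schema.getOrElse(
      throw new IllegalArgumentException("A user-specified schema is required"))

  override def sourceSchema(
      sqlContext: SQLContext,
      schema: Option[StructType],
      providerName: String,
      parameters: Map[String, String]): (String, StructType) =
    (providerName, requireSchema(schema))

  override def createSource(
      sqlContext: SQLContext,
      metadataPath: String,
      schema: Option[StructType],
      providerName: String,
      parameters: Map[String, String]): Source =
    new MyCustomFileStreamSource(
      sqlContext.sparkSession,
      parameters("path"),                         // set by load(path) or option("path", ...)
      parameters.getOrElse("fileFormat", "json"), // hypothetical option naming the underlying format
      requireSchema(schema),
      metadataPath,
      parameters)
}
```

With classes like these on the application classpath, the reader would be configured with format("path.to.MyCustomFileStreamSourceProvider"), a schema supplied through schema(...), and the input location passed to load(...). Because the provider is loaded by name at runtime, it can ship in the application jar submitted to the EMR cluster.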
