Is there a way to modify this code to let Spark Streaming read from JSON?
I am developing a Spark Streaming application that continuously reads data from localhost port 9098. Is there a way to replace the localhost socket with a <users/folder/path> so the stream reads automatically from a folder path or from JSON files?
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.log4j.Logger
import org.apache.log4j.Level

object StreamingApplication extends App {

  // suppress Spark's INFO/WARN logging ("org" must be lowercase to match the package)
  Logger.getLogger("org").setLevel(Level.ERROR)

  // creating the Spark streaming context with a 5-second batch interval
  val sc = new SparkContext("local[*]", "wordCount")
  val ssc = new StreamingContext(sc, Seconds(5))

  // lines is a DStream
  val lines = ssc.socketTextStream("localhost", 9098)

  // words is a transformed DStream
  val words = lines.flatMap(x => x.split(" "))

  // bunch of transformations
  val pairs = words.map(x => (x, 1))
  val wordsCount = pairs.reduceByKey((x, y) => x + y)

  // print is an action
  wordsCount.print()

  // start the streaming context
  ssc.start()
  ssc.awaitTermination()
}
Basically, I need help changing this line:
val lines = ssc.socketTextStream("localhost", 9098)
to something like this:
val lines = ssc.socketTextStream("<folder path>")
FYI, I am using IntelliJ IDEA to build this.
I suggest reading the Spark documentation, especially the scaladoc.
There appears to be a method `fileStream` for exactly this purpose:
https://spark.apache.org/docs/2.4.0/api/java/org/apache/spark/streaming/StreamingContext.html
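Building on that, a minimal sketch of the original word count rewired to read from a directory, using `textFileStream` (a convenience wrapper around `fileStream` for plain text files, including newline-delimited JSON). The path `/users/folder/path` is a placeholder for your own directory; note that `textFileStream` only picks up files created in (or atomically moved into) the directory after the stream starts.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

object FileStreamingApplication extends App {

  val sc  = new SparkContext("local[*]", "wordCount")
  val ssc = new StreamingContext(sc, Seconds(5))

  // monitor a directory instead of a socket; each new file's lines
  // become records in the DStream
  val lines = ssc.textFileStream("/users/folder/path")

  // same word-count pipeline as before
  val wordsCount = lines
    .flatMap(_.split(" "))
    .map((_, 1))
    .reduceByKey(_ + _)

  wordsCount.print()

  ssc.start()
  ssc.awaitTermination()
}
```

If the files are JSON, each line arrives as a raw string, so you would still need to parse it yourself (or consider Structured Streaming's `spark.readStream.json(...)`, which handles JSON schemas natively).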