
Is there a way to modify this code to let Spark Streaming read from JSON?

I am working on a Spark Streaming application that continuously reads data from localhost port 9098. Is there a way to change localhost to <users/folder/path> so that the data is read automatically from a folder path or from JSON?

import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.log4j.Logger
import org.apache.log4j.Level

object StreamingApplication extends App {

  Logger.getLogger("Org").setLevel(Level.ERROR)

  //creating spark streaming context
  val sc = new SparkContext("local[*]", "wordCount")
  val ssc = new StreamingContext(sc, Seconds(5))

  // lines is a Dstream
  val lines = ssc.socketTextStream("localhost", 9098)

  // words is a transformed Dstream
  val words = lines.flatMap(x => x.split(" "))

  // bunch of transformations
  val pairs = words.map(x=> (x,1))
  val wordsCount = pairs.reduceByKey((x,y) => x+y)

  // print is an action
  wordsCount.print()

  // start the streaming context
  ssc.start()

  ssc.awaitTermination()
}

Basically, I need help changing this line:

val lines = ssc.socketTextStream("localhost", 9098)

to something like this:

val lines = ssc.socketTextStream("<folder path>")

FYI, I am building this in IntelliJ IDEA.

I suggest reading the Spark documentation, especially the Scaladoc.

There appears to be a fileStream method:

https://spark.apache.org/docs/2.4.0/api/java/org/apache/spark/streaming/StreamingContext.html
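For reading from a folder rather than a socket, a minimal sketch could look like the one below. It assumes plain text files are dropped into the monitored directory (the path is only a placeholder); textFileStream picks up files newly created in that directory, and fileStream is the more general variant for other input formats.

import org.apache.log4j.{Level, Logger}
import org.apache.spark.SparkContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

object FileStreamingApplication extends App {

  Logger.getLogger("org").setLevel(Level.ERROR)

  val sc = new SparkContext("local[*]", "wordCount")
  val ssc = new StreamingContext(sc, Seconds(5))

  // monitor a directory instead of a socket; textFileStream returns a
  // DStream[String] of the lines of any new text file that appears there
  // (the path below is a placeholder -- replace it with your own folder)
  val lines = ssc.textFileStream("/users/folder/path")

  val words = lines.flatMap(_.split(" "))
  val pairs = words.map(word => (word, 1))
  val wordsCount = pairs.reduceByKey(_ + _)

  wordsCount.print()

  ssc.start()
  ssc.awaitTermination()
}

If the files are JSON rather than plain text, the more common route is Spark Structured Streaming, where spark.readStream.schema(...).json(<folder path>) reads new JSON files from a directory as a streaming DataFrame. A short sketch, again with a placeholder path and an example schema (streaming file sources require an explicit schema):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object JsonStreamingApplication extends App {

  val spark = SparkSession.builder().master("local[*]").appName("jsonStream").getOrCreate()

  // example schema for illustration only -- adjust to match your JSON files
  val schema = StructType(Seq(
    StructField("word", StringType),
    StructField("count", IntegerType)
  ))

  val jsonLines = spark.readStream.schema(schema).json("/users/folder/path")

  // print each micro-batch to the console
  jsonLines.writeStream.format("console").start().awaitTermination()
}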
