
Save Streaming dataframe in MongoDB using Spark Scala

I'm using Kafka and Spark; my output (df1) is a streaming DataFrame and I would like to save it to MongoDB. Any suggestions? Many thanks!

  val df = lines.selectExpr("CAST(value AS STRING)").as[String]
    .select(from_json($"value", DFschema).as("data"))
    .select("data.*")
    .writeStream
    .format("console")
    .option("truncate", "false")
    .start()
    .awaitTermination()

  df1 = df.filter($"COLUMN".isin(listA: _*))

  // save df1 into MongoDB
  //MongoSpark.save()...

Here are some methods to interact with MongoDB from Spark:

  import org.apache.spark.sql.{DataFrame, SparkSession}

  val mongodb_input_uri = "mongodb://" + interface + ":" + port + "/" + database + "." + collection
  val mongodb_output_uri = "mongodb://" + interface + ":" + port + "/" + database + "." + collection

  val sparkSession = SparkSession.builder
    .master("local")
    .appName("MongoSparkConnectorIntro")
    .config("spark.mongodb.input.uri", mongodb_input_uri)
    .config("spark.mongodb.output.uri", mongodb_output_uri)
    .getOrCreate()

  // Batch write: appends the DataFrame to the collection given by spark.mongodb.output.uri
  def writeData(sparkSession: SparkSession, dataframe: DataFrame): Unit = {
    dataframe.write.format("com.mongodb.spark.sql.DefaultSource").mode("append").save()
  }

  // Batch read: loads the collection given by spark.mongodb.input.uri as a DataFrame
  def readData(sparkSession: SparkSession): DataFrame = {
    sparkSession.read.format("com.mongodb.spark.sql.DefaultSource").load()
  }

Source: github_source_code
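
Note that writeData above is a batch write; a streaming DataFrame cannot be saved with .write directly. One way to bridge the two (Spark 2.4+) is Structured Streaming's foreachBatch sink, which hands each micro-batch to a batch writer. Below is a minimal sketch, not a definitive implementation: it assumes the 2.x MongoDB Spark connector (com.mongodb.spark.sql.DefaultSource), spark.mongodb.output.uri configured as above, and that df1 is the filtered streaming DataFrame from the question (built before any .writeStream call):

  import org.apache.spark.sql.DataFrame

  // Write each micro-batch of the streaming DataFrame to MongoDB,
  // reusing the same batch path as writeData above.
  val query = df1.writeStream
    .foreachBatch { (batchDF: DataFrame, batchId: Long) =>
      batchDF.write
        .format("com.mongodb.spark.sql.DefaultSource")
        .mode("append") // append each micro-batch to the output collection
        .save()
    }
    .start()

  query.awaitTermination()

If you are on the newer MongoDB Spark connector (10.x), it also supports writing a streaming DataFrame directly with .writeStream.format("mongodb"), but the foreachBatch approach works with the 2.x connector shown here.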
