
Filtering JSON with Apache Spark (NO SPARK SQL) - Scala

I am trying to apply a filter to a stream of JSON data coming from a Kafka direct stream. I am using net.liftweb lift-json_2.11 to parse the sample JSON {"type": "fast", "k":%d}. Here is my code:

val stream = KafkaUtils.createDirectStream[String, String](ssc, PreferConsistent, Subscribe[String, String](topics, kafkaParams))

val s1 = stream.map(record => parse(record.value))

The result of s1.print() is:

...
JObject(List(JField(type,JString(fast)), JField(k,JInt(11428))))
JObject(List(JField(type,JString(fast)), JField(k,JInt(11429))))
JObject(List(JField(type,JString(fast)), JField(k,JInt(11430))))
...

How can I apply a Spark filter on the k field? For example: k % 2 == 0

I don't want to use Spark SQL, because I also need to apply joins to the data streams, and Spark SQL does not let me do that. Thanks.
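Before bringing in a case class, the predicate can also be expressed directly at the JValue level. A minimal, hypothetical sketch (using json4s, as the solution below does; `FilterSketch` and `keepEvenK` are illustrative names, not from the original post):

```scala
import org.json4s._
import org.json4s.native.JsonMethods._

object FilterSketch {
  // Keep a record only when its "k" field is an even integer.
  // In the streaming job, this predicate would go into s1.filter(...).
  def keepEvenK(js: String): Boolean = (parse(js) \ "k") match {
    case JInt(k) => k % 2 == 0
    case _       => false // no "k" field, or not an integer
  }

  def main(args: Array[String]): Unit = {
    println(keepEvenK("""{"type": "fast", "k": 11428}""")) // true
    println(keepEvenK("""{"type": "fast", "k": 11429}""")) // false
  }
}
```

The `\` operator extracts the named field as a JValue, so the match doubles as a type check: malformed records are simply dropped instead of throwing.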

Solution:

//spark import
import org.apache.spark.SparkConf
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.Seconds
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming._
import org.apache.spark.streaming.dstream.DStream

//kafka import
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010._
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

//json library import
import org.json4s._
import org.json4s.native.JsonMethods._
import org.json4s.native.Serialization
import org.json4s.native.Serialization.{read, write}

object App {

  // The case class fields must match the JSON keys ("type" and "k").
  // `type` is a Scala keyword, so it has to be written in backticks.
  // It is defined at top level rather than inside main, because json4s
  // cannot reliably extract case classes declared inside a method.
  case class MySens(`type`: String, k: Int)

  def main(args: Array[String]): Unit = {

    // Create the context with a 5 second batch interval
    val sparkConf = new SparkConf().setAppName("SparkScript").setMaster("local[4]")
    val ssc = new StreamingContext(sparkConf, Seconds(5))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer].getCanonicalName,
      "value.deserializer" -> classOf[StringDeserializer].getCanonicalName,
      "group.id" -> "test_luca",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val topics1 = Array("fast-messages")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](topics1, kafkaParams))

    // Parse each Kafka record's value into a MySens instance
    val s1 = stream.map { record =>
      implicit val formats: Formats = DefaultFormats
      parse(record.value).extract[MySens]
    }

    // Keep records whose k is a multiple of 10
    // (use e.k % 2 == 0 for the even-k example from the question)
    val p1 = s1.filter(e => e.k % 10 == 0)

    p1.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
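On the joins the question mentions: DStream.join is defined on pair streams, so each stream first has to be keyed, e.g. p1.map(e => (e.k, e)).join(other.map(e => (e.k, e))). A hypothetical, non-Spark sketch of the per-batch (key, value) inner-join semantics this produces (`JoinSketch` and `innerJoin` are illustrative names, not Spark API):

```scala
object JoinSketch {
  // Inner join of two keyed sequences, mirroring what
  // DStream[(K, A)].join(DStream[(K, B)]) yields within one batch.
  def innerJoin[K, A, B](left: Seq[(K, A)], right: Seq[(K, B)]): Seq[(K, (A, B))] =
    for {
      (k, a)  <- left
      (k2, b) <- right
      if k == k2
    } yield (k, (a, b))

  def main(args: Array[String]): Unit = {
    val fast = Seq(10 -> "fast", 20 -> "fast")
    val slow = Seq(20 -> "slow", 30 -> "slow")
    println(innerJoin(fast, slow)) // only the records sharing key 20 pair up
  }
}
```

Note that DStream.join matches keys only within the same batch; joining across time windows requires window operations on the keyed streams first.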
