Filtering JSON with apache spark (NO SPARK SQL) - Scala
I am trying to apply a filter to a stream of JSON data coming from a Kafka direct stream. I am using net.liftweb lift-json_2.11 to parse a sample JSON of the form {"type": "fast", "k":%d}. Here is my code:
val stream = KafkaUtils.createDirectStream[String, String](ssc, PreferConsistent, Subscribe[String, String](topics, kafkaParams))
val s1 = stream.map(record => parse(record.value))
s1.print()
The result is:
...
JObject(List(JField(type,JString(fast)), JField(k,JInt(11428))))
JObject(List(JField(type,JString(fast)), JField(k,JInt(11429))))
JObject(List(JField(type,JString(fast)), JField(k,JInt(11430))))
...
How can I apply a Spark filter on the k field? For example: k % 2 == 0
I do not want to use Spark SQL, because I also need to apply joins to the data streams, and Spark SQL does not let me do that. Thanks.
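(For reference, a filter can also be applied straight to the DStream[JValue] built above, without extracting into a case class, by pattern matching on the k field. This is only a minimal sketch against the lift-json types visible in the printed output; the evenK name is made up for illustration.)

import net.liftweb.json._

// s1 is the DStream[JValue] produced by stream.map(record => parse(record.value))
val evenK = s1.filter { json =>
  (json \ "k") match {
    case JInt(k) => k % 2 == 0   // keep only records whose k is even
    case _       => false        // drop records without a numeric k field
  }
}
evenK.print()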
Solution:
//spark import
import org.apache.spark.SparkConf
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.Seconds
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming._
import org.apache.spark.streaming.dstream.DStream
//kafka import
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010._
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
//json library import
import org.json4s._
import org.json4s.native.JsonMethods._
import org.json4s.native.Serialization
import org.json4s.native.Serialization.{read, write}
object App {
  def main(args: Array[String]) {
    // Create the streaming context with a 5 second batch interval
    val sparkConf = new SparkConf().setAppName("SparkScript").setMaster("local[4]")
    val ssc = new StreamingContext(sparkConf, Seconds(5))

    // Case class the JSON messages are deserialized into;
    // its field names must match the keys of the incoming JSON
    case class MySens(elem: String, k: Int, pl: String)

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer].getCanonicalName,
      "value.deserializer" -> classOf[StringDeserializer].getCanonicalName,
      "group.id" -> "test_luca",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val topics1 = Array("fast-messages")
    val stream = KafkaUtils.createDirectStream[String, String](ssc, PreferConsistent, Subscribe[String, String](topics1, kafkaParams))

    // Parse each Kafka record's value into a MySens instance with json4s
    val s1 = stream.map { record =>
      implicit val formats = DefaultFormats
      parse(record.value).extract[MySens]
    }

    // Filter on the k field of the extracted case class
    val p1 = s1.filter(e => e.k % 10 == 0)
    p1.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
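The question also mentions needing joins on the data streams. A minimal sketch of that, reusing ssc, kafkaParams, MySens and the filtered stream p1 from the code above (placed inside main before ssc.start()); the second topic name "slow-messages" and the choice of k as the join key are assumptions made purely for illustration:

// Hypothetical second topic, consumed the same way as the first one
val topics2 = Array("slow-messages")
val stream2 = KafkaUtils.createDirectStream[String, String](ssc, PreferConsistent, Subscribe[String, String](topics2, kafkaParams))
val s2 = stream2.map { record =>
  implicit val formats = DefaultFormats
  parse(record.value).extract[MySens]
}

// DStream joins operate on (key, value) pairs, so key both streams by k first
val keyed1 = p1.map(e => (e.k, e))
val keyed2 = s2.map(e => (e.k, e))

// joined is a DStream[(Int, (MySens, MySens))], matched within each batch
val joined = keyed1.join(keyed2)
joined.print()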