Spark-shell error: object map is not a member of package org.apache.spark.streaming.rdd
I am trying to read JSON from a Kafka topic KafkaStreamTestTopic1 using Spark Streaming, parse out two values, valueStr1 and valueStr2, and convert the result to a DataFrame for further processing.
I am running the code in a spark-shell, so the Spark context sc is already available.
But when I run this script, it gives me the following error:
error: object map is not a member of package org.apache.spark.streaming.rdd
       val dfa = rdd.map(record => {
Below is the script used:
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.spark.{SparkConf, TaskContext}
import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.kafka010._
import org.apache.kafka.common.serialization.StringDeserializer
import play.api.libs.json._
import org.apache.spark.sql._

val ssc = new StreamingContext(sc, Seconds(5))
val sparkSession = SparkSession.builder().appName("myApp").getOrCreate()
val sqlContext = new SQLContext(sc)

// Create direct kafka stream with brokers and topics
val topicsSet = Array("KafkaStreamTestTopic1").toSet

// Set kafka parameters
val kafkaParams = Map[String, String](
  "bootstrap.servers" -> "localhost:9092",
  "key.deserializer" -> "org.apache.kafka.common.serialization.StringDeserializer",
  "value.deserializer" -> "org.apache.kafka.common.serialization.StringDeserializer",
  "group.id" -> "my_group",
  "auto.offset.reset" -> "earliest",
  "enable.auto.commit" -> "false"
)

val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, String](topicsSet, kafkaParams)
)

val lines = stream.map(_.value)
lines.print()

case class MyObj(val one: JsValue)

lines.foreachRDD(rdd => {
  println("Debug Entered")
  import sparkSession.implicits._
  import sqlContext.implicits._
  val dfa = rdd.map(record => {
    implicit val myObjEncoder = org.apache.spark.sql.Encoders.kryo[MyObj]
    val json: JsValue = Json.parse(record)
    val value1 = (json \ "root" \ "child1" \ "child2" \ "valueStr1").getOrElse(null)
    val value2 = (json \ "root" \ "child1" \ "child2" \ "valueStr2").getOrElse(null)
    (new MyObj(value1), new MyObj(value2))
  }).toDF()
  dfa.show()
  println("Dfa Size is: " + dfa.count())
})

ssc.start()
I suppose the problem is that rdd is also a package (org.apache.spark.streaming.rdd) that you imported automatically with the line:

import org.apache.spark.streaming._

To avoid this kind of clash, rename your variable to something else, for example myRdd:

lines.foreachRDD(myRdd => { /* ... */ })
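A minimal sketch of the renamed block, assuming the rest of the script from the question stays as it is. Only the lambda parameter name changes, so rdd.map no longer resolves against the org.apache.spark.streaming.rdd package; whether the DataFrame conversion itself succeeds still depends on the encoders, which are kept as in the question:

lines.foreachRDD(myRdd => {
  println("Debug Entered")
  // importing the implicits from one session is enough for toDF()
  import sparkSession.implicits._
  val dfa = myRdd.map(record => {
    implicit val myObjEncoder = org.apache.spark.sql.Encoders.kryo[MyObj]
    val json: JsValue = Json.parse(record)
    val value1 = (json \ "root" \ "child1" \ "child2" \ "valueStr1").getOrElse(null)
    val value2 = (json \ "root" \ "child1" \ "child2" \ "valueStr2").getOrElse(null)
    // same parsing body as in the question, only the parameter is renamed
    (MyObj(value1), MyObj(value2))
  }).toDF()
  dfa.show()
  println("Dfa Size is: " + dfa.count())
})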
Add the Spark Streaming dependencies to your build manager:

"org.apache.spark" %% "spark-mllib" % SparkVersion,
"org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.0.1"

You can use Maven or SBT to add them during the build.
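For example, a rough build.sbt sketch; the version numbers and the sparkVersion value below are placeholders, so adjust them to the Spark version running on your cluster:

// build.sbt (sketch; versions are assumed, match your environment)
val sparkVersion = "2.2.0"

libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-core"                 % sparkVersion % "provided",
  "org.apache.spark"  %% "spark-sql"                  % sparkVersion % "provided",
  "org.apache.spark"  %% "spark-streaming"            % sparkVersion % "provided",
  "org.apache.spark"  %% "spark-streaming-kafka-0-10" % sparkVersion,
  "com.typesafe.play" %% "play-json"                  % "2.6.7" // assumed version, for Json.parse used above
)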