
Spark-shell Error: object map is not a member of package org.apache.spark.streaming.rdd

I am trying to use Spark Streaming to read JSON from a Kafka topic KafkaStreamTestTopic1, parse out two values, valueStr1 and valueStr2, and convert them to a DataFrame for further processing.

I am running the code in spark-shell, so the Spark context sc is already available.

But when I run this script, it gives me the following error:

error: object map is not a member of package org.apache.spark.streaming.rdd
       val dfa = rdd.map(record => {

Below is the script used:

import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.spark.{SparkConf, TaskContext}
import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.kafka010._
import org.apache.kafka.common.serialization.StringDeserializer
import play.api.libs.json._
import org.apache.spark.sql._

val ssc = new StreamingContext(sc, Seconds(5))

val sparkSession = SparkSession.builder().appName("myApp").getOrCreate()
val sqlContext = new SQLContext(sc)

// Create direct kafka stream with brokers and topics
val topicsSet = Array("KafkaStreamTestTopic1").toSet

// Set kafka Parameters
val kafkaParams = Map[String, String](
  "bootstrap.servers" -> "localhost:9092",
  "key.deserializer" -> "org.apache.kafka.common.serialization.StringDeserializer",
  "value.deserializer" -> "org.apache.kafka.common.serialization.StringDeserializer",
  "group.id" -> "my_group",
  "auto.offset.reset" -> "earliest",
  "enable.auto.commit" -> "false"
)

val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, String](topicsSet, kafkaParams)
)

val lines = stream.map(_.value)

lines.print()

case class MyObj(val one: JsValue)

lines.foreachRDD(rdd => {
  println("Debug Entered")

  import sparkSession.implicits._
  import sqlContext.implicits._


  val dfa = rdd.map(record => {

    implicit val myObjEncoder = org.apache.spark.sql.Encoders.kryo[MyObj]

    val json: JsValue = Json.parse(record)
    val value1 = (json \ "root" \ "child1" \ "child2" \ "valueStr1").getOrElse(null)
    val value2 = (json \ "root" \ "child1" \ "child2" \ "valueStr2").getOrElse(null)

    (new MyObj(value1), new MyObj(value2))

  }).toDF()

  dfa.show()
  println("Dfa Size is: " + dfa.count())


})

ssc.start()

I suppose the problem is that rdd is also a package (org.apache.spark.streaming.rdd) that you imported automatically with this line:

import org.apache.spark.streaming._

To avoid that kind of clash, rename your variable to something else, for example myRdd:

lines.foreachRDD(myRdd => { /* ... */ })
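
For illustration, here is a hedged sketch of how the foreachRDD body could look after the rename. It reuses the question's JSON paths and sparkSession, but as a simplifying assumption it extracts the two values as plain strings, so the default tuple encoder from sparkSession.implicits._ suffices and the kryo encoder for MyObj is not needed:

lines.foreachRDD(myRdd => {
  import sparkSession.implicits._

  // myRdd no longer collides with the org.apache.spark.streaming.rdd package
  val dfa = myRdd.map(record => {
    val json: JsValue = Json.parse(record)
    // asOpt[String].orNull gives a null cell when a path is missing (assumes the values are JSON strings)
    val value1 = (json \ "root" \ "child1" \ "child2" \ "valueStr1").asOpt[String].orNull
    val value2 = (json \ "root" \ "child1" \ "child2" \ "valueStr2").asOpt[String].orNull
    (value1, value2)
  }).toDF("valueStr1", "valueStr2")

  dfa.show()
  println("Dfa Size is: " + dfa.count())
})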

Add the spark-streaming dependency to your build manager:

     "org.apache.spark" %% "spark-mllib" % SparkVersion,
    "org.apache.spark" %% "spark-streaming-kafka-0-10" % 
     "2.0.1"

You can add these with Maven or SBT during the build.
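
For example, a minimal build.sbt sketch (the artifact names are the standard Spark and play-json coordinates; the version numbers here are assumptions and should be aligned with your installed Spark):

// build.sbt sketch; versions are assumptions, match them to your cluster
val sparkVersion = "2.2.0"

libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-core"                  % sparkVersion % "provided",
  "org.apache.spark"  %% "spark-sql"                   % sparkVersion % "provided",
  "org.apache.spark"  %% "spark-streaming"             % sparkVersion % "provided",
  "org.apache.spark"  %% "spark-streaming-kafka-0-10"  % sparkVersion,
  "com.typesafe.play" %% "play-json"                   % "2.6.9"
)

When working interactively in spark-shell, the Kafka integration can instead be pulled in on the command line, for example spark-shell --packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.2.0 (the Scala suffix and version are again assumptions to adjust to your environment).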
