Scala Spark Map DataFrame Missing Parameter Type

I am new to Spark and I am receiving an error when I map a DataFrame.

I have a DStream that I want to transform using a SQL DataFrame to filter the data. The code is like this:

  val textDStream = ssc.textFileStream(inputPath)
  val activityStream = textDStream.transform(input => {
    input.flatMap { line =>
      // flatMap over an Option: each Some yields one Activity per input line
      val record = line.split("\\t")
      Some(Activity(record(0).toLong / MS_IN_HOUR * MS_IN_HOUR, record(1), record(2), record(3), record(4), record(5), record(6)))
    }
  })
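
For reference, the Activity case class and the MS_IN_HOUR constant are not shown in the question. A minimal sketch consistent with the call above might look like this (only the timestamp_hour, action, and product fields are confirmed by the SQL further down; the other field names are guesses):

  // Hypothetical reconstruction -- these definitions are not in the original post.
  // One Long field (the hour-truncated timestamp) followed by six String fields,
  // matching record(0).toLong and record(1)..record(6) above.
  case class Activity(timestamp_hour: Long,
                      referrer: String,
                      action: String,
                      prevPage: String,
                      page: String,
                      visitor: String,
                      product: String)

  val MS_IN_HOUR = 1000L * 60 * 60 // milliseconds per hour, used to truncate timestamps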

  activityStream.transform(rdd => {

    val df = rdd.toDF() // requires import sqlContext.implicits._ to be in scope

    df.registerTempTable("activity")
    val activityByProduct = sqlContext.sql("""SELECT
                                        product,
                                        timestamp_hour,
                                        sum(case when action = 'purchase' then 1 else 0 end) as purchase_count,
                                        sum(case when action = 'add_to_cart' then 1 else 0 end) as add_to_cart_count,
                                        sum(case when action = 'page_view' then 1 else 0 end) as page_view_count
                                        from activity
                                        group by product, timestamp_hour""")

    activityByProduct
      .map { r => ((r.getString(0), r.getLong(1)),
        ActivityByProduct(r.getString(0), r.getLong(1), r.getLong(2), r.getLong(3), r.getLong(4))
      )}

  }).print()
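
Similarly, ActivityByProduct is not shown, but the accessor calls in the map (one getString followed by four getLongs) pin down its shape:

  // Hypothetical reconstruction -- matches the columns of the aggregation query above.
  case class ActivityByProduct(product: String,
                               timestamp_hour: Long,
                               purchase_count: Long,
                               add_to_cart_count: Long,
                               page_view_count: Long)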

The problem here is that I receive the following error:

Error:(58, 18) missing parameter type .map { r => ((r.getString(0), r.getLong(1)),

activityByProduct
  .map { r => ((r.getString(0), r.getLong(1)),
    ActivityByProduct(r.getString(0), r.getLong(1), r.getLong(2), r.getLong(3), r.getLong(4))
    )}

I cannot see where the type is missing. I have already tried to explicitly set the type of r, but it continues to return the error.
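
For reference, that attempt presumably looked something like the sketch below (hypothetical, since the question does not show it); per the above, the error persisted:

  import org.apache.spark.sql.Row

  activityByProduct
    .map { r: Row => // explicit parameter type, as described above
      ((r.getString(0), r.getLong(1)),
       ActivityByProduct(r.getString(0), r.getLong(1), r.getLong(2), r.getLong(3), r.getLong(4)))
    }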

What could it be?

Thanks in advance

It worked.

I had to convert the DataFrame to an RDD before executing the map:

  activityByProduct.rdd
    .map { r =>
      ((r.getString(0), r.getLong(1)),
       ActivityByProduct(r.getString(0), r.getLong(1), r.getLong(2), r.getLong(3), r.getLong(4)))
    }

Note the .rdd after activityByProduct.

Yes, this works. You need to convert the DataFrame to an RDD for the map to work. The original code compiles on earlier Spark versions, but with 2.12 and above you will need this: in Spark 2.x a DataFrame is a Dataset[Row], and Dataset.map requires an implicit Encoder for its result type, while RDD.map has no such requirement.
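
For completeness, if you prefer to stay in the Dataset API rather than drop to the RDD, a sketch like the following should compile on Spark 2.x, assuming a SparkSession named spark is in scope (its implicits import supplies Encoders for tuples and case classes):

  import org.apache.spark.sql.Row
  import spark.implicits._ // provides Encoders for product types (tuples, case classes)

  val keyed = activityByProduct.map { r: Row =>
    ((r.getString(0), r.getLong(1)),
     ActivityByProduct(r.getString(0), r.getLong(1), r.getLong(2), r.getLong(3), r.getLong(4)))
  }

Inside DStream.transform you would still have to call .rdd on the result, since transform must return an RDD.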

Hello, I also encountered the same problem. When I added import hiveCtx.implicits._ on the line after val hiveCtx = new HiveContext(sc), the error went away, because this import lets the RDD be converted implicitly to a DataFrame. Hope this helps.

The complete code is posted below in the hope that it helps.

package spark.sparkSQL

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

object sparksql2 {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("sparksql").setMaster("local")
    val sc = new SparkContext(conf)
    sc.setLogLevel("ERROR")

    val hiveCtx = new HiveContext(sc)
    import hiveCtx.implicits._ // enables implicit RDD-to-DataFrame conversions

    val input = hiveCtx.jsonFile("./inputFile")
    // Register the input schema RDD as a temporary table
    input.registerTempTable("tweets")
    hiveCtx.cacheTable("tweets")
    // Select tweets based on the retweetCount
    val topTweets = hiveCtx.sql("SELECT text, retweetCount FROM tweets ORDER BY retweetCount LIMIT 10")
    topTweets.collect().foreach(println)
    val topTweetText = topTweets.map(row => row.getString(0)) // compiles thanks to the implicits import
  }
}
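
In case it is useful, a hypothetical Spark 2.x equivalent of the snippet above would replace HiveContext and jsonFile (both deprecated there) with a SparkSession; the file path and app name are placeholders carried over from the original:

  import org.apache.spark.sql.SparkSession

  object sparksql2 {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder()
        .appName("sparksql")
        .master("local")
        .enableHiveSupport()
        .getOrCreate()
      import spark.implicits._ // same role as hiveCtx.implicits._ above

      val input = spark.read.json("./inputFile")
      input.createOrReplaceTempView("tweets")
      spark.catalog.cacheTable("tweets")
      val topTweets = spark.sql("SELECT text, retweetCount FROM tweets ORDER BY retweetCount LIMIT 10")
      topTweets.collect().foreach(println)
      val topTweetText = topTweets.map(row => row.getString(0)) // needs the implicits import for the String Encoder
    }
  }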
