Unable to find encoder for type AccessLog. An implicit Encoder[AccessLog] is needed to store AccessLog instances in a Dataset

Hello, I am working on a Scala/Spark project and trying to do some computation. My Scala code works well in spark-shell, but when I package the same code into a .jar file with sbt-assembly and run it, I get this error:

Unable to find encoder for type AccessLog. An implicit Encoder[AccessLog] is needed to store AccessLog instances in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.
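For context, this is what encoder derivation normally looks like when the case class is a top-level Product type; a minimal sketch with an illustrative EncoderDemo object and Person class (these names and the local SparkSession are not from the question):

import org.apache.spark.sql.{ Dataset, SparkSession }

object EncoderDemo {

  // A top-level Product type: spark.implicits._ can derive Encoder[Person] for it.
  case class Person(name: String, age: Int)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("encoder-demo").getOrCreate()
    import spark.implicits._

    // toDS() compiles because an implicit Encoder[Person] is found.
    val people: Dataset[Person] = Seq(Person("alice", 30), Person("bob", 25)).toDS()
    people.show()
    spark.stop()
  }
}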

I am trying to convert a Dataset[List[String]] into a Dataset[AccessLog], where AccessLog is a case class, by mapping over it.

Error screenshot

The code that generates the error:

import org.apache.spark.sql.{ Dataset, Encoder, SparkSession }
import org.apache.spark.sql.functions._

object DstiJob {

  // try and catch
  def run(spark: SparkSession, inputPath: String, outputPath: String): String = {
    // import spark.sqlContext.implicits._
    import spark.implicits._
    import org.apache.spark.sql.{ Encoder, Encoders }
    // implicit val enc: Encoder[AccessLog] = Encoders.product[AccessLog]

    // hard-coded test values shadow the method parameters
    val inputPath = "access.log.gz"
    val outputPath = "data/reports"
    val logsAsString = spark.read.text(inputPath).as[String]

    // note: AccessLog is declared inside run, so it is a local class
    case class AccessLog(ip: String, ident: String, user: String, datetime: String, request: String, status: String, size: String, referer: String, userAgent: String, unk: String)

    val R = """^(?<ip>[0-9.]+) (?<identd>[^ ]) (?<user>[^ ]) \[(?<datetime>[^\]]+)\] \"(?<request>[^\"]*)\" (?<status>[^ ]*) (?<size>[^ ]*) \"(?<referer>[^\"]*)\" \"(?<useragent>[^\"]*)\" \"(?<unk>[^\"]*)\""".r
    val dsParsed = logsAsString.flatMap(x => R.unapplySeq(x))
    // map the ten captured groups onto the ten case class fields in order
    def toAccessLog(params: List[String]) = AccessLog(params(0), params(1), params(2), params(3), params(4), params(5), params(6), params(7), params(8), params(9))

    val ds: Dataset[AccessLog] = dsParsed.map(toAccessLog _)
    val dsWithTime = ds.withColumn("datetime", to_timestamp(ds("datetime"), "dd/MMM/yyyy:HH:mm:ss X"))
    dsWithTime.cache
    dsWithTime.createOrReplaceTempView("AccessLog")

To solve the compilation error, the case class should be defined outside of the method run.

Instead of

object DstiJob {

    def run(spark: SparkSession, ...) {
       [...]
       case class AccessLog(...)
       val ds: Dataset[AccessLog] = ...
       [...]
    }
}

you can use

object DstiJob {

   case class AccessLog(...)

   def run(spark: SparkSession, ...) {
       [...]  
       val ds: Dataset[AccessLog] = ...
       [...]
   }
}

This should solve the issue. The likely reason it helps: spark.implicits._ derives encoders for case classes through a TypeTag for the class, and a case class declared inside a method is a local class for which the compiler cannot materialize that TypeTag, so no implicit Encoder[AccessLog] is ever found. Declaring the class on the enclosing object (or at the top level) makes the type stable and lets the derivation succeed.
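For completeness, a minimal runnable sketch of the fixed layout, reusing the question's regex and field mapping. Since the original snippet was truncated, run is reduced to the steps shown there (outputPath is accepted but left unused, and the return type is changed to Unit):

import org.apache.spark.sql.{ Dataset, SparkSession }
import org.apache.spark.sql.functions._

object DstiJob {

  // Declared at object level, so the compiler can provide the TypeTag that
  // spark.implicits._ needs to derive an implicit Encoder[AccessLog].
  case class AccessLog(ip: String, ident: String, user: String, datetime: String, request: String, status: String, size: String, referer: String, userAgent: String, unk: String)

  def run(spark: SparkSession, inputPath: String, outputPath: String): Unit = {
    import spark.implicits._

    val R = """^(?<ip>[0-9.]+) (?<identd>[^ ]) (?<user>[^ ]) \[(?<datetime>[^\]]+)\] \"(?<request>[^\"]*)\" (?<status>[^ ]*) (?<size>[^ ]*) \"(?<referer>[^\"]*)\" \"(?<useragent>[^\"]*)\" \"(?<unk>[^\"]*)\""".r

    val logsAsString = spark.read.text(inputPath).as[String]
    val dsParsed = logsAsString.flatMap(line => R.unapplySeq(line))
    def toAccessLog(params: List[String]) = AccessLog(params(0), params(1), params(2), params(3), params(4), params(5), params(6), params(7), params(8), params(9))

    // Compiles now: Encoder[AccessLog] is found because AccessLog is no longer a local class.
    val ds: Dataset[AccessLog] = dsParsed.map(toAccessLog _)
    val dsWithTime = ds.withColumn("datetime", to_timestamp(ds("datetime"), "dd/MMM/yyyy:HH:mm:ss X"))
    dsWithTime.cache
    dsWithTime.createOrReplaceTempView("AccessLog")
  }
}

Note that the alternative hinted at in the question's commented-out line, an explicit implicit val enc: Encoder[AccessLog] = Encoders.product[AccessLog], would not help while the class stays inside the method: Encoders.product also requires a TypeTag for the class.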
