Spark Error: Unable to find encoder for type stored in a Dataset

I am using Spark on a Zeppelin notebook, and groupByKey() does not seem to be working.

This code:

df.groupByKey(row => row.getLong(0))
  .mapGroups((key, iterable) => println(key))

Gives me this error (presumably a compile-time error, since it shows up almost immediately even though the dataset I am working on is pretty big):

error: Unable to find encoder for type stored in a Dataset.  Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._  Support for serializing other types will be added in future releases.

I tried to add a case class and map all of my rows into it, but I still got the same error:

import org.apache.spark.sql.Row
import spark.implicits._

case class DFRow(profileId: Long, jobId: String, state: String)

// Build the case class from an untyped Row using the schema's field names
def getDFRow(row: Row): DFRow =
  DFRow(row.getLong(row.fieldIndex("item0")),
        row.getString(row.fieldIndex("item1")),
        row.getString(row.fieldIndex("item2")))

df.map(getDFRow _)
  .groupByKey(_.profileId)
  .mapGroups((key, iterable) => println(key))

The schema of my DataFrame is:

root
|-- item0: long (nullable = true)
|-- item1: string (nullable = true)
|-- item2: string (nullable = true)

You're trying to call mapGroups with a function (Long, Iterator[Row]) => Unit, and there is no Encoder for Unit (not that it would make sense to have one).
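For illustration, a minimal sketch of a version that does compile (assuming a SparkSession named spark and the df from the question; the count logic is illustrative, not from the original answer). Returning a value with a predefined encoder, such as a (Long, Int) tuple, satisfies mapGroups; side effects like println belong in foreach instead:

import spark.implicits._

// Return an encodable value instead of Unit; Encoder[(Long, Int)]
// is provided by spark.implicits._
val counts = df.groupByKey(row => row.getLong(0))
  .mapGroups((key, rows) => (key, rows.size))

counts.show()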

In general, the parts of the Dataset API that are not focused on the SQL DSL (DataFrame => DataFrame, DataFrame => RelationalGroupedDataset, RelationalGroupedDataset => DataFrame, RelationalGroupedDataset => RelationalGroupedDataset) require either implicit or explicit encoders for the output values.
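To show the explicit variant (a hedged sketch, not from the original answer), an Encoder can be passed by hand through groupByKey's implicit parameter list instead of relying on the implicits import:

import org.apache.spark.sql.{Encoders, Row}

// Supply Encoder[Long] explicitly rather than importing spark.implicits._
val grouped = df.groupByKey((row: Row) => row.getLong(0))(Encoders.scalaLong)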

Since there are no predefined encoders for Row objects, using Dataset[Row] with methods designed for statically typed data doesn't make much sense. As a rule of thumb, you should always convert to the statically typed variant first:

df.as[(Long, String, String)]
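Putting it together, a sketch under the assumption that the columns should map onto the DFRow case class from the question (the columns are renamed so the field names line up with the case class, and the final aggregation is illustrative):

import spark.implicits._

// Rename columns to match DFRow's fields, then get a typed Dataset[DFRow]
val typed = df.toDF("profileId", "jobId", "state").as[DFRow]

typed.groupByKey(_.profileId)
  .mapGroups((profileId, rows) => (profileId, rows.size))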

See also Encoder error while trying to map dataframe row to updated row
