Encoder[Row] in Scala Spark
I'm trying to perform a simple map on a Dataset[Row] (DataFrame) in Spark 2.0.0. Something as simple as this:
val df: Dataset[Row] = ...
df.map { r: Row => r }
But the compiler is complaining that I'm not providing the implicit Encoder[Row] argument to the map function:

not enough arguments for method map: (implicit evidence$7: Encoder[Row]).
Everything works fine if I convert to an RDD first, ds.rdd.map { r: Row => r }, but shouldn't there be an easy way to get an Encoder[Row], like there is for tuple types (Encoders.product[(Int, Double)])?
[Note that my Row is dynamically sized in such a way that it can't easily be converted into a strongly-typed Dataset.]
An Encoder needs to know how to pack the elements inside the Row. So you could write your own Encoder[Row] by using the row's schema (a StructType), which describes the elements of your Row at runtime, and use the corresponding serializers.
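In practice, Spark's built-in RowEncoder does exactly this packing given a runtime StructType — a minimal sketch, where the example DataFrame and its column are assumptions:

```scala
import org.apache.spark.sql.{Encoder, Row, SparkSession}
import org.apache.spark.sql.catalyst.encoders.RowEncoder

val spark = SparkSession.builder().master("local[*]").appName("row-encoder").getOrCreate()
import spark.implicits._

// Hypothetical DataFrame whose schema is only known at runtime
val df = spark.range(3).toDF("id")

// RowEncoder reads the StructType and wires up the matching (de)serializers
implicit val rowEncoder: Encoder[Row] = RowEncoder(df.schema)

val mapped = df.map { r: Row => r } // Encoder[Row] now resolves implicitly
```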
Or, if you know more about the data that goes into Row, you could use https://github.com/adelbertc/frameless/
Sorry to be a "bit" late. Hopefully this helps someone who is hitting the problem right now.

The easiest way to define an encoder is deriving the structure from an existing DataFrame:
val df = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "name")
val myEncoder = RowEncoder(df.schema)
Such an approach can be useful when you need to alter existing fields of your original DataFrame.
If you're dealing with a completely new structure, use an explicit definition relying on StructType and StructField (as suggested in @Reactormonk's slightly cryptic response).
Example defining the same encoder:
val myEncoder2 = RowEncoder(StructType(Seq(
  StructField("id", IntegerType),
  StructField("name", StringType))))
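Passed explicitly (or marked implicit), that encoder makes the map from the question compile — a sketch that repeats the setup so it is self-contained, with the local SparkSession and the incremented `id` column as assumptions:

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
import org.apache.spark.sql.catalyst.encoders.RowEncoder

val spark = SparkSession.builder().master("local[*]").appName("explicit-encoder").getOrCreate()
import spark.implicits._

val df = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "name")

val myEncoder2 = RowEncoder(StructType(Seq(
  StructField("id", IntegerType),
  StructField("name", StringType))))

// Supply the encoder as map's second (implicit) argument list
val bumped = df.map { r: Row => Row(r.getInt(0) + 1, r.getString(1)) }(myEncoder2)
```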
Please remember that org.apache.spark.sql._, org.apache.spark.sql.types._ and org.apache.spark.sql.catalyst.encoders.RowEncoder have to be imported.
In the specific case where the map function does not change the schema, you can pass in the encoder of the DataFrame itself:
df.map(r => r)(df.encoder)
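Conversely, when the map does change the schema, an encoder for the new row shape has to be supplied — a sketch under the assumption of a small two-column DataFrame and a made-up `id_name` output column:

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}
import org.apache.spark.sql.catalyst.encoders.RowEncoder

val spark = SparkSession.builder().master("local[*]").appName("new-schema").getOrCreate()
import spark.implicits._

val df = Seq((1, "a"), (2, "b")).toDF("id", "name")

// The map collapses two columns into one, so df.encoder no longer fits
val outSchema = StructType(Seq(StructField("id_name", StringType)))

val combined = df.map { r: Row =>
  Row(s"${r.getInt(0)}-${r.getString(1)}")
}(RowEncoder(outSchema))
```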