I am trying to create a custom Aggregator function producing a Map as the result; however, it requires an Encoder. As referenced in https://spark.apac ...
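A typed `Aggregator` has to supply its own buffer and output encoders, and `Map` has no built-in bean encoder, so `Encoders.kryo` is a common stopgap (at the cost of an opaque binary column). A minimal sketch, assuming a hypothetical aggregator `CountByKey` that counts `String` keys into a `Map<String, Long>`:

```java
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.expressions.Aggregator;

import java.util.HashMap;
import java.util.Map;

// Hypothetical aggregator: counts occurrences of each key into a Map<String, Long>.
class CountByKey extends Aggregator<String, Map<String, Long>, Map<String, Long>> {

    @Override
    public Map<String, Long> zero() {
        return new HashMap<>();
    }

    @Override
    public Map<String, Long> reduce(Map<String, Long> buffer, String key) {
        buffer.merge(key, 1L, Long::sum);
        return buffer;
    }

    @Override
    public Map<String, Long> merge(Map<String, Long> b1, Map<String, Long> b2) {
        b2.forEach((k, v) -> b1.merge(k, v, Long::sum));
        return b1;
    }

    @Override
    public Map<String, Long> finish(Map<String, Long> reduction) {
        return reduction;
    }

    // Maps have no bean encoder; Kryo serializes the whole map into one binary column.
    @SuppressWarnings("unchecked")
    @Override
    public Encoder<Map<String, Long>> bufferEncoder() {
        return Encoders.kryo((Class<Map<String, Long>>) (Class<?>) HashMap.class);
    }

    @SuppressWarnings("unchecked")
    @Override
    public Encoder<Map<String, Long>> outputEncoder() {
        return Encoders.kryo((Class<Map<String, Long>>) (Class<?>) HashMap.class);
    }
}
```

The Kryo route trades readable columns for convenience; if the output must stay queryable, flattening the map into key/value rows before aggregating avoids the problem entirely.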
I've generally used immutable value types when writing java code. Sometimes it's been through libraries (Immutables, AutoValue, Lombok), but mostly ju ...
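The hand-written style the question describes follows a familiar pattern: final fields, constructor-only initialization, value-based equality, and "wither" copy methods instead of setters. A minimal sketch with a hypothetical `Point` type:

```java
// A minimal hand-written immutable value type: all fields final,
// set only in the constructor, with value-based equality.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    int x() { return x; }
    int y() { return y; }

    // "Wither" instead of a setter: returns a modified copy.
    Point withX(int newX) { return new Point(newX, y); }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() { return 31 * x + y; }
}
```

Note that this shape clashes with Spark's `Encoders.bean`, which expects a public no-arg constructor and getter/setter pairs; that mismatch is what usually forces a custom encoder for immutable types.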
This works fine: This is also fine: However, how do we achieve this so it returns either type A or B? Is it simply possible to have a union ...
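Returning "either A or B" from a single method usually means returning a closed common supertype and checking the concrete class at the call site. A sketch, with hypothetical types `Result.A` and `Result.B` standing in for the question's A and B:

```java
// Hypothetical sealed-style hierarchy: Result is the closed union of A and B.
// The private constructor keeps the set of subclasses limited to the nested ones.
abstract class Result {
    private Result() {}

    static final class A extends Result {
        final int value;
        A(int value) { this.value = value; }
    }

    static final class B extends Result {
        final String text;
        B(String text) { this.text = text; }
    }
}

class Parser {
    // One method returning either branch of the union.
    static Result parse(String input) {
        try {
            return new Result.A(Integer.parseInt(input));
        } catch (NumberFormatException e) {
            return new Result.B(input);
        }
    }
}
```

On Java 17+ the same idea can be written with `sealed interface Result permits A, B`, which lets the compiler check exhaustiveness.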
Say you have this (the solution for encoding a custom type is taken from this thread): When I do a ds.show, I get: I understand that it's because the c ...
I am new to Scala and Spark. I am trying to use an encoder to read a file with Spark and then convert it to a Java/Scala object. The first step to read th ...
I am trying to convert a Dataframe to a Dataset, and the Java class structure is as follows: class A: public class A { private int a; pu ...
Following test for Dataset comparison is failing with the error: Test As you can see, I tried creating the Kryo Encoder for (String, Long) Spar ...
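Rather than a Kryo encoder, Spark ships a built-in tuple encoder that composes the primitive encoders, so the columns stay typed (`_1: string`, `_2: bigint`) instead of collapsing to binary. A sketch, assuming a live `SparkSession` named `spark`:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;
import scala.Tuple2;

import java.util.Arrays;

class TupleEncoderExample {
    static Dataset<Tuple2<String, Long>> build(SparkSession spark) {
        // Compose primitive encoders instead of falling back to Kryo,
        // so each tuple field keeps its own typed column.
        Encoder<Tuple2<String, Long>> enc =
                Encoders.tuple(Encoders.STRING(), Encoders.LONG());
        return spark.createDataset(
                Arrays.asList(new Tuple2<>("a", 1L), new Tuple2<>("b", 2L)), enc);
    }
}
```

Kryo-encoded Datasets compare by serialized bytes, which is a common reason equality-based test assertions fail; typed encoders avoid that.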
Environment: The Spark application tries to do the following: 1) Convert input data into a Dataset[GenericRecord] 2) Group by the key property of th ...
I want to create a dataset which contains an ADT column. Based on this question: Encode an ADT / sealed trait hierarchy into Spark DataSet column I kn ...
I'm trying to parse protobuf (protobuf3) data in spark 2.4 and I'm having some trouble with the ByteString type. I've created the case class using the ...
I'm trying to implement a typed UDAF that returns a complex type. Somehow Spark cannot infer the type of a result column and makes it binary putting t ...
Trying to enforce that classes extending W have a method get that returns a Dataset of a subclass of WR. Compilation error: If I change the g ...
Is there a way to rename the columns of a Dataset using Jackson annotations while creating it? My encoder class is as follows: My aim is t ...
I am using Spark 2.4 and referring to https://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-persistence Bean class: public class Emplo ...
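For a JavaBean like the `Employee` class in the question, the usual route is `Encoders.bean`, which requires a public no-arg constructor and getter/setter pairs. A sketch, assuming a live `SparkSession` named `spark` and a hypothetical two-field `Employee` bean:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

import java.io.Serializable;
import java.util.Arrays;

// Hypothetical bean; Encoders.bean needs a no-arg constructor and getters/setters.
class Employee implements Serializable {
    private String name;
    private int salary;
    public Employee() {}
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public int getSalary() { return salary; }
    public void setSalary(int salary) { this.salary = salary; }
}

class BeanEncoderExample {
    static Dataset<Employee> build(SparkSession spark) {
        Employee e = new Employee();
        e.setName("alice");
        e.setSalary(100);
        // The storage levels from the rdd-persistence guide (MEMORY_ONLY, etc.)
        // apply to the resulting Dataset via persist()/cache().
        return spark.createDataset(Arrays.asList(e), Encoders.bean(Employee.class));
    }
}
```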
I have two datasets: Dataset[User] and Dataset[Book] where both User and Book are case classes. I join them like this: val joinDS = ds1.join(ds2, "us ...
Spark 2.3.0 with Scala 2.11. I'm implementing a custom Aggregator according to the docs here. The aggregator requires 3 types for input, buffer, and o ...
I'm trying to create a Dataset from an RDD y. Pattern: y: RDD[(MyObj1, scala.Iterable[MyObj2])] So I explicitly created an encoder: When I compile t ...
I am struggling with how to create an instance of Functor[Dataset]... the problem is that when you map from A to B the Encoder[B] must be in the implici ...
I'm using Spark 2.2 and am running into troubles when attempting to call spark.createDataset on a Seq of Map. Code and output from my Spark Shell ses ...
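Spark 2.2 has no implicit encoder for `Map` (one arrived in a later release), so an explicit encoder has to be supplied; Kryo is the usual stopgap, at the price of a single binary column. The equivalent workaround through the Java API, assuming a live `SparkSession` named `spark`:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class MapDatasetExample {
    @SuppressWarnings("unchecked")
    static Dataset<Map<String, Integer>> build(SparkSession spark) {
        Map<String, Integer> m = new HashMap<>();
        m.put("a", 1);
        // Kryo stopgap: the whole map is serialized into one binary column.
        Encoder<Map<String, Integer>> enc =
                Encoders.kryo((Class<Map<String, Integer>>) (Class<?>) HashMap.class);
        return spark.createDataset(Arrays.asList(m), enc);
    }
}
```

Upgrading Spark, or exploding the map into key/value rows before calling createDataset, keeps the columns queryable.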
In the following snippet, the tryParquet function tries to load a Dataset from a Parquet file if it exists. If not, it computes, persists and returns ...