I am trying to create a custom Aggregator function producing a Map as the result; however, it requires an Encoder. As referenced in https://spark.apac ...
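A typed `Aggregator` has to supply its own buffer and output encoders, and `Map` has no built-in bean encoder, so `Encoders.kryo` is a common stopgap (at the cost of an opaque binary column). A minimal sketch, assuming a hypothetical aggregator `CountByKey` that counts `String` keys into a `Map<String, Long>`:

```java
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.expressions.Aggregator;

import java.util.HashMap;
import java.util.Map;

// Hypothetical aggregator: counts occurrences of each key into a Map<String, Long>.
class CountByKey extends Aggregator<String, Map<String, Long>, Map<String, Long>> {

    @Override
    public Map<String, Long> zero() {
        return new HashMap<>();
    }

    @Override
    public Map<String, Long> reduce(Map<String, Long> buffer, String key) {
        buffer.merge(key, 1L, Long::sum);
        return buffer;
    }

    @Override
    public Map<String, Long> merge(Map<String, Long> b1, Map<String, Long> b2) {
        b2.forEach((k, v) -> b1.merge(k, v, Long::sum));
        return b1;
    }

    @Override
    public Map<String, Long> finish(Map<String, Long> reduction) {
        return reduction;
    }

    // Maps have no bean encoder; Kryo serializes the whole map into one binary column.
    @SuppressWarnings("unchecked")
    @Override
    public Encoder<Map<String, Long>> bufferEncoder() {
        return Encoders.kryo((Class<Map<String, Long>>) (Class<?>) HashMap.class);
    }

    @SuppressWarnings("unchecked")
    @Override
    public Encoder<Map<String, Long>> outputEncoder() {
        return Encoders.kryo((Class<Map<String, Long>>) (Class<?>) HashMap.class);
    }
}
```

The Kryo route trades readable columns for convenience; if the output must stay queryable, flattening the map into key/value rows before aggregating avoids the problem entirely.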
I've generally used immutable value types when writing java code. Sometimes it's been through libraries (Immutables, AutoValue, Lombok), but mostly ju ...
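The hand-written style the question describes follows a familiar pattern: final fields, constructor-only initialization, value-based equality, and "wither" copy methods instead of setters. A minimal sketch with a hypothetical `Point` type:

```java
// A minimal hand-written immutable value type: all fields final,
// set only in the constructor, with value-based equality.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    int x() { return x; }
    int y() { return y; }

    // "Wither" instead of a setter: returns a modified copy.
    Point withX(int newX) { return new Point(newX, y); }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() { return 31 * x + y; }
}
```

Note that this shape clashes with Spark's `Encoders.bean`, which expects a public no-arg constructor and getter/setter pairs; that mismatch is what usually forces a custom encoder for immutable types.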
This works fine: This is also fine: However, how do we achieve this so it returns either type A or B? Is it simply possible to have a union ...
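Returning "either A or B" from a single method usually means returning a closed common supertype and checking the concrete class at the call site. A sketch, with hypothetical types `Result.A` and `Result.B` standing in for the question's A and B:

```java
// Hypothetical sealed-style hierarchy: Result is the closed union of A and B.
// The private constructor keeps the set of subclasses limited to the nested ones.
abstract class Result {
    private Result() {}

    static final class A extends Result {
        final int value;
        A(int value) { this.value = value; }
    }

    static final class B extends Result {
        final String text;
        B(String text) { this.text = text; }
    }
}

class Parser {
    // One method returning either branch of the union.
    static Result parse(String input) {
        try {
            return new Result.A(Integer.parseInt(input));
        } catch (NumberFormatException e) {
            return new Result.B(input);
        }
    }
}
```

On Java 17+ the same idea can be written with `sealed interface Result permits A, B`, which lets the compiler check exhaustiveness.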
Say you have this (the solution for encoding a custom type is taken from this thread): When I do a ds.show, I get: I understand that it's because the c ...
I am new to Scala and Spark. I am trying to use an encoder to read a file with Spark and then convert it to a Java/Scala object. The first step to read th ...
I am trying to convert a Dataframe to a Dataset, and the Java class structure is as follows: class A: public class A { private int a; pu ...
Following test for Dataset comparison is failing with the error: Test As you can see, I tried creating the Kryo Encoder for (String, Long) Spar ...
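Rather than a Kryo encoder, Spark ships a built-in tuple encoder that composes the primitive encoders, so the columns stay typed (`_1: string`, `_2: bigint`) instead of collapsing to binary. A sketch, assuming a live `SparkSession` named `spark`:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;
import scala.Tuple2;

import java.util.Arrays;

class TupleEncoderExample {
    static Dataset<Tuple2<String, Long>> build(SparkSession spark) {
        // Compose primitive encoders instead of falling back to Kryo,
        // so each tuple field keeps its own typed column.
        Encoder<Tuple2<String, Long>> enc =
                Encoders.tuple(Encoders.STRING(), Encoders.LONG());
        return spark.createDataset(
                Arrays.asList(new Tuple2<>("a", 1L), new Tuple2<>("b", 2L)), enc);
    }
}
```

Kryo-encoded Datasets compare by serialized bytes, which is a common reason equality-based test assertions fail; typed encoders avoid that.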
Environment: The Spark application tries to do the following: 1) Convert input data into a Dataset[GenericRecord] 2) Group by the key property of th ...
I want to create a dataset which contains an ADT column. Based on this question: Encode an ADT / sealed trait hierarchy into Spark DataSet column I kn ...
I'm trying to parse protobuf (protobuf3) data in spark 2.4 and I'm having some trouble with the ByteString type. I've created the case class using the ...
I'm trying to implement a typed UDAF that returns a complex type. Somehow Spark cannot infer the type of a result column and makes it binary putting t ...
Trying to enforce that classes extending W have a method get that returns a Dataset of a subclass of WR. Compilation error: If I change the g ...
Is there a way to rename the columns of a Dataset using Jackson annotations while creating it? My encoder class is as follows: My aim is t ...
I am using Spark 2.4 and referring to https://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-persistence Bean class: public class Emplo ...
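For a JavaBean like the `Employee` class in the question, the usual route is `Encoders.bean`, which requires a public no-arg constructor and getter/setter pairs. A sketch, assuming a live `SparkSession` named `spark` and a hypothetical two-field `Employee` bean:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

import java.io.Serializable;
import java.util.Arrays;

// Hypothetical bean; Encoders.bean needs a no-arg constructor and getters/setters.
class Employee implements Serializable {
    private String name;
    private int salary;
    public Employee() {}
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public int getSalary() { return salary; }
    public void setSalary(int salary) { this.salary = salary; }
}

class BeanEncoderExample {
    static Dataset<Employee> build(SparkSession spark) {
        Employee e = new Employee();
        e.setName("alice");
        e.setSalary(100);
        // The storage levels from the rdd-persistence guide (MEMORY_ONLY, etc.)
        // apply to the resulting Dataset via persist()/cache().
        return spark.createDataset(Arrays.asList(e), Encoders.bean(Employee.class));
    }
}
```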
I have two datasets: Dataset[User] and Dataset[Book] where both User and Book are case classes. I join them like this: val joinDS = ds1.join(ds2, "us ...
Spark 2.3.0 with Scala 2.11. I'm implementing a custom Aggregator according to the docs here. The aggregator requires 3 types for input, buffer, and o ...
I'm trying to create a Dataset from an RDD y. Pattern: y: RDD[(MyObj1, scala.Iterable[MyObj2])] So I explicitly created an encoder: When I compile t ...
I am struggling with how to create an instance of Functor[Dataset]... the problem is that when you map from A to B the Encoder[B] must be in the implici ...
I'm using Spark 2.2 and am running into troubles when attempting to call spark.createDataset on a Seq of Map. Code and output from my Spark Shell ses ...
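Spark 2.2 has no implicit encoder for `Map` (one arrived in a later release), so an explicit encoder has to be supplied; Kryo is the usual stopgap, at the price of a single binary column. The equivalent workaround through the Java API, assuming a live `SparkSession` named `spark`:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class MapDatasetExample {
    @SuppressWarnings("unchecked")
    static Dataset<Map<String, Integer>> build(SparkSession spark) {
        Map<String, Integer> m = new HashMap<>();
        m.put("a", 1);
        // Kryo stopgap: the whole map is serialized into one binary column.
        Encoder<Map<String, Integer>> enc =
                Encoders.kryo((Class<Map<String, Integer>>) (Class<?>) HashMap.class);
        return spark.createDataset(Arrays.asList(m), enc);
    }
}
```

Upgrading Spark, or exploding the map into key/value rows before calling createDataset, keeps the columns queryable.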
In the following snippet, the tryParquet function tries to load a Dataset from a Parquet file if it exists. If not, it computes, persists and returns ...