I have this Dataset and I'd like a more flexible way of grouping and editing the grouped data. As an example I wanted to remove the second Random_Text ...
I have a Spark Dataset with known columns that can therefore be "cast" as a Dataset of a case class, e.g. case class Record(id: String, occurredAt: ...
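The usual approach for this kind of question is `Dataset.as[T]` with the implicits in scope. A minimal sketch; the `Record` fields shown are assumptions, since the original case class is truncated:

```scala
import java.sql.Timestamp
import org.apache.spark.sql.{Dataset, SparkSession}

// Assumed shape of the case class; the original snippet is truncated.
case class Record(id: String, occurredAt: Timestamp)

object TypedDatasetExample {
  // "Casts" an untyped DataFrame into a Dataset[Record]; this fails at
  // analysis time if column names or types do not line up with Record.
  def asRecords(spark: SparkSession): Dataset[Record] = {
    import spark.implicits._
    // An untyped DataFrame whose columns happen to match Record
    val df = Seq(
      ("a", Timestamp.valueOf("2024-01-01 00:00:00")),
      ("b", Timestamp.valueOf("2024-01-02 00:00:00"))
    ).toDF("id", "occurredAt")
    df.as[Record]
  }
}
```

After the cast, lambdas get the typed fields directly, e.g. `ds.filter(_.id == "a")` instead of `df.filter(col("id") === "a")`.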
I'm working with Apache Spark and I have the following code: Dataset<Row> tradesDataset = sparkSession .sql("select * from a_table") ...
A small question regarding an integration between Splunk and Apache Spark. Currently, I am running a search query in Splunk, and the result is quite big. And ...
I am working with the OpenFoodFacts dataset using PySpark. There are quite a lot of columns that are entirely made up of missing values, and I want to d ...
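The question is in PySpark, but the same idea translates directly; a sketch in Scala (kept in one language with the other snippets here): count non-null values per column in a single pass, then select only the columns with a non-zero count.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, count, when}

object DropAllNullColumns {
  // Returns a copy of df without the columns that contain only nulls.
  def dropAllNull(df: DataFrame): DataFrame = {
    // One aggregation pass: count(when(...)) counts non-null values per column.
    val counts = df.select(df.columns.map(c =>
      count(when(col(c).isNotNull, 1)).alias(c)): _*
    ).head()
    val keep = df.columns.filter(c => counts.getAs[Long](c) > 0)
    df.select(keep.map(col): _*)
  }
}
```

Doing all the counts in one `select` matters here: a naive loop with one `count` per column would scan the dataset once per column.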
How do wide transformations actually work based on the shuffle-partitions configuration? If I have the following program: Does it mean sort would output ...
I am trying to get Scala and Spark to work with Datasets and aggregation functions. Based on the mapFunctionToTheSchema (which returns multiple re ...
When we have to convert a Spark DataFrame to a Dataset, we generally use a case class; that is, we are converting untyped Rows to a type. Example: Le ...
How can I select only the 2nd and 5th columns from a CSV file (no column names in the file) in Java Spark? Code as below: ...
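When a CSV is read without a header, Spark auto-names the columns `_c0`, `_c1`, ... in file order, so the 2nd and 5th columns are `_c1` and `_c4`. The question asks for Java, but the column-naming trick is identical; a sketch in Scala for consistency with the other snippets here:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SelectCsvColumns {
  // With header=false, Spark names columns _c0, _c1, ... in file order,
  // so the 2nd and 5th columns are _c1 and _c4.
  def select2ndAnd5th(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("header", "false")
      .csv(path)
      .select("_c1", "_c4")
}
```

In Java the body is the same calls on `sparkSession.read()`, ending in `.select("_c1", "_c4")`.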
I have two big files: an email file and an attachment file. For simplicity say: NOTE: a broadcast-variable join has already been performed with the email file wit ...
My input looks like below: Required output: The error_codes column in the output dataset is a Seq of strings. I need to make an array; I can change S ...
In a Spark session: val spark = SparkSession .builder() .appName("Spark SQL basic example") .config("spark.some.config.option", "some-value") ...
I have a dataset which has the following schema: I want to access the data type of each StructField. E.g. if the data type of col_name_1 is NullTyp ...
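A Dataset's schema is a `StructType`, whose `fields` are `StructField` case classes, so pattern matching works directly. A minimal sketch that picks out the columns declared as `NullType` (the check the question seems to be heading toward):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{NullType, StructField}

object InspectSchema {
  // Names of the columns whose declared data type is NullType.
  // StructField is a case class (name, dataType, nullable, metadata),
  // so it can be destructured in a pattern match.
  def nullTypedColumns(df: DataFrame): Seq[String] =
    df.schema.fields.collect {
      case StructField(name, NullType, _, _) => name
    }
}
```

The same pattern generalizes: match on `IntegerType`, `StringType`, etc., or inspect `field.dataType` directly in a loop.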
I want to select the median value of one of a dataset's columns (the median being the value located at the middle of a set of values ranked in an ascendi ...
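The common answer to this is `percentile_approx` at the 0.5 quantile; a sketch (the cast to double is there so the result type is uniform regardless of the input column's numeric type):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.expr

object MedianOfColumn {
  // Approximate median of a numeric column via percentile_approx(col, 0.5).
  // On small data it returns the exact middle value; for an exact median on
  // big data you would have to sort and index, which is far more expensive.
  def median(df: DataFrame, column: String): Double =
    df.select(expr(s"percentile_approx(CAST($column AS DOUBLE), 0.5)"))
      .head().getDouble(0)
}
```

An alternative with a tunable error bound is `df.stat.approxQuantile(column, Array(0.5), relativeError)`, where a relative error of 0.0 forces an exact computation.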
I'm facing a problem in Spark where two skewed datasets take too long to join. One (or both) of the datasets has skewed data in it, and it's used as the j ...
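The classic workaround for a skewed join is key salting: spread each hot key over N buckets on the large side and replicate the other side once per bucket. A sketch, assuming the join key column is named "key" (the real column names are not in the excerpt):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{array, explode, lit, rand}

object SaltedJoin {
  // Key salting: assumes `large` is the skewed side and both frames
  // join on a column named "key" (hypothetical name).
  def saltedJoin(large: DataFrame, small: DataFrame, numSalts: Int): DataFrame = {
    // Each row of the large side lands in a random salt bucket 0..numSalts-1.
    val saltedLarge = large.withColumn("salt", (rand() * numSalts).cast("int"))
    // Replicate every row of the small side once per salt value,
    // so each bucket of a hot key still finds its match.
    val saltedSmall = small.withColumn("salt",
      explode(array((0 until numSalts).map(lit): _*)))
    saltedLarge.join(saltedSmall, Seq("key", "salt")).drop("salt")
  }
}
```

On Spark 3.x it is worth trying adaptive skew-join handling first (`spark.sql.adaptive.enabled` and `spark.sql.adaptive.skewJoin.enabled`), which splits skewed partitions automatically without manual salting.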
I want to write a dataset/dataframe to a CSV after performing several transformations (union) on the original dataset/dataframe. The dataset/dataframe ...
I want to parse the address column from the given table structure using the addressParser function to get number, street, city, and country. Sample Input: ...
I have a use case where I need to read a JSON file or JSON string using Spark as a Dataset[T] in Scala. The JSON file has nested elements, and some of th ...
I have one imputation method that performs mean, median, and mode operations, but it fails if the column's data type is not Double/Float. My Java code: ...
I have been searching for links, documents, or articles that will help me understand when to choose Datasets over DataFrames and vice versa. ...