I can make a Spark DataFrame with a vector column with the toDF method. I'm not sure how to create a vector column with the createDataFrame method. ...
I have read other related questions but have not found an answer. I want to create a DataFrame from a case class in Spark 2.3 (Scala 2.11.8). Code ...
Imagine a CSV as follows: I want to automatically obtain a DF with 4 columns a, b, c, d. A manual technique could be: The problem with this techni ...
I have a Pandas dataframe with one column containing string IDs. I am using idxmax() to return the index of the found IDs but since the data is over a ...
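A small pandas sketch of the `idxmax()` pattern the question above describes: a boolean mask's `idxmax()` returns the label of the first `True`, i.e. the first matching ID. The sample frame and target value are made up for illustration.

```python
import pandas as pd

# Tiny stand-in for the large string-ID column described above
df = pd.DataFrame({"id": ["x1", "x2", "x3", "x2"]})

# A boolean mask's idxmax() returns the index label of the FIRST True value,
# which here is the first row whose id matches the target.
mask = df["id"] == "x2"
first_match = mask.idxmax()          # -> 1

# Caveat: on an all-False mask idxmax() still returns the first label,
# so check mask.any() before trusting the result.
found = mask.any()
```

For very large frames this stays vectorized, which is usually faster than iterating rows to find the first match.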
I am trying to use a Scala type class on Spark types; here is a small code snippet I wrote. When I run this in my local IntelliJ, the following error is ...
I am trying to compute an 80% trimmed mean for every group in Scala to get rid of the outliers. But this has to be applied only if the number of records is ...
How can I obtain the keys of a grouped Spark DataFrame? And another question: what does a pyspark.sql.group.GroupedData object include? ...
Input Data: Code After reading the data into a DF with columns key, data, value, I am trying to order by the column key and drop the same ...
I want to partition data using ID, and within each partition I want to apply a set of operations and take distinct values. Doing distinct within each parti ...
I have a problem with a for-loop program, like below: but "new_df_name" is just a variable of String type. How can I realize this? ...
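The usual answer to the question above is not to synthesize variable names from strings at runtime, but to keep the frames in a dict keyed by the generated name. A minimal pure-Python sketch (the suffixes and the list stand-in for a DataFrame are invented for illustration):

```python
# Instead of trying to create a variable literally named new_df_name,
# store each frame in a dict under the generated name.
frames = {}
for suffix in ["2021", "2022"]:
    name = f"new_df_{suffix}"
    frames[name] = [suffix]          # stand-in for a real DataFrame

# Later lookups use the same generated string:
result = frames["new_df_2021"]
```

This keeps the loop purely data-driven and avoids fragile tricks like `globals()` assignment.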
I have a dataframe df1 with a column col1 that has the structure: and another dataframe df2 with col1 that has the structure: In order to union df1.unio ...
I'm trying to read Kafka topics through Apache Spark Streaming and am not able to figure out how to transform the data in a DStream to a DataFrame and the ...
I have a dataframe which I am writing to Hive table using partitionBy - If I create another dataframe and want to append the content of this data f ...
I get log4j-format logs, process them, and store them in Spark. I am not in a clustered or multi-node environment; I am using Spark as a single-node applicati ...
To give some background, I am trying to run TPCDS benchmark on Spark with and without Spark's catalyst optimizer. For complicated queries on smaller d ...
I have two data frames. Data frame one: Data frame two: Now I want to add all columns of data frame one to data frame two, except for the records ...
I'm doing some kind of aggregation on the dataframe I have created. Here are the steps. However, when I do a printSchema on my newly created DataFra ...
I have a Spark structured streaming application (listening to Kafka) that is also reading from a persistent table in S3. I am trying to have each micro ...
How can I replace empty values in a column Field1 of DataFrame df? This command does not produce the expected result: The expected result: ...
I'm having trouble trying to filter rows in a column based on multiple conditions. Basically I'm storing my multiple conditions in an array and I want t ...