I'm trying to modify the partitioning of an existing Delta table. I know how to do that using the DataFrame API. I need to achieve the same thing using Spark SQ ...
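For the SQL route, one documented approach with Delta Lake is to re-create the table with the new partitioning scheme in a single statement; a sketch, where the table and partition column names are illustrative:

```sql
-- Rewrite the table in place with a new partition column
-- (table name and partition column are placeholders)
CREATE OR REPLACE TABLE my_delta_table
USING DELTA
PARTITIONED BY (event_date)
AS SELECT * FROM my_delta_table;
```

Note this rewrites the table's data files, so on large tables it is as expensive as the equivalent DataFrame rewrite.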
I have a Java application. In the Java application I have a Spark context. From the Spark context I create multiple Spark sessions by doing sparkSession.newS ...
Input: the below query worked in SQL Server; due to the correlated subquery, the same is not working in Spark SQL. Is there any alternative, either with Spark SQL ...
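When Spark SQL rejects a correlated subquery, the usual workaround is to rewrite it as an aggregate plus a join, which Spark handles; a sketch with illustrative table and column names:

```sql
-- Original shape (correlated scalar subquery):
--   SELECT * FROM orders o
--   WHERE o.amount > (SELECT AVG(i.amount) FROM orders i
--                     WHERE i.cust_id = o.cust_id)
-- Rewritten as an aggregate joined back to the base table:
SELECT o.*
FROM orders o
JOIN (SELECT cust_id, AVG(amount) AS avg_amount
      FROM orders
      GROUP BY cust_id) a
  ON o.cust_id = a.cust_id
WHERE o.amount > a.avg_amount;
```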
Requirements: I want to create a DataFrame out of one column of an existing DataFrame. That column's values are lists of JSON objects. Problem: Since the j ...
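The core step is parsing each JSON-list string into individual rows; sketched here with plain Python's json module (the same idea maps to from_json plus explode in Spark; the data is illustrative):

```python
import json

# One column whose values are strings holding JSON lists (illustrative data)
col_values = [
    '[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]',
    '[{"id": 3, "name": "c"}]',
]

# Parse each string and flatten the lists into one sequence of records
rows = [rec for s in col_values for rec in json.loads(s)]
```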
I am seeing something called DataFilter in my query execution plan. There is a PartitionFilters: [], PushedFilters: [IsNotNull(product_id)] ...
Question: When joining two datasets, why is the isnotnull filter applied twice on the joining key column? In the physical plan, it is once applied as ...
In Python I am trying to create and write to the table TBL in the database DB in Databricks, but I get an exception: A schema mismatch detected when w ...
I am trying to execute the below code since I need to look up the table and create a new column out of it. So, I am trying to go with a UDF, as joining di ...
I have a huge dataframe similar to this: It has rows that match the header, and I want to drop all of them so that the result would be: ...
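The usual fix is to filter out any row whose values equal the header row; the predicate is sketched here with plain Python lists (the same condition works as a PySpark filter; the data is illustrative):

```python
# Header values and rows, some of which repeat the header (illustrative data)
header = ["name", "age"]
rows = [["name", "age"], ["alice", "30"], ["name", "age"], ["bob", "25"]]

# Keep only rows that are not a repeat of the header
cleaned = [r for r in rows if r != header]
```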
I am trying to join two streaming datasets in Spark Structured Streaming. The data structures are as follows: Table: CardHolder CardNo ...
My CSV is this - I'm printing my schema in the logs - (you see, the columns are now flipped, or sorted - ah!) I'm getting the below error. This ...
I am learning Scala and am trying to filter a select few columns from a large nested JSON file to make into a DataFrame. This is the gist of the JSON ...
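Pulling a few fields out of a nested document boils down to walking the nested paths; sketched here with Python's json module since the document's gist is elided (field names are illustrative; in Spark the same paths become dotted column selections like user.address.city):

```python
import json

# A small nested document standing in for the real file (illustrative fields)
doc = json.loads(
    '{"user": {"name": "alice", "address": {"city": "nyc", "zip": "10001"}},'
    ' "extra": 1}'
)

# Flatten just the nested fields of interest into one flat record
record = {
    "name": doc["user"]["name"],
    "city": doc["user"]["address"]["city"],
}
```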
I have the following example, which I am running on Spark 3.3. The output is as expected; I am getting the correct min/max value for each window. When I ad ...
I have a PySpark DataFrame with columns "A", "B", "C", and "D". I want to add a column with the mean of each row, but the condition is that the column names for ...
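A row-wise mean over a chosen subset of columns reduces to sum divided by count of those values; sketched with plain Python dicts (in PySpark the same expression is built by summing the selected Column objects and dividing by their count; the data and the selection condition are illustrative):

```python
# Rows and the columns picked out by some condition (illustrative)
rows = [
    {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0},
    {"A": 5.0, "B": 6.0, "C": 7.0, "D": 8.0},
]
cols = ["A", "C"]  # stand-in for whatever condition selects the columns

# Add the mean of the chosen columns to each row
for r in rows:
    r["mean"] = sum(r[c] for c in cols) / len(cols)
```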
I want to add up columns of a similar type (there are more than 100 columns in total), as follows: id b c d b_apac ...
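With 100+ columns, the grouping can be driven from the column names themselves rather than written by hand; sketched with a plain Python dict, grouping on the part before the underscore (the same grouping would generate the sum expressions in PySpark; the data and the `_total` naming are illustrative):

```python
from collections import defaultdict

# One row with base columns and their regional variants (illustrative data)
row = {"id": 1, "b": 1, "c": 2, "b_apac": 3, "c_apac": 4}

# Sum every column that shares the same base name (the part before "_")
totals = defaultdict(int)
for col, val in row.items():
    if col == "id":
        continue
    totals[col.split("_")[0] + "_total"] += val
```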
Calculate the total number of matches played by each team, when it is present in both the HomeTeam and AwayTeam columns, using pandas/PySpark. I though ...
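Since a team's total is its appearances across both columns combined, this is a concatenate-and-count; sketched with Python's Counter (in pandas the equivalent is concatenating the two columns and calling value_counts; the team names are illustrative):

```python
from collections import Counter

# The two columns as lists (illustrative data)
home = ["Arsenal", "Chelsea", "Arsenal"]
away = ["Chelsea", "Arsenal", "Spurs"]

# Every appearance in either column counts as one match played
matches = Counter(home) + Counter(away)
```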
I want to collect a sample based on age, with a condition on the Failure status. I am interested in serial numbers that are 3 days old. However, I don't need hea ...
I'm trying to do something to the effect of this: Name Status Bill Cancelled on 01/01/2023 ...
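Splitting a free-text status like "Cancelled on 01/01/2023" into a status word and a date is a regex extraction; a plain-Python sketch (the same capture groups work with regexp_extract in Spark; the pattern is an assumption based on the one row shown):

```python
import re

status = "Cancelled on 01/01/2023"

# Capture the leading word and the trailing date
# (the "<word> on <dd/mm/yyyy>" shape is assumed from the example row)
m = re.match(r"(\w+) on (\d{2}/\d{2}/\d{4})", status)
state, date = m.group(1), m.group(2)
```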
I have two DataFrames. I want to add only rows for specific languages from sdf2 to the first DataFrame. I do it with a loop, but it only appends rows ...
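Rather than appending inside a loop, the second set can be filtered once and concatenated; sketched with plain Python lists of dicts (in PySpark this would be a filter with isin followed by unionByName; the column and language values are illustrative):

```python
# Two row sets and the languages to pull over (illustrative data)
sdf1 = [{"text": "hello", "lang": "en"}]
sdf2 = [
    {"text": "hola", "lang": "es"},
    {"text": "bonjour", "lang": "fr"},
    {"text": "ciao", "lang": "it"},
]
wanted = {"es", "fr"}

# Filter once, then concatenate, instead of appending row by row
combined = sdf1 + [r for r in sdf2 if r["lang"] in wanted]
```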