I'm trying to modify the partitioning of an existing Delta table. I know how to do that using the DataFrame API. I need to achieve a similar thing using Spark SQ ...
I have a Java application. In the Java application I have a Spark context. From the Spark context I create multiple Spark sessions by doing sparkSession.newS ...
I am trying to configure Apache Spark (PySpark) in Visual Studio Code. I installed the "Spark & Hive Tools" extension pack in VS Code and added Python > ...
Issue: I run a Spark job that uses all the cores on all the nodes, yet in the Dataproc CPU monitoring graph the CPU usage peaks at 12% ...
Please ensure dynamic allocation is not killing your containers while you monitor the YARN UI. See the answer below. Issue: I can start the SparkSessi ...
I followed the instructions in "Use Delta tables in Apache Spark", but when I try to save the tables into the lakehouse, I get the message below. I got a similar ...
I have a DataFrame like this; for the sake of simplicity I'm only showing 2 columns, both of type string, but in real life it will have more column ...
I am using Spark 3.0.1. I have the following data as CSV: 348702330256514,37495066290,9084849,33946,614677375609919,11-02-2018 0:00:00,GENUINE 34870233025 ...
I have a Python script that runs statistical analysis and trained deep learning models on input data. The data size is fairly small (~5Mb) however the ...
I have the following data: I would like to transform it into a DataFrame like the following: I tried the following: But I'm getting: ...
How can I check whether two RDDs generated the way I did contain the same data, including the same number of rows? I'm using ScalaTest to run the tests and Spark ve ...
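One common way to check this is to compare the collected rows as multisets: if every distinct row occurs the same number of times in both collections, the data and the row counts match, regardless of order. The sketch below is in Python rather than Scala, and it assumes both RDDs are small enough to collect to the driver (as is typical in unit tests). The helper name `same_rows` is hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable


def same_rows(left: Iterable[Hashable], right: Iterable[Hashable]) -> bool:
    """Return True if both collections hold the same rows with the same
    multiplicities, ignoring order (a multiset comparison)."""
    return Counter(left) == Counter(right)


# Hypothetical usage with PySpark: same_rows(rdd1.collect(), rdd2.collect())
print(same_rows([(1, "a"), (2, "b"), (2, "b")], [(2, "b"), (1, "a"), (2, "b")]))  # True
print(same_rows([(1, "a"), (2, "b")], [(1, "a"), (2, "b"), (2, "b")]))  # False
```

Comparing plain lengths or sets alone would miss cases where duplicates differ; the `Counter` comparison catches both missing rows and wrong duplicate counts.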
I am trying to initialize an Apache Spark instance on Windows 10 to run a local test. My problem is that during the initialization of the Spark instance, I ...
Problem statement: I have a CSV file with 100+ fields. I need to perform transformations on these fields, generate 80+ new fields and write o ...
Getting the following error. This is the query I am running on Spark 3.3, with the Glue catalog, saving to S3. The Iceberg version is 1.1.0. But ...
I want to use Scala and Spark to implement the graph algorithm GraphSAGE. How can I do it? Is there any source code? I want to get the code for my ques ...
I'm trying to run a simple intersect on a couple of tables with geometries and get this error. My script. This is table A. It has a few million ...
There is a CSV with a column ID (format: 8 digits followed by a "D"). When reading the CSV with .option("inferSchema", "true"), it returns the data typ ...
Requirements: I want to create a DataFrame out of one column of an existing DataFrame. Each value in that column is a JSON list with multiple entries. Problem: Since the j ...
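The flattening logic can be sketched in plain Python, assuming each value of the column is a JSON string holding a list of objects. In Spark itself the usual equivalent is `from_json` with an array schema followed by `explode`, which produces one row per list element. The helper name `flatten_json_lists` and the sample data are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, List, Optional


def flatten_json_lists(column_values: Iterable[Optional[str]]) -> List[Dict[str, Any]]:
    """Parse each JSON-list string and return one record per list element,
    the same shape that from_json + explode would yield as rows."""
    records: List[Dict[str, Any]] = []
    for raw in column_values:
        if raw is None:
            continue  # skip nulls, as explode drops null arrays
        records.extend(json.loads(raw))
    return records


column = [
    '[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]',
    '[{"id": 3, "name": "c"}]',
    None,
]
rows = flatten_json_lists(column)
print(len(rows))  # 3
```

If null or empty lists must be kept as rows, `explode_outer` is the Spark counterpart rather than `explode`.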
I have the following PySpark DataFrame df_model: id_client id_sku 1111 4444 1111 ...
I am seeing something called DataFilter in my query execution plan: it shows PartitionFilters: [] PushedFilters: [IsNotNull(product_id)] ...