I have a dataframe in the format shown below, with records as JSON data (in string format) read from a Kafka topic. I need to write just the JSON ...
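A common pattern for this situation is to cast the binary Kafka `value` column to a string before writing it out. A minimal sketch, assuming a hypothetical local broker at `localhost:9092` and a topic named `machine-data` (both made up for illustration):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka-json").getOrCreate()

# Kafka delivers the payload as binary; casting to string recovers the raw JSON.
df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "machine-data")
      .load()
      .select(F.col("value").cast("string").alias("json")))
```

From here `df` contains one string column holding each message's JSON, ready to be written to a sink as-is.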
As I am connecting to the Kafka topic with Spark, creating the dataframe, and then storing it into Hudi, I am getting the following exception: To ...
We have a Cassandra table person, and the DataFrame is as shown. In Spark we wanted to save the DataFrame to the table, where the DataFrame has multiple records for ...
I am currently working on a little project where I stream machine data (JSON format) from a Kafka topic for further analysis. The JSON from the colum ...
I want to cluster a streaming dataset using Spark. I first tried to use KMeans, but it throws a runtime exception on calling the fit method, saying it cannot b ...
The PySpark SQL functions reference for the row_number() function says it returns "a sequential number starting at 1 within a window partition", imply ...
I have a Spark job that is composed as follows: 1) read a static DataFrame from Delta Lake; 2) read a streaming DataFrame from Delta Lake; 3) join th ...
I am trying to read a JSON message from a Kafka topic with Spark Streaming using a custom schema. I can see data coming in when I use cast value a ...
Suppose we have an application that reads from an X-partition topic, does some filtering on the data, then saves it into storage (no complex shuffling lo ...
Exploring PySpark Structured Streaming and Databricks. I want to write a Spark Structured Streaming job to read all the data from a Kafka topic and pub ...
I have a Spark streaming job triggered every day using the Trigger.Once method due to business requirements. StreamingQuery query = processed ...
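In PySpark, the equivalent of the Java `Trigger.Once()` is `trigger(once=True)`: the query processes everything available once and then stops, which suits a daily batch-style run. A sketch with hypothetical checkpoint and output paths:

```python
# `processed` is assumed to be the already-transformed streaming DataFrame.
query = (processed.writeStream
         .format("delta")
         .option("checkpointLocation", "/tmp/checkpoints/daily-job")
         .trigger(once=True)          # process all available data, then stop
         .start("/tmp/tables/daily-output"))
query.awaitTermination()
```

On newer Spark versions, `trigger(availableNow=True)` is generally preferred over `once=True` because it respects rate-limit options while still terminating after the backlog is drained.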
I have a PySpark streaming pipeline which reads data from a Kafka topic; the data goes through various transformations and finally gets merged into a da ...
I'm implementing a Spark Structured Streaming job where I consume messages coming from Kafka in JSON format. Since the JSON data is dynamic, I do ...
I have a Structured Streaming job with Kafka as the source and Delta as the sink. Each batch is processed inside a foreachBatch. The p ...
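The `foreachBatch` shape referred to here hands each micro-batch to a callback as an ordinary static DataFrame, where normal batch writers apply. A minimal sketch, assuming a hypothetical streaming DataFrame `kafka_df` and made-up paths:

```python
def write_batch(batch_df, batch_id):
    # Each micro-batch arrives as a static DataFrame; batch_id is monotonically
    # increasing and can be used for idempotent writes.
    batch_df.write.format("delta").mode("append").save("/tmp/tables/events")

query = (kafka_df.writeStream
         .foreachBatch(write_batch)
         .option("checkpointLocation", "/tmp/checkpoints/events")
         .start())
```

Because the callback runs batch code, this is also where a Delta `MERGE` or writes to multiple sinks would typically go.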
Assume that I have a streaming Delta table in Databricks. Is there any way to get a snapshot of the streaming table as a static table? The reason is that ...
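One answer worth noting: a streaming Delta table is still a Delta table on storage, so a plain batch read returns its current snapshot. A sketch with a hypothetical path and version number:

```python
# Batch read of the table as it exists right now.
snapshot = spark.read.format("delta").load("/tmp/tables/streaming-table")

# Pin a specific version for a reproducible snapshot (Delta time travel).
pinned = (spark.read.format("delta")
          .option("versionAsOf", 42)
          .load("/tmp/tables/streaming-table"))
```

Pinning a version is useful when the snapshot must stay stable while the stream keeps appending underneath it.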
I am using Spark 2.3.0 and Kafka 1.0.0.3. I have created a Spark read stream; it runs successfully, but when I run this it throws me an e ...
I hope everyone is doing well. I have a long question, so please bear with me. Context: I have CDC payloads coming from the Debezium connect ...
I am running Spark in cluster mode, which is giving an error. I ran the below command and verified that the JKS files are present at the location. I have ...
I'm trying to limit the input rate for a Structured Streaming query using a maximum record count. However, the documentation says only maxFilesPerTrig ...
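For what it's worth, `maxFilesPerTrigger` applies only to file sources; the Kafka source has an analogous knob, `maxOffsetsPerTrigger`, which caps the total number of records per micro-batch. A sketch with hypothetical broker and topic names:

```python
df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events")
      # Cap each micro-batch at roughly 10,000 records, spread
      # proportionally across the topic's partitions.
      .option("maxOffsetsPerTrigger", 10000)
      .load())
```

The cap is a total across all partitions, so per-partition batch sizes depend on how the backlog is distributed.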
I do window-based aggregation with a watermark, but every time all of the data gets aggregated. Relevant code: once the query started, I start to put f ...
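The symptom described, all data appearing aggregated on every trigger, usually comes from the output mode rather than the watermark: in update or complete mode every trigger re-emits current window state, while in append mode a window is only emitted once the watermark passes its end. A sketch of the watermark-plus-window shape, assuming a hypothetical streaming DataFrame `events` with an `event_time` timestamp column:

```python
from pyspark.sql import functions as F

agg = (events
       .withWatermark("event_time", "10 minutes")          # tolerate 10 min lateness
       .groupBy(F.window("event_time", "5 minutes"))       # 5-minute tumbling windows
       .count())

# Append mode: each window is written exactly once, after the watermark
# moves past its end; earlier triggers emit nothing for that window.
query = (agg.writeStream
         .outputMode("append")
         .format("console")
         .start())
```

Note also that the watermark only advances as new events with later timestamps arrive, so with a trickle of test data it may never move far enough to close any window.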