Tag [apache-hudi]: Newest Questions

Spark Streaming HUDI HoodieException: Config conflict(key current value existing value): RecordKey:

I am connecting to a Kafka topic with Spark, creating a DataFrame, and then storing it in Hudi, but I am getting the following exception: To ...

Creating an Athena view on a HUDI table returns soft deleted records when the view is read using SPARK

I have multiple HUDI tables with differing column names, and I built a view on top of them to standardize the column names. When this view is read from A ...

Deleting records from Apache Hudi Table which is part of Glue Tables created using AWS Glue Job and Kinesis

I currently have a DynamoDB stream configured that feeds records into Kinesis Data Streams whenever an insert or update happens and subsequent ...

Apache Hudi Upsert/Insert/Deletes at the same time

Can we run the Upsert and Delete write operations at the same time on the same table? Will the Apache Hudi metadata get corrupted? Please help here to do the sa ...

Hudi with Spark performs very slowly when trying to write data to the filesystem

I'm trying Apache Hudi with Spark using a very simple demo: There are about 10 parquet files in the directory; their total size is 1GB, about 6 millio ...

How to encrypt Apache Hudi external table data in S3 that is synced to Hive tables through Spark jobs

Technical background: I am getting table data from Kafka and putting it into Hudi and Hive tables using Spark. I am using AWS EMR. I want to encrypt ...

Hudi overwriting tables with back-dated data

I am pushing some initial bulk data into a Hudi table, and then every day, I write incremental data into it. But if back-dated data arrives, then the latest ...
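
As an illustration of the concern in this question, here is a minimal plain-Python sketch of event-time ordering, where a late-arriving older version of a record does not replace a newer stored one. It does not use Hudi at all: the function name `merge_by_ordering_field`, the column names `id`, `updated_at`, and `status`, and the sample rows are all hypothetical. Whether Hudi applies such a comparison against already-stored records depends on the configured precombine field and payload class, which the excerpt does not show.

```python
def merge_by_ordering_field(existing, incoming, key="id", ordering="updated_at"):
    """Upsert `incoming` rows into `existing`, keeping the newest version per key."""
    merged = {row[key]: row for row in existing}
    for row in incoming:
        current = merged.get(row[key])
        # Only replace a stored row when the incoming one is at least as recent.
        if current is None or row[ordering] >= current[ordering]:
            merged[row[key]] = row
    return [merged[k] for k in sorted(merged)]


# Hypothetical data: the stored row is newer than the late-arriving one.
table = [{"id": 1, "updated_at": "2024-01-10", "status": "shipped"}]
late_batch = [
    {"id": 1, "updated_at": "2024-01-05", "status": "created"},  # back-dated
    {"id": 2, "updated_at": "2024-01-05", "status": "created"},  # new key
]

result = merge_by_ordering_field(table, late_batch)
print(result)
```

ISO-formatted date strings are used so that plain string comparison matches chronological order; a real table would more likely compare timestamps.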

Apache Hudi create and append Upsert table (Parquet-format) on Dataproc & Cloud Storage

It's the Dataproc noob again. My main goal is to ingest the tables from on-premise sources, store them as Parquet files in a Cloud Storage bucket and crea ...

Apache Hudi on Dataproc

Is there any guide to deploy Apache Hudi on a Dataproc Cluster? I'm trying to deploy via the Hudi Quick Start Guide but I can't. Spark 3.1.1 Python 3.8. ...

Why can't I insert datagen in Flink?

...

How to set a custom Hudi field for the _hoodie_commit_time metadata column?

By default, Hudi bases its ingestion timeline on the current time. I want to change this behavior and use my own datetime field during the ingestion. I want t ...

Error writing a Hudi table to a MinIO S3 bucket with Flink SQL

The Problem I'm trying to write a Hudi table to a MinIO S3 bucket with Flink SQL, but it fails. The Hudi table is created, but only contains meta data ...

How to insert struct and map types in Apache Hudi

I looked at the official documentation, and there are no samples for inserting complex types like struct and map. So, what's the syntax? My table definition: s ...

How to remove the 'before' key from the payload generated by a Debezium event for updates in SQL Server

For every update in SQL Server, Debezium generates an event payload with 'after' and 'before'. I want to get rid of 'before' without flattening the paylo ...
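
To make the requested transformation concrete, here is a small plain-Python sketch that drops the `before` image from a change event while leaving the rest of the envelope nested. The event shape (a `payload` object holding `before`, `after`, and `op`), the sample values, and the function name `drop_before` are assumptions for illustration, not taken from the question. In a real pipeline this step would more likely live in a connector transform or in the consumer.

```python
import copy


def drop_before(event: dict) -> dict:
    """Return a copy of a Debezium-style change event without its 'before' image."""
    cleaned = copy.deepcopy(event)  # leave the caller's event untouched
    # Assumption: the envelope sits under "payload" when present,
    # otherwise the event itself is treated as the envelope.
    envelope = cleaned.get("payload", cleaned)
    envelope.pop("before", None)
    return cleaned


# Hypothetical update event with both row images.
sample_event = {
    "payload": {
        "before": {"id": 7, "name": "old name"},
        "after": {"id": 7, "name": "new name"},
        "op": "u",
    }
}

cleaned_event = drop_before(sample_event)
print(cleaned_event)
```

The nested `after` object is kept as-is, so nothing is flattened; only the one key is removed.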

Pyspark streaming from Kafka to Hudi

I'm new to Hudi and I have a problem. I'm working with EMR in AWS with PySpark and Kafka, and what I want to do is to read a topic from the Kafka cl ...

How to exclude either files or folder paths on S3 within an AWS Glue job when reading an Athena table?

We have an AWS Glue job that is attempting to read data from an Athena table that is being populated by HUDI. Unfortunately, we are running into an er ...

Can I use incremental, time travel, and snapshot queries with hudi only using spark-sql?

I'm trying to do incremental, snapshot, and time travel queries using spark-sql with hudi, but the only way that I can find to do this is creating a D ...

How to add Hudi Package to local AWS Glue Interactive Notebook

I have setup Glue Interactive sessions locally by following https://docs.aws.amazon.com/glue/latest/dg/interactive-sessions.html However, I am not abl ...

org.apache.flink.table.api.TableException: Unsupported query: Merge Into

I am working on a Flink streaming job where I need to upsert data in the Hudi table. I am using a MERGE INTO query to upsert data in the Hudi table. ...

Can Apache Hudi be used to upsert a row from Apache Spark dataframe into Postgres database?

Problem Statement: There is no upsert to database feature in Apache Spark, instead we have to overwrite the entire table. But Apache Hudi can be used ...