Hive/Hadoop/Spark SQL ACID Transactions - How to Delete from table_a where table_a.id = table_b.id

Question

SQL novice here, trying to perform a delete operation using Hive syntax and ACID transactions. I have two Delta files, Table_A and Table_B, that I have loaded as DataFrames in Databricks.

Here's what's failing:

DELETE FROM Table_A WHERE Table_A.id = Table_B.id

Here's the error I get back:

AnalysisException: cannot resolve ' Table_B.id ' given input columns: [];

Table_B is a valid DataFrame that is loaded into memory at the time of the query, and it does have a schema with a single column ('id'). This error leads me to believe I'm not providing enough context and am failing to introduce Table_B into the query correctly.
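
For illustration, the scoping problem can be reproduced outside Spark with Python's built-in sqlite3 module. sqlite3 is only a stand-in for Spark SQL here so the sketch runs without a cluster, and the sample rows are hypothetical. As in Spark, the WHERE clause of a DELETE can only reference columns of the target table unless another table is brought in, for example through a subquery, so the original statement cannot resolve Table_B.id (sqlite reports "no such column" instead of Spark's AnalysisException).

```python
import sqlite3

# In-memory stand-ins for Table_A and Table_B (hypothetical sample data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table_A (id INTEGER, payload TEXT);
    CREATE TABLE Table_B (id INTEGER);
    INSERT INTO Table_A VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    INSERT INTO Table_B VALUES (2), (4);
""")

# The original statement: Table_B never appears in a FROM clause or subquery,
# so the DELETE has no way to resolve the column Table_B.id.
error = None
try:
    conn.execute("DELETE FROM Table_A WHERE Table_A.id = Table_B.id")
except sqlite3.OperationalError as exc:
    error = str(exc)

print(error)

# The failed statement deleted nothing.
row_count = conn.execute("SELECT COUNT(*) FROM Table_A").fetchone()[0]
print(row_count)
```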

I've read on here that I could instead insert the rest of the rows (i.e., the ones I want to keep) into another table and then drop the old table, but I'm not sure how to do that.
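
A minimal sketch of that keep-and-swap approach follows, again with sqlite3 as a stand-in for Spark SQL and with hypothetical sample data and a hypothetical table name (Table_A_keep): copy the rows to keep into a new table, drop the old table, then rename the new one. NOT EXISTS is used rather than NOT IN because a single NULL in Table_B.id makes `id NOT IN (...)` evaluate to NULL for every row, so nothing would be kept. On Delta tables in Databricks the DDL for dropping and renaming may behave differently, so treat this as an outline of the steps rather than a drop-in script.

```python
import sqlite3

# In-memory stand-ins (hypothetical data); note the NULL id in Table_B.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table_A (id INTEGER, payload TEXT);
    CREATE TABLE Table_B (id INTEGER);
    INSERT INTO Table_A VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    INSERT INTO Table_B VALUES (2), (4), (NULL);
""")

# Step 1: copy only the rows whose id has no match in Table_B.
# NOT EXISTS ignores the NULL in Table_B; NOT IN would keep no rows at all.
conn.execute("""
    CREATE TABLE Table_A_keep AS
        SELECT * FROM Table_A AS a
        WHERE NOT EXISTS (SELECT 1 FROM Table_B AS b WHERE b.id = a.id)
""")

# Step 2: drop the old table and give the new one the original name.
conn.execute("DROP TABLE Table_A")
conn.execute("ALTER TABLE Table_A_keep RENAME TO Table_A")

rows = conn.execute("SELECT id, payload FROM Table_A ORDER BY id").fetchall()
print(rows)
```

Only the rows with ids 1 and 3 survive, under the original table name.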

Answer 1

You can use an IN subquery to get the Table_B ids:

DELETE FROM Table_A WHERE Table_A.id IN (SELECT Table_B.id FROM Table_B)
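
The IN-subquery form can be checked the same way, with sqlite3 standing in for Spark SQL and hypothetical sample rows. In Databricks the statement would be passed to spark.sql(...). If Table_B exists only as a DataFrame, it first has to be registered so SQL can see it, for example with df_b.createOrReplaceTempView("Table_B") (df_b is a hypothetical variable name). Table_A has to be the Delta table itself, since DELETE targets a table rather than an in-memory DataFrame.

```python
import sqlite3

# In-memory stand-ins for Table_A and Table_B (hypothetical sample data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table_A (id INTEGER, payload TEXT);
    CREATE TABLE Table_B (id INTEGER);
    INSERT INTO Table_A VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    INSERT INTO Table_B VALUES (2), (4);
""")

# The subquery brings Table_B into the statement, so its ids can be
# compared against Table_A.id.
cur = conn.execute(
    "DELETE FROM Table_A WHERE Table_A.id IN (SELECT Table_B.id FROM Table_B)"
)
deleted = cur.rowcount  # number of rows removed from Table_A

remaining = [row[0] for row in conn.execute("SELECT id FROM Table_A ORDER BY id")]
print(deleted, remaining)
```

Rows 2 and 4 are deleted; rows whose id has no match in Table_B are left alone.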

(Answer 1: score 0, posted 2023-01-19 14:30:05)