
Hive/Hadoop/Spark SQL ACID Transformations - How to delete from table_a where table_a.id = table_b.id

SQL novice here, trying to perform a delete operation using Hive syntax and ACID transformations. I have two Delta files that I have brought in as dataframes in Databricks: Table_A and Table_B.

Here's what's failing:

DELETE FROM Table_A WHERE Table_A.id = Table_B.id

Here's the error I get back:

AnalysisException: cannot resolve 'Table_B.id' given input columns: [];

Table_B is a valid dataframe that is loaded into memory at the time of the query, and it has a schema with a single column ('id'). This error leads me to believe I'm not providing enough context and am failing to introduce Table_B into the query correctly.

I've read on here that I could possibly insert the rest of the rows (i.e., the ones I want to keep) into another table and then drop the old table, but I'm not sure how to do that.

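A DELETE statement's WHERE clause can only see columns of the table being deleted from, so Table_B has to be brought in through a subquery. You can use an IN subquery to get the Table_B ids: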

DELETE FROM Table_A WHERE Table_A.id IN (SELECT Table_B.id FROM Table_B)
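
For the subquery to work, Table_B has to be a name that Spark SQL can resolve; a dataframe variable on its own is not visible to a SQL statement. Here is a minimal sketch of one way to set that up, assuming both tables are stored as Delta and that the paths below (which are made up for illustration) point at them. Subquery support in DELETE can depend on the Delta Lake / Databricks runtime version, so a MERGE form is shown as an alternative.

```sql
-- Assumption: '/mnt/delta/table_b' and '/mnt/delta/table_a' are hypothetical paths;
-- substitute the locations of your own Delta files.

-- Expose the ids to delete under a name that SQL can resolve.
CREATE OR REPLACE TEMPORARY VIEW Table_B AS
SELECT id FROM delta.`/mnt/delta/table_b`;

-- Delete directly against the Delta table, addressed by path.
DELETE FROM delta.`/mnt/delta/table_a`
WHERE id IN (SELECT id FROM Table_B);

-- Alternative if the runtime rejects subqueries in DELETE:
-- match on id and delete the matched rows.
MERGE INTO delta.`/mnt/delta/table_a` AS a
USING Table_B AS b
ON a.id = b.id
WHEN MATCHED THEN DELETE;
```

Run either the DELETE or the MERGE, not both; after the first one succeeds the second simply finds nothing left to remove.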

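The other approach mentioned in the question, copying the rows you want to keep into a new table, could look roughly like this. It is only a sketch: it assumes Table_A and Table_B are registered as tables or views, and Table_A_kept is a hypothetical name for the new table.

```sql
-- Assumption: Table_A and Table_B are resolvable table/view names;
-- Table_A_kept is a made-up name for the table holding the surviving rows.

-- LEFT ANTI JOIN keeps only the rows of Table_A with no matching id in Table_B.
CREATE TABLE Table_A_kept
USING DELTA
AS
SELECT a.*
FROM Table_A AS a
LEFT ANTI JOIN Table_B AS b
  ON a.id = b.id;

-- Check the result before touching the original table.
SELECT COUNT(*) AS kept_rows FROM Table_A_kept;
```

Once the row count looks right, the old table can be dropped and the new one used in its place. With Delta tables the in-place DELETE is usually simpler, so this is mostly useful when the target format does not support deletes.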
