I'm dealing with two SQL tables that contain 120,000,000 records each. A few records (approx. 60,000) are duplicated across the two tables. The structure of both tables is the same.
There are 40 columns in each table. I need to union the records into one of the tables.
I know of two ways to do it (both give me the desired output). I would like to know which way is better, and whether there is a much better way.
Method 1:
SELECT * INTO Table1_copy FROM Table1
DROP TABLE Table1
SELECT * INTO Table1 FROM Table1_copy
UNION
SELECT * FROM Table2
DROP TABLE Table1_copy
Method 2:
INSERT INTO Table1
SELECT Table2.Col1, Table2.Col2
FROM Table1
FULL OUTER JOIN Table2
ON Table1.Col1 = Table2.Col1 AND Table1.Col2 = Table2.Col2
WHERE Table1.Col1 IS NULL AND Table1.Col2 IS NULL
Use of UNION seems to be the better choice, but is anyone able to address the space issue around having to select large datasets into a new table and then drop it? 120,000,000 records is just one example. There are other tables with a larger number of records.
I think I would do:
SELECT * INTO Table1
FROM Table1_copy;
CREATE INDEX idx_table1_copy_2 ON table1_copy(col1, col2);
INSERT INTO table1 (. . .)
SELECT *
FROM Table2 t2
WHERE NOT EXISTS (SELECT 1
FROM table1_copy t1
WHERE t1.col1 = t2.col1 AND t1.col2 = t2.col2
);
I should note that the two methods you describe are NOT equivalent. UNION removes duplicates within tables and between tables, so the rows in the new table are all distinct. FULL OUTER JOIN does not remove duplicates from within tables.
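A small illustration of that difference, as a sketch only. The temporary tables #A and #B, their two columns, and the sample rows are made up for the example. The second query uses NOT EXISTS as a stand-in for the anti-join in Method 2.

```sql
-- Hypothetical sample data: #A contains an internal duplicate, (1, 'x') twice.
CREATE TABLE #A (Col1 INT, Col2 VARCHAR(10));
CREATE TABLE #B (Col1 INT, Col2 VARCHAR(10));

INSERT INTO #A (Col1, Col2) VALUES (1, 'x'), (1, 'x'), (2, 'y');
INSERT INTO #B (Col1, Col2) VALUES (2, 'y'), (3, 'z');

-- UNION: every row in the result is distinct, so the repeated (1, 'x')
-- collapses to a single row. Returns 3 rows.
SELECT Col1, Col2 FROM #A
UNION
SELECT Col1, Col2 FROM #B;

-- Anti-join style: keeps all rows of #A as they are and adds only the
-- rows of #B that have no match in #A. Returns 4 rows, because the
-- duplicate inside #A survives.
SELECT Col1, Col2 FROM #A
UNION ALL
SELECT b.Col1, b.Col2
FROM #B AS b
WHERE NOT EXISTS (SELECT 1
                  FROM #A AS a
                  WHERE a.Col1 = b.Col1 AND a.Col2 = b.Col2);

DROP TABLE #A;
DROP TABLE #B;
```

If Table1 has no internal duplicates to begin with, both approaches end up with the same rows from Table1; the difference only shows when a table repeats a row.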
I don't understand why you use a third table.
I would go with something like this:
INSERT INTO Table1 (<Columns list>)
SELECT <Columns list> FROM Table2
EXCEPT
SELECT <Columns list> FROM Table1
If EXCEPT isn't fast enough, perhaps use NOT EXISTS and add the relevant indexes.
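A rough sketch of that NOT EXISTS variant, assuming (as in the question's Method 2) that Col1 and Col2 together are enough to decide whether a row already exists in Table1. The index name is made up, and the comments in the column lists stand in for the remaining columns.

```sql
-- Assumption: (Col1, Col2) identifies a row well enough for the existence check.
-- IX_Table1_Col1_Col2 is a hypothetical index name.
CREATE INDEX IX_Table1_Col1_Col2 ON Table1 (Col1, Col2);

-- Insert only the rows of Table2 that have no (Col1, Col2) match in Table1.
INSERT INTO Table1 (Col1, Col2 /* , remaining columns */)
SELECT t2.Col1, t2.Col2 /* , remaining columns */
FROM Table2 AS t2
WHERE NOT EXISTS (SELECT 1
                  FROM Table1 AS t1
                  WHERE t1.Col1 = t2.Col1
                    AND t1.Col2 = t2.Col2);
```

Note that this is not a drop-in replacement for EXCEPT: it compares only the two key columns, it does not remove duplicates that exist within Table2, and a NULL in Col1 or Col2 never matches, whereas EXCEPT returns distinct rows and treats NULLs as equal when comparing.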