
Efficient way of phrasing multiple tuple pair WHERE conditions in SQL statement

I want to perform an SQL query that is logically equivalent to the following:

DELETE FROM pond_pairs
WHERE
  ((pond1 = 12) AND (pond2 = 233)) OR
  ((pond1 = 12) AND (pond2 = 234)) OR
  ((pond1 = 12) AND (pond2 = 8)) OR
  ((pond1 = 13) AND (pond2 = 6547)) OR
  ((pond1 = 13879) AND (pond2 = 6))

I will have hundreds of thousands of pond1-pond2 pairs. I have an index on (pond1, pond2).

With my limited SQL knowledge, I came up with several approaches:

  1. Run the whole query as is.
  2. Batch the query up into smaller queries with n WHERE conditions each.
  3. Save the pond1-pond2 pairs into a new table, and use a subquery in the WHERE clause to identify the rows to delete.
  4. Convert the Python logic that identifies the rows to delete into a stored procedure. Note that I am unfamiliar with writing stored procedures, so this would probably involve a steep learning curve.

I am using Postgres, if that is relevant.

I would go with 3 (with a JOIN rather than a subquery) and measure the time of the DELETE query alone (excluding creating the table and inserting into it). This is a good starting point, because joining is a very common and well-optimized operation, so that time will be hard to beat. You can then compare it to the time of your current approach.

You can also try the following approach:

  1. Sort the pairs in the same order as the index.
  2. Delete using method 2 from your description (probably in a single transaction).

Sorting before deleting should give better index read performance, because consecutive lookups touch neighbouring parts of the index and there is a greater chance that the hard-drive cache will help.
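The sort-then-batch idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name delete_pairs_in_batches and the batch_size default are made up for the example, and the standard-library sqlite3 module stands in for Postgres so that the snippet is self-contained. With a Postgres driver such as psycopg2 the placeholder would be %s rather than ?.

```python
import sqlite3


def delete_pairs_in_batches(conn, pairs, batch_size=200):
    """Delete (pond1, pond2) pairs in sorted batches inside one transaction.

    Hypothetical helper: `conn` is assumed to be a DB-API connection to a
    database containing a pond_pairs(pond1, pond2) table.
    """
    # Sort in the same order as the (pond1, pond2) index; drop duplicates.
    ordered = sorted(set(pairs))
    deleted = 0
    cur = conn.cursor()
    for start in range(0, len(ordered), batch_size):
        batch = ordered[start:start + batch_size]
        # One "(pond1 = ? AND pond2 = ?)" term per pair, joined with OR.
        where = " OR ".join(["(pond1 = ? AND pond2 = ?)"] * len(batch))
        # Flatten [(a, b), (c, d)] into [a, b, c, d] to match the placeholders.
        params = [value for pair in batch for value in pair]
        cur.execute("DELETE FROM pond_pairs WHERE " + where, params)
        deleted += cur.rowcount
    # A single commit, so all batches succeed or fail together.
    conn.commit()
    return deleted
```

The batch size is something to tune by measurement; the value here is arbitrary and kept small only to stay well below SQLite's limit on bound parameters.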

For a large number of pond1-pond2 pairs to be deleted in a single DELETE, I would create a temporary table and join on it.

-- Create the temp table:
CREATE TEMP TABLE foo AS
SELECT * FROM (VALUES (1, 2), (1, 3)) AS sub (pond1, pond2);

-- Delete the matching rows:
DELETE FROM bar
USING foo  -- the joined table
WHERE bar.pond1 = foo.pond1
  AND bar.pond2 = foo.pond2;
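When the pairs live in Python rather than in a VALUES list, the same staging idea might look like the sketch below. The names delete_pairs_via_temp_table and pairs_to_delete are hypothetical, and sqlite3 is again only a stand-in so the example runs on its own. SQLite has no DELETE ... USING, so the stand-in expresses the join as a correlated EXISTS; on Postgres the USING form shown above would be the natural choice.

```python
import sqlite3


def delete_pairs_via_temp_table(conn, pairs):
    """Stage the pairs in a temp table, then delete with one set-based statement.

    Hypothetical helper: assumes a pond_pairs(pond1, pond2) table exists.
    """
    cur = conn.cursor()
    cur.execute(
        "CREATE TEMP TABLE pairs_to_delete (pond1 INTEGER, pond2 INTEGER)"
    )
    # Bulk-insert the pairs that identify the rows to remove.
    cur.executemany("INSERT INTO pairs_to_delete VALUES (?, ?)", pairs)
    # SQLite stand-in for the Postgres "DELETE FROM ... USING ..." join.
    cur.execute(
        "DELETE FROM pond_pairs WHERE EXISTS ("
        " SELECT 1 FROM pairs_to_delete AS d"
        " WHERE d.pond1 = pond_pairs.pond1"
        " AND d.pond2 = pond_pairs.pond2)"
    )
    deleted = cur.rowcount
    # The staging table is no longer needed once the delete has run.
    cur.execute("DROP TABLE pairs_to_delete")
    conn.commit()
    return deleted
```

To follow the earlier advice about measuring, time only the DELETE statement, not the table creation and inserts.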

With hundreds of thousands of pairs, you cannot do 1 (run the query as is), because the SQL statement would be too long.

3 is good if you already have the pairs in a table. If not, you would need to insert them first. If you do not need them later, you might just as well run the same number of DELETE statements instead of INSERT statements.

How about a prepared statement in a loop, possibly batched (if Python supports that)?

  1. begin transaction
  2. prepare statement "DELETE FROM pond_pairs WHERE ((pond1 = ?) AND (pond2 = ?))"
  3. loop over your data (in Python), and run the statement with one pair (or add to batch)
  4. commit
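The four steps above can be sketched in Python roughly as follows. This is a sketch under assumptions, not a definitive implementation: delete_pairs_one_by_one is a made-up name, and sqlite3 stands in for Postgres so the example is self-contained. It happens to use the same ? placeholder as the statement above, whereas psycopg2 expects %s.

```python
import sqlite3


def delete_pairs_one_by_one(conn, pairs):
    """Run one parameterized DELETE per pair inside a single transaction.

    Hypothetical helper: assumes a pond_pairs(pond1, pond2) table exists.
    """
    sql = "DELETE FROM pond_pairs WHERE pond1 = ? AND pond2 = ?"
    cur = conn.cursor()
    deleted = 0
    try:
        # The SQL text never changes, so the driver can reuse the
        # compiled statement; only the bound values differ per pair.
        for pond1, pond2 in pairs:
            cur.execute(sql, (pond1, pond2))
            deleted += cur.rowcount
        conn.commit()
    except Exception:
        # Undo every delete from this run if any statement fails.
        conn.rollback()
        raise
    return deleted
```

For the batched variant, the DB-API method cursor.executemany(sql, pairs) takes the whole sequence of pairs in one call.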

Where are the pairs coming from? If you can write a SELECT statement that identifies them, you can move that condition into the WHERE clause of your DELETE.

DELETE FROM pond_pairs WHERE (pond1, pond2) IN (SELECT pond1, pond2 FROM ......  )
