
How to select a range of rows from a multiple column primary key?

I'm trying to chunk through rows in MySQL 5.5 and to do this I want to select a range between two primary keys (which I can get easily). This is trivial when the primary key is only one column. However, some of the tables I need to chunk through have multiple columns in the primary key, and I haven't figured out how to make this work in a single prepared statement.

Here's an example table with some data:

CREATE TABLE test (
  a INT UNSIGNED NOT NULL,
  b INT UNSIGNED NOT NULL,
  c INT UNSIGNED NOT NULL,
  d VARCHAR(255) DEFAULT '', -- various data columns
  PRIMARY KEY (a, b, c)
) ENGINE=InnoDB;

INSERT INTO test (a, b, c) VALUES
(1, 1, 1),
(1, 1, 2),
(1, 1, 3),
(1, 2, 1),
(1, 2, 2),
(1, 2, 3),
(1, 3, 1),
(1, 3, 3),
(2, 1, 1),
(2, 1, 2),
(2, 2, 2),
(2, 3, 1),
(2, 3, 3),
(3, 1, 2),
(3, 1, 3),
(3, 2, 1),
(3, 2, 2),
(3, 2, 3),
(3, 3, 1),
(3, 3, 3);

If I had two primary keys like (1, 1, 3) and (3, 2, 1), the following statement would work. a1, b1, and c1 are the values from the first primary key, and a2, b2, and c2 are the values from the second primary key:

SELECT * FROM test WHERE a = a1 AND b = b1 AND c >= c1
UNION
SELECT * FROM test WHERE a = a1 AND b > b1
UNION
SELECT * FROM test WHERE a > a1 AND a < a2
UNION
SELECT * FROM test WHERE a = a2 AND b < b2
UNION
SELECT * FROM test WHERE a = a2 AND b = b2 AND c <= c2

Or

SELECT * FROM test WHERE a = 1 AND b = 1 AND c >= 3
UNION
SELECT * FROM test WHERE a = 1 AND b > 1
UNION
SELECT * FROM test WHERE a > 1 AND a < 3
UNION
SELECT * FROM test WHERE a = 3 AND b < 2
UNION
SELECT * FROM test WHERE a = 3 AND b = 2 AND c <= 1

Which gives

(1, 1, 3),
(1, 2, 1),
(1, 2, 2),
(1, 2, 3),
(1, 3, 1),
(1, 3, 3),
(2, 1, 1),
(2, 1, 2),
(2, 2, 2),
(2, 3, 1),
(2, 3, 3),
(3, 1, 2),
(3, 1, 3),
(3, 2, 1)

But the above fails when the first column is the same in both keys, e.g. (1, 2, 2) and (1, 3, 1). In this case, the 2nd and 4th SELECTs return too many rows.

SELECT * FROM test WHERE a = 1 AND b = 2 AND c >= 2
UNION
SELECT * FROM test WHERE a = 1 AND b > 2
UNION
SELECT * FROM test WHERE a > 1 AND a < 1
UNION
SELECT * FROM test WHERE a = 1 AND b < 3
UNION
SELECT * FROM test WHERE a = 1 AND b = 3 AND c <= 1

Which gives

(1, 1, 1), -- erroneously selected from: SELECT * FROM test WHERE a = 1 AND b < 3
(1, 1, 2), -- erroneously selected from: SELECT * FROM test WHERE a = 1 AND b < 3
(1, 1, 3), -- erroneously selected from: SELECT * FROM test WHERE a = 1 AND b < 3
(1, 2, 1), -- erroneously selected from: SELECT * FROM test WHERE a = 1 AND b < 3
(1, 2, 2),
(1, 2, 3),
(1, 3, 1),
(1, 3, 3)  -- erroneously selected from: SELECT * FROM test WHERE a = 1 AND b > 2

The desired output is

(1, 2, 2),
(1, 2, 3),
(1, 3, 1)

I would like a single statement that works with all primary key ranges, including identical values for the first and second columns. I also have tables with 4 columns in the primary key, and I'll extend the pattern in that case.

I would like a single statement per table instead of creating queries on the fly because the query will be executed up to a million times as I chunk through the tables. Some of the tables have over 100M rows.

I would rather avoid constructing multiple statements as I have hundreds to write following this pattern, and writing more would be significantly more work. I will do this if it's the only option.

I currently use parametrized queries, and generate the values programmatically from the two primary keys, taking care of required duplicate values (the a1 x3, b1 x2, a2 x3, b2 x2 in the above example) in the application layer. So passing duplicate values for parameters is simple for me to do.
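
For illustration, the parameter duplication described above might look like this with a server-side prepared statement. This is only a sketch: the statement name chunk and the user variables @a1 to @c2 are hypothetical, and an application-side driver would bind the same twelve values in the same order.

```sql
-- The five-SELECT pattern as one prepared statement (12 placeholders).
PREPARE chunk FROM
  'SELECT * FROM test WHERE a = ? AND b = ? AND c >= ?
   UNION
   SELECT * FROM test WHERE a = ? AND b > ?
   UNION
   SELECT * FROM test WHERE a > ? AND a < ?
   UNION
   SELECT * FROM test WHERE a = ? AND b < ?
   UNION
   SELECT * FROM test WHERE a = ? AND b = ? AND c <= ?';

-- Bounds (1, 1, 3) and (3, 2, 1) from the first example.
SET @a1 = 1, @b1 = 1, @c1 = 3, @a2 = 3, @b2 = 2, @c2 = 1;

-- a1 and a2 are each passed three times, b1 and b2 twice, c1 and c2 once.
EXECUTE chunk USING @a1, @b1, @c1,
                    @a1, @b1,
                    @a1, @a2,
                    @a2, @b2,
                    @a2, @b2, @c2;

DEALLOCATE PREPARE chunk;
```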

My best guess at this point is to duplicate the SELECTs, adding to each WHERE clause a condition that compares the values of the two primary keys' columns.

I would use this query to select a range:

SELECT * 
FROM test
WHERE (a,b,c) >= (1, 1, 3) 
  and (a,b,c) <= (3, 2, 1)

Demo: http://www.sqlfiddle.com/#!2/d6cf7b/4
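
As a quick check of how these row comparisons behave: MySQL compares row constructors column by column, and the first column that differs decides the result. The sketch below uses only literals, so it needs no table; the column aliases are made up for the example.

```sql
SELECT (1, 2, 2) >= (1, 1, 3) AS above_lower,  -- 1: a ties, then b = 2 > 1
       (1, 2, 2) <= (3, 2, 1) AS below_upper,  -- 1: a = 1 < 3 decides
       (1, 1, 2) >= (1, 1, 3) AS too_small;    -- 0: a and b tie, then c = 2 < 3
```

This is why the two-condition query also handles bounds that share their leading columns, such as (1, 2, 2) and (1, 3, 1).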


Unfortunately, MySQL is not able to perform range optimization for row-constructor comparisons such as (a,b,c) >= (1, 1, 3); see this link: http://dev.mysql.com/doc/refman/5.7/en/range-optimization.html#range-access-single-part
(section 8.2.1.3.4, Range Optimization of Row Constructor Expressions).
According to that section, starting from version 5.7 MySQL can optimize only queries of the form:

WHERE ( col_1, col_2 ) IN (( 'a', 'b' ), ( 'c', 'd' ));



For bounds whose first column differs, as in this example, the row-constructor range query is equivalent to this one:

SELECT * 
FROM test
WHERE  
     a = 1 and b = 1 and c >= 3 -- lowest end
     or 
     a = 3 and b = 2 and c <= 1 -- highest end
     or 
     a = 1 and b > 1
     or
     a = 3 and b < 2
     or 
     a  > 1 and a < 3
;

MySQL might use the range access method for this form of the query; see the link below
(section 8.2.1.3.2, The Range Access Method for Multiple-Part Indexes):
http://dev.mysql.com/doc/refman/5.7/en/range-optimization.html
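
The OR form above assumes that the two bounds differ in the first column. A sketch that should hold for any pair of bounds, including identical leading columns, is to expand each row comparison into nested per-column conditions. The extra bounds on a in the first line are redundant and are added only as an assumption that a plain range on the leading key column is easier for the optimizer to use; whether range access is actually chosen is worth checking with EXPLAIN on the real tables.

```sql
-- Bounds (1, 2, 2) and (1, 3, 1): the case where the UNION query over-selects.
SELECT *
FROM test
WHERE a >= 1 AND a <= 1  -- redundant outer range on the leading column
  AND (a > 1 OR (a = 1 AND (b > 2 OR (b = 2 AND c >= 2))))  -- (a,b,c) >= (1,2,2)
  AND (a < 1 OR (a = 1 AND (b < 3 OR (b = 3 AND c <= 1))))  -- (a,b,c) <= (1,3,1)
ORDER BY a, b, c;
-- With the sample data this returns (1, 2, 2), (1, 2, 3), (1, 3, 1).
```

With ? placeholders in place of the literals, the values are bound in the order a1, a2, a1, a1, b1, b1, c1, a2, a2, b2, b2, c2. A four-column key adds one more nesting level to each of the two comparisons.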
