简体   繁体   中英

How to make a GROUP BY SUBSTRING query faster?

I have a table with the following structure:

id bigNumber           text
1  1200321030011010233 "an item with some text"
2  1200321030011014563 "another item with some more text"
3  3120323434432211133 "more...."
.
.
.

The table contains approximately 50,000 records. I want to do the following query but it is slow:

  SELECT COUNT(*), 
         bigNumber 
    FROM items 
GROUP BY substr(bigNumber, 1, X)

where X varies from 2 and 19.

Would it be quicker to create 19 columns with parts of the number in each column to be able to use an index?

I would suggest instead of 19 separate columns, perhaps just 3 or 4, like so:

alter table items
add column bignumber_l1 char(1)
, add column bignumber_l3 varchar(3)
, add column bignumber_l6 varchar(6);

update items
set bignumber_l1 = left(bignumber, 1)
 , bignumber_l3 = left(bignumber, 3)
 , bignumber_l6 = left(bignumber, 6);

alter table items
add index bignumber_l1
, add index bignumber_l3
, add index bignumber_l6;

then when you are querying for strings of x length, write the query with the longest match without going longer:

  SELECT COUNT(*), 
         bigNumber 
    FROM items 
GROUP BY bignumber_l3, substr(bigNumber, 1, 4)

this query can use an index and might improve your performance significantly. note that since you're asking for the whole table, mysql may decide you need a table scan even with the indexes above so you may need to use FORCE INDEX .

You can use an index without adding any columns, just create an index on your bigNumber column:

create index bignum_index on items (bigNumber);

The index on a varchar (bigNumber is a varchar, right?) can be used to look up any string prefix.

That said, you'll need to do a full table scan to answer your query, so an index won't help you all that much.

I think the result you are looking for is LIKE _X%. This will not use the index though.

SELECT count(*) FROM items WHERE bignumber LIKE "_2%"
SELECT count(*) FROM items WHERE bignumber LIKE "_19%"

_ signifies one character

% signifies any number of characters

See MySQL docs for more information.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM