
How to cache a subset for cascading SELECT queries in MySQL

Here's another database problem I stumbled upon.

I have a date-range partitioned MyISAM lookup table with 200M records and ~150 columns. On this table I need to perform cascading SELECT statements to filter the data. Rows remaining after each filter:

  1. filter 126M

  2. filter 110M

  3. filter 40M

  4. filter 5M

  5. filter 100k

Every single SELECT is highly complex, with regex matching (so no index can be used) and multiple comparisons, which is why I want each one to scan as few rows as possible.

There are about 500 unique filters and around 200 constant users. Every filter needs to be run for each user, in total around 100k combinations.

Big question: Is there a way for each subsequent SELECT statement to query only the previous subset?

Example: Filter #5 should only have to query the 5M rows out of query 4 to get those 100k results. At the moment it has to scan through all 200M records.

EDIT: current approach: cache table

CREATE TABLE IF NOT EXISTS cache (
  filter_id int(11) NOT NULL,
  user_id   int(11) NOT NULL,
  lookup_id int(11) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

-- One filter/user pair maps to many lookup rows, so the key must include lookup_id.
ALTER TABLE cache ADD PRIMARY KEY (filter_id, user_id, lookup_id);

This would contain the relation between individual rows of the lookup table and the filters. In addition, I could use the primary key to fetch all lookup_ids that passed the previous filter.

Query for subsequent filters:

SELECT SUM(column), COUNT(*)
FROM cache c
LEFT JOIN lookup_table l
  ON c.lookup_id = l.id
WHERE c.filter_id = 1
  AND c.user_id = x
  AND l.regex_column = preg_rlike...
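
To make the idea concrete, here is a rough sketch of how the cache rows for the next filter might be written from the previous filter's rows. It assumes the cache key spans (filter_id, user_id, lookup_id), since one filter/user pair maps to many lookup rows. The filter ids 1 and 2, the user id 42, and the pattern are placeholders, and MySQL's built-in REGEXP stands in for the preg_rlike call.

```sql
-- Store the survivors of filter 2 for user 42, examining only the rows
-- that filter 1 already kept for that user.
INSERT INTO cache (filter_id, user_id, lookup_id)
SELECT 2, c.user_id, c.lookup_id
FROM cache c
JOIN lookup_table l ON l.id = c.lookup_id
WHERE c.filter_id = 1                       -- previous filter (placeholder id)
  AND c.user_id = 42                        -- placeholder user
  AND l.regex_column REGEXP '^placeholder'; -- hypothetical filter-2 condition
```

With rows stored this way, filter 3 would read `c.filter_id = 2` and so on down the chain, so each step touches only the previous step's survivors.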

Maybe you could save the primary keys of the selected records in some kind of temporary table? In the next step, join that temp table with your main table.
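
A minimal sketch of that suggestion, assuming lookup_table has an indexed id column as in the question. The table names step1 and step2, the column some_column, and both filter conditions are made up for illustration, and MySQL's built-in REGEXP is used in place of the question's preg_rlike.

```sql
-- Step 1: the first filter is the only one that scans the full lookup table.
CREATE TEMPORARY TABLE step1 (PRIMARY KEY (id))
SELECT l.id
FROM lookup_table l
WHERE l.some_column > 0;                    -- hypothetical filter-1 condition

-- Step 2: join back to the main table so only step1's rows are examined.
CREATE TEMPORARY TABLE step2 (PRIMARY KEY (id))
SELECT l.id
FROM step1 s
JOIN lookup_table l ON l.id = s.id
WHERE l.regex_column REGEXP '^placeholder'; -- hypothetical filter-2 condition

-- Final step: aggregate over the smallest subset.
SELECT SUM(l.some_column), COUNT(*)
FROM step2 s
JOIN lookup_table l ON l.id = s.id;

DROP TEMPORARY TABLE step1, step2;
```

Temporary tables are private to the connection and disappear when it closes, so this fits a single cascade run. Results that should be shared across sessions would need a regular table instead.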


 