Specify conditions from the outer query on a materialized subquery

I have the query below, which references two views, 'goldedRunQueries' and 'currentGoldMarkings'. The problem seems to come from the view referenced in the subquery, currentGoldMarkings. During execution, MySQL first materializes this subquery and only then applies the WHERE conditions on 'queryCode' and 'runId'. Because the view reads tables with millions of rows, the query takes more than an hour to run. My question is: how do I enforce those two WHERE conditions on the subquery before it is materialized?

SELECT  goldedRunQueries.queryCode, goldedRunQueries.runId
    FROM  goldedRunQueries
    LEFT OUTER JOIN  
      ( SELECT  measuredRunId, queryCode, COUNT(resultId) as c
            FROM  currentGoldMarkings
            GROUP BY  measuredRunId, queryCode
      ) AS accuracy  ON accuracy.measuredRunId = goldedRunQueries.runId
      AND  accuracy.queryCode = goldedRunQueries.queryCode
    WHERE  goldedRunQueries.queryCode IN ('CH001', 'CH002', 'CH003')
      and  goldedRunQueries.runid = 5000
    ORDER BY  goldedRunQueries.runId DESC, goldedRunQueries.queryCode;

Here are the two views. Both are also used on their own elsewhere, so integrating any clauses into them is not possible.

CREATE VIEW currentGoldMarkings
AS
SELECT  result.resultId, result.runId AS measuredRunId, result.documentId,
        result.queryCode, result.queryValue AS measuredValue,
        gold.queryValue AS goldValue,
        CASE result.queryValue WHEN gold.queryValue THEN 1 ELSE 0 END AS correct
    FROM  results AS result
    INNER JOIN  gold  ON gold.documentId = result.documentId
      AND  gold.queryCode = result.queryCode
    WHERE  gold.isCurrent = 1;

CREATE VIEW goldedRunQueries
AS
SELECT  runId, queryCode
    FROM  runQueries
    WHERE  EXISTS 
      ( SELECT  1 AS Expr1
            FROM  runs
            WHERE  (runId = runQueries.runId)
              AND  (isManual = 0)
      )
      AND  EXISTS 
      ( SELECT  1 AS Expr1
            FROM  results
            WHERE  (runId = runQueries.runId)
              AND  (queryCode = runQueries.queryCode)
              AND  EXISTS 
              ( SELECT  1 AS Expr1
                    FROM  gold
                    WHERE  (documentId = results.documentId)
                      AND  (queryCode = results.queryCode)
              )
      );

Note: The above query is only part of my actual query. There are 3 other left outer joins of the same kind as the subquery above, which makes the problem far worse.

EDIT: As suggested, here is the structure and some sample data for the tables

CREATE TABLE `results`(
`resultId` int auto_increment NOT NULL,
`runId` int NOT NULL,
`documentId` int NOT NULL,
`queryCode` char(5) NOT NULL,
`queryValue` char(1) NOT NULL,
`comment` varchar(255) NULL,
 CONSTRAINT `PK_results` PRIMARY KEY 
(
`resultId`
)
);


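insert into results (runId, documentId, queryCode, queryValue, comment) values (100, 242300, 'AC001', 'I', NULL);
insert into results (runId, documentId, queryCode, queryValue, comment) values (100, 242300, 'AC001', 'S', NULL);
insert into results (runId, documentId, queryCode, queryValue, comment) values (150, 242301, 'AC005', 'I', 'abc');
insert into results (runId, documentId, queryCode, queryValue, comment) values (100, 242300, 'AC001', 'I', NULL);
insert into results (runId, documentId, queryCode, queryValue, comment) values (109, 242301, 'PQ001', 'S', 'zzz');
insert into results (runId, documentId, queryCode, queryValue, comment) values (400, 242400, 'DD006', 'I', NULL);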



CREATE TABLE `gold`(
`goldId` int auto_increment NOT NULL,
`runDate` datetime NOT NULL,
`documentId` int NOT NULL,
`queryCode` char(5) NOT NULL,
`queryValue` char(1) NOT NULL,
`comment` varchar(255) NULL,
`isCurrent` tinyint(1) NOT NULL DEFAULT 0,
CONSTRAINT `PK_gold` PRIMARY KEY 
(
`goldId`
)
);



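insert into gold (runDate, documentId, queryCode, queryValue, comment, isCurrent) values ('2015-02-20 00:00:00', 138904, 'CH001', 'N', NULL, 1);
insert into gold (runDate, documentId, queryCode, queryValue, comment, isCurrent) values ('2015-05-20 00:00:00', 138904, 'CH001', 'N', 'aaa', 1);
insert into gold (runDate, documentId, queryCode, queryValue, comment, isCurrent) values ('2016-02-20 00:00:00', 138905, 'CH002', 'N', NULL, 0);
insert into gold (runDate, documentId, queryCode, queryValue, comment, isCurrent) values ('2015-12-12 00:00:00', 138804, 'CH001', 'N', 'zzzz', 1);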



CREATE TABLE `runQueries`(
`runId` int NOT NULL,
`queryCode` char(5) NOT NULL,
CONSTRAINT `PK_runQueries` PRIMARY KEY 
(
`runId`,
`queryCode`
)
);


insert into runQueries values (100, 'AC001');
insert into runQueries values (109, 'PQ001');
insert into runQueries values (400, 'DD006');



CREATE TABLE `runs`(
`runId` int auto_increment NOT NULL,
`runName` varchar(63) NOT NULL,
`isManual` tinyint(1) NOT NULL,
`runDate` datetime NOT NULL,
`comment` varchar(1023) NULL,
`folderName` varchar(63) NULL,
`documentSetId` int NOT NULL,
`pipelineVersion` varchar(50) NULL,
`isArchived` tinyint(1) NOT NULL DEFAULT 0,
`pipeline` varchar(50) NULL,
CONSTRAINT `PK_runs` PRIMARY KEY 
(
`runId`
)
);


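insert into runs (runName, isManual, runDate, comment, folderName, documentSetId, pipelineVersion, isArchived, pipeline) values ('test1', 0, '2015-08-04 06:30:46.000000', 'zzzz', '2015-08-04_103046', 2, '2015-08-03', 0, NULL);
insert into runs (runName, isManual, runDate, comment, folderName, documentSetId, pipelineVersion, isArchived, pipeline) values ('test2', 1, '2015-12-04 12:30:46.000000', 'zzzz', '2015-08-04_103046', 2, '2015-08-03', 0, NULL);
insert into runs (runName, isManual, runDate, comment, folderName, documentSetId, pipelineVersion, isArchived, pipeline) values ('test3', 1, '2015-06-24 10:56:46.000000', 'zzzz', '2015-08-04_103046', 2, '2015-08-03', 0, NULL);
insert into runs (runName, isManual, runDate, comment, folderName, documentSetId, pipelineVersion, isArchived, pipeline) values ('test4', 1, '2016-05-04 11:30:46.000000', 'zzzz', '2015-08-04_103046', 2, '2015-08-03', 0, NULL);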

First, let's try to improve the performance via indexes:

results: INDEX(runId, queryCode) -- in either order
gold: INDEX(documentId, queryCode, isCurrent) -- in that order
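As a sketch, the two indexes suggested above could be added with statements like the following. The index names `idx_results_run_query` and `idx_gold_doc_query_current` are made up for illustration; any names will do. On tables with millions of rows, building the indexes may take a while.

```sql
-- Hypothetical index names; the column lists follow the suggestion above.
-- Supports lookups of results by run and query code.
ALTER TABLE results
    ADD INDEX idx_results_run_query (runId, queryCode);

-- Supports the join from results to gold plus the isCurrent filter.
ALTER TABLE gold
    ADD INDEX idx_gold_doc_query_current (documentId, queryCode, isCurrent);
```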

After that, update the CREATE TABLEs in the question and add the output of:

EXPLAIN EXTENDED SELECT ...;
SHOW WARNINGS;

Which MySQL version are you running? You effectively have FROM ( SELECT ... ) JOIN ( SELECT ... ). Before 5.6, neither derived table had an index; from 5.6 on, an index is generated on the fly.

It is a shame that the query is built that way, since you already know which runId to use: AND goldedRunQueries.runid = 5000.
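Because the filter values are known when the query is written, one thing to try is repeating them inside the derived table, so the rows are filtered before the GROUP BY result is materialized. This is only a sketch and not part of the suggestion above: it assumes the literals can be duplicated where the query is generated, and it leaves both views untouched. The result is the same as the original query, since the join already requires measuredRunId and queryCode to match the outer row. Whether it actually cuts the run time depends on the MySQL version and on the indexes, so check it with EXPLAIN.

```sql
SELECT  goldedRunQueries.queryCode, goldedRunQueries.runId
    FROM  goldedRunQueries
    LEFT OUTER JOIN
      ( SELECT  measuredRunId, queryCode, COUNT(resultId) AS c
            FROM  currentGoldMarkings
            -- Same conditions as the outer WHERE, applied before grouping.
            WHERE  measuredRunId = 5000
              AND  queryCode IN ('CH001', 'CH002', 'CH003')
            GROUP BY  measuredRunId, queryCode
      ) AS accuracy  ON accuracy.measuredRunId = goldedRunQueries.runId
      AND  accuracy.queryCode = goldedRunQueries.queryCode
    WHERE  goldedRunQueries.queryCode IN ('CH001', 'CH002', 'CH003')
      AND  goldedRunQueries.runId = 5000
    ORDER BY  goldedRunQueries.runId DESC, goldedRunQueries.queryCode;
```

The same pattern would apply to the other left outer joins mentioned in the question, provided each subquery exposes the run and query code columns.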

Bottom line: add the indexes; upgrade to 5.6 or 5.7; if that is not enough, rethink the use of VIEWs.
