
Specify conditions from outer query on a materialized subquery

I have the query below, which references two views, 'goldedRunQueries' and 'currentGoldMarkings'. My problem seems to come from the view referenced in the subquery, currentGoldMarkings. During execution, MySQL first materializes this subquery and only then applies the WHERE conditions on 'queryCode' and 'runId'. Because the view reads tables with millions of rows, the query takes more than an hour to run. My question is: how do I enforce those two WHERE conditions on the subquery before it is materialized?

SELECT  goldedRunQueries.queryCode, goldedRunQueries.runId
    FROM  goldedRunQueries
    LEFT OUTER JOIN  
      ( SELECT  measuredRunId, queryCode, COUNT(resultId) as c
            FROM  currentGoldMarkings
            GROUP BY  measuredRunId, queryCode
      ) AS accuracy  ON accuracy.measuredRunId = goldedRunQueries.runId
      AND  accuracy.queryCode = goldedRunQueries.queryCode
    WHERE  goldedRunQueries.queryCode IN ('CH001', 'CH002', 'CH003')
      and  goldedRunQueries.runid = 5000
    ORDER BY  goldedRunQueries.runId DESC, goldedRunQueries.queryCode;

Here are the two views. Both of them are also used standalone, so building any extra clauses into them is not possible.

CREATE VIEW currentGoldMarkings
AS
SELECT  result.resultId, result.runId AS measuredRunId, result.documentId,
        result.queryCode, result.queryValue AS measuredValue,
        gold.queryValue AS goldValue,
        CASE result.queryValue WHEN gold.queryValue THEN 1 ELSE 0 END AS correct
    FROM  results AS result
    INNER JOIN  gold  ON gold.documentId = result.documentId
      AND  gold.queryCode = result.queryCode
    WHERE  gold.isCurrent = 1;

CREATE VIEW goldedRunQueries
AS
SELECT  runId, queryCode
    FROM  runQueries
    WHERE  EXISTS 
      ( SELECT  1 AS Expr1
            FROM  runs
            WHERE  (runId = runQueries.runId)
              AND  (isManual = 0)
      )
      AND  EXISTS 
      ( SELECT  1 AS Expr1
            FROM  results
            WHERE  (runId = runQueries.runId)
              AND  (queryCode = runQueries.queryCode)
              AND  EXISTS 
              ( SELECT  1 AS Expr1
                    FROM  gold
                    WHERE  (documentId = results.documentId)
                      AND  (queryCode = results.queryCode)
              )
      );

Note: The above query is only part of my actual query. There are 3 other left outer joins of a similar nature to the above subquery, which makes the problem far worse.

EDIT: As suggested, here is the structure and some sample data for the tables.

CREATE TABLE `results`(
`resultId` int auto_increment NOT NULL,
`runId` int NOT NULL,
`documentId` int NOT NULL,
`queryCode` char(5) NOT NULL,
`queryValue` char(1) NOT NULL,
`comment` varchar(255) NULL,
 CONSTRAINT `PK_results` PRIMARY KEY 
(
`resultId`
)
);


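-- resultId is auto_increment, so it is left out of the column list
insert into results (runId, documentId, queryCode, queryValue, comment) values (100, 242300, 'AC001', 'I', NULL);
insert into results (runId, documentId, queryCode, queryValue, comment) values (100, 242300, 'AC001', 'S', NULL);
insert into results (runId, documentId, queryCode, queryValue, comment) values (150, 242301, 'AC005', 'I', 'abc');
insert into results (runId, documentId, queryCode, queryValue, comment) values (100, 242300, 'AC001', 'I', NULL);
insert into results (runId, documentId, queryCode, queryValue, comment) values (109, 242301, 'PQ001', 'S', 'zzz');
insert into results (runId, documentId, queryCode, queryValue, comment) values (400, 242400, 'DD006', 'I', NULL);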



CREATE TABLE `gold`(
`goldId` int auto_increment NOT NULL,
`runDate` datetime NOT NULL,
`documentId` int NOT NULL,
`queryCode` char(5) NOT NULL,
`queryValue` char(1) NOT NULL,
`comment` varchar(255) NULL,
`isCurrent` tinyint(1) NOT NULL DEFAULT 0,
CONSTRAINT `PK_gold` PRIMARY KEY 
(
`goldId`
)
);



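-- goldId is auto_increment, so it is left out of the column list
insert into gold (runDate, documentId, queryCode, queryValue, comment, isCurrent) values ('2015-02-20 00:00:00', 138904, 'CH001', 'N', NULL, 1);
insert into gold (runDate, documentId, queryCode, queryValue, comment, isCurrent) values ('2015-05-20 00:00:00', 138904, 'CH001', 'N', 'aaa', 1);
insert into gold (runDate, documentId, queryCode, queryValue, comment, isCurrent) values ('2016-02-20 00:00:00', 138905, 'CH002', 'N', NULL, 0);
insert into gold (runDate, documentId, queryCode, queryValue, comment, isCurrent) values ('2015-12-12 00:00:00', 138804, 'CH001', 'N', 'zzzz', 1);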



CREATE TABLE `runQueries`(
`runId` int NOT NULL,
`queryCode` char(5) NOT NULL,
CONSTRAINT `PK_runQueries` PRIMARY KEY 
(
`runId`,
`queryCode`
)
);


insert into runQueries values (100, 'AC001');
insert into runQueries values (109, 'PQ001');
insert into runQueries values (400, 'DD006');



CREATE TABLE `runs`(
`runId` int auto_increment NOT NULL,
`runName` varchar(63) NOT NULL,
`isManual` tinyint(1) NOT NULL,
`runDate` datetime NOT NULL,
`comment` varchar(1023) NULL,
`folderName` varchar(63) NULL,
`documentSetId` int NOT NULL,
`pipelineVersion` varchar(50) NULL,
`isArchived` tinyint(1) NOT NULL DEFAULT 0,
`pipeline` varchar(50) NULL,
CONSTRAINT `PK_runs` PRIMARY KEY 
(
`runId`
)
);


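-- runId is auto_increment, so it is left out of the column list
insert into runs (runName, isManual, runDate, comment, folderName, documentSetId, pipelineVersion, isArchived, pipeline) values ('test1', 0, '2015-08-04 06:30:46.000000', 'zzzz', '2015-08-04_103046', 2, '2015-08-03', 0, NULL);
insert into runs (runName, isManual, runDate, comment, folderName, documentSetId, pipelineVersion, isArchived, pipeline) values ('test2', 1, '2015-12-04 12:30:46.000000', 'zzzz', '2015-08-04_103046', 2, '2015-08-03', 0, NULL);
insert into runs (runName, isManual, runDate, comment, folderName, documentSetId, pipelineVersion, isArchived, pipeline) values ('test3', 1, '2015-06-24 10:56:46.000000', 'zzzz', '2015-08-04_103046', 2, '2015-08-03', 0, NULL);
insert into runs (runName, isManual, runDate, comment, folderName, documentSetId, pipelineVersion, isArchived, pipeline) values ('test4', 1, '2016-05-04 11:30:46.000000', 'zzzz', '2015-08-04_103046', 2, '2015-08-03', 0, NULL);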

First, let's try to improve the performance via indexes:

results: INDEX(runId, queryCode) -- in either order
gold: INDEX(documentId, queryCode, isCurrent) -- in that order
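
As a minimal sketch, the two suggested indexes could be added with ALTER TABLE as shown here. The index names idx_results_run_query and idx_gold_doc_query_current are made up for illustration, and the second column of the gold index is taken to be queryCode, matching the CREATE TABLE in the question.

```sql
-- Hypothetical index names; only the column lists come from the suggestion above.
-- For results, the order of the two columns does not matter.
ALTER TABLE results
    ADD INDEX idx_results_run_query (runId, queryCode);

-- For gold, keep the columns in exactly this order.
ALTER TABLE gold
    ADD INDEX idx_gold_doc_query_current (documentId, queryCode, isCurrent);
```

On tables with millions of rows, building these indexes may take a while, so it is probably worth trying on a copy of the data first.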

After that, update the CREATE TABLEs in the question and add the output of:

EXPLAIN EXTENDED SELECT ...;
SHOW WARNINGS;

What version are you running? You effectively have FROM ( SELECT ... ) JOIN ( SELECT ... ). Before 5.6, neither subquery had an index; with 5.6, an index is generated on the fly.
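
If the version is not known offhand, the built-in VERSION() function reports it:

```sql
-- Returns the server version string of the MySQL instance you are connected to.
SELECT VERSION();
```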

It is a shame that the query is built that way, since you know which run to use: and goldedRunQueries.runid = 5000.
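
One thing that might help, since the run id and the query codes are known up front, is to repeat the same conditions inside the derived table so that far fewer rows are grouped and materialized. The following is only a sketch and is untested against the real data. It assumes that currentGoldMarkings is a plain join view, so that a filter on measuredRunId and queryCode can reach the underlying results table and use an index on (runId, queryCode).

```sql
SELECT  goldedRunQueries.queryCode, goldedRunQueries.runId
    FROM  goldedRunQueries
    LEFT OUTER JOIN
      ( SELECT  measuredRunId, queryCode, COUNT(resultId) AS c
            FROM  currentGoldMarkings
            -- Same conditions as the outer WHERE, applied before GROUP BY
            WHERE  measuredRunId = 5000
              AND  queryCode IN ('CH001', 'CH002', 'CH003')
            GROUP BY  measuredRunId, queryCode
      ) AS accuracy  ON accuracy.measuredRunId = goldedRunQueries.runId
      AND  accuracy.queryCode = goldedRunQueries.queryCode
    WHERE  goldedRunQueries.queryCode IN ('CH001', 'CH002', 'CH003')
      AND  goldedRunQueries.runId = 5000
    ORDER BY  goldedRunQueries.runId DESC, goldedRunQueries.queryCode;
```

The views themselves stay unchanged, so they can still be used standalone. The cost is that the filter values have to be written in two places, and the same would apply to each of the other left outer joins.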

Bottom line: add the indexes; upgrade to 5.6 or 5.7; if that is not enough, then rethink the use of VIEWs.
