
Fast to query, slow to create table


I have a problem creating a table from a SELECT statement: it runs very slowly. The query should take only the details of the animal with the latest entry date, and its result will then be inner-joined to another query.

SELECT *
FROM amusementPart a 
INNER JOIN (
    SELECT DISTINCT name, type, cageID, dateOfEntry 
    FROM bigRegistrations
    GROUP BY cageID
) r ON a.type = r.cageID

Because of the slow performance, someone suggested two steps to improve it: 1) use a temporary table, and 2) store the result and join it to the other statement.

   use myzoo 
   CREATE TABLE animalRegistrations AS 
   SELECT DISTINCT name, type, cageID, MAX(dateOfEntry) as entryDate 
   FROM bigRegistrations
   GROUP BY cageID

Unfortunately, it is still slow. If I run only the SELECT statement, the result appears in 1-2 seconds. But if I add the CREATE TABLE, the query takes ages (approximately 25 minutes).

Is there a good approach to improve the query time?

Edit: the big registrations table has around 3.5 million rows.

This is a multipart question.

  1. Use a temporary table.
  2. Don't use DISTINCT; group by all the columns to make the rows distinct (don't forget to check for an index).
  3. Check the SQL execution plans.

Can you please try the query below? The stated goal is to take only the details of the animal with the latest entry date and to inner-join that result to another query. The query you are using does not fetch records as per that requirement, and this one should be faster:

SELECT a.*, b.name, b.type, b.cageID, b.dateOfEntry
FROM amusementPart a
INNER JOIN bigRegistrations b ON a.type = b.cageID
INNER JOIN (
    SELECT c.cageID, MAX(c.dateOfEntry) AS dateOfEntry
    FROM bigRegistrations c
    GROUP BY c.cageID
) t ON t.cageID = b.cageID AND t.dateOfEntry = b.dateOfEntry

I suggest indexing on cageID and dateOfEntry.
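To make the shape of that join concrete, here is a minimal sketch that runs it against a tiny in-memory SQLite database through Python's `sqlite3` module. SQLite only stands in for the real server here, and the sample rows, the `label` column, and the index name `idx_reg_cage_date` are made up for illustration. It shows the result, not the speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bigRegistrations (name TEXT, type TEXT, cageID INTEGER, dateOfEntry TEXT);
    CREATE TABLE amusementPart (type INTEGER, label TEXT);  -- label is a hypothetical column

    -- Composite index as suggested: cageID first, then dateOfEntry.
    CREATE INDEX idx_reg_cage_date ON bigRegistrations (cageID, dateOfEntry);

    INSERT INTO bigRegistrations VALUES
        ('Leo',  'lion',  1, '2020-01-05'),
        ('Nala', 'lion',  1, '2021-03-10'),
        ('Zed',  'zebra', 2, '2019-07-01');
    INSERT INTO amusementPart VALUES (1, 'north'), (2, 'south');
""")

# Join each amusementPart row to the registration holding the latest
# dateOfEntry for the matching cage.
latest_per_cage = conn.execute("""
    SELECT a.label, b.name, b.cageID, b.dateOfEntry
    FROM amusementPart a
    INNER JOIN bigRegistrations b ON a.type = b.cageID
    INNER JOIN (
        SELECT cageID, MAX(dateOfEntry) AS dateOfEntry
        FROM bigRegistrations
        GROUP BY cageID
    ) t ON t.cageID = b.cageID AND t.dateOfEntry = b.dateOfEntry
    ORDER BY b.cageID
""").fetchall()

print(latest_per_cage)
```

Each cage contributes only the row with its maximum dateOfEntry, so cage 1 yields Nala rather than Leo. If two rows in a cage share the same latest date, both come back.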

Here you are not creating a temporary table. Try the following...

CREATE TEMPORARY TABLE IF NOT EXISTS animalRegistrations AS 
SELECT name, type, cageID, MAX(dateOfEntry) as entryDate 
FROM bigRegistrations
GROUP BY cageID

Have you tried running EXPLAIN to see how the plan differs from one execution to the next?
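As a rough illustration of comparing plans, the sketch below uses SQLite's `EXPLAIN QUERY PLAN` through Python's `sqlite3` module; MySQL's `EXPLAIN` produces different output, so treat this only as an analogy. The helper `plan` and the index name `idx_reg_cage_date` are hypothetical, and the comparison here is before and after adding an index rather than SELECT versus CREATE TABLE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bigRegistrations "
    "(name TEXT, type TEXT, cageID INTEGER, dateOfEntry TEXT)"
)

query = "SELECT cageID, MAX(dateOfEntry) FROM bigRegistrations GROUP BY cageID"


def plan(sql):
    """Return the planner's description of how it would run sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the human-readable detail text.
    return " | ".join(row[-1] for row in rows)


plan_before = plan(query)

# Hypothetical composite index covering the grouping and the aggregate.
conn.execute(
    "CREATE INDEX idx_reg_cage_date ON bigRegistrations (cageID, dateOfEntry)"
)
plan_after = plan(query)

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan text depends on the SQLite version, but the second plan should mention the index while the first cannot.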

Also, I have found that some databases can have locking issues when doing INSERT ... SELECT or creating a table from a SELECT. I ran this in MySQL, and it solved some deadlock issues I was having.

SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

The query probably runs so slowly because it creates the temp table from all 3.5 million rows, when you really need only a subset of them, i.e. the bigRegistrations rows that match your join to amusementPart. The first single SELECT statement is faster because the query optimizer is smart enough to know it only needs to calculate the bigRegistrations rows where a.type = r.cageID.
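If a stored table is still wanted, one way to act on that observation is to filter before materializing. The sketch below is only an illustration under assumptions: it uses SQLite through Python's `sqlite3` module in place of the real server, invented sample rows, and an `IN` subquery as the filter, which is my choice rather than something from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bigRegistrations (name TEXT, type TEXT, cageID INTEGER, dateOfEntry TEXT);
    CREATE TABLE amusementPart (type INTEGER);

    INSERT INTO bigRegistrations VALUES
        ('Leo',  'lion',   1, '2020-01-05'),
        ('Nala', 'lion',   1, '2021-03-10'),
        ('Zed',  'zebra',  2, '2019-07-01'),
        ('Kiki', 'parrot', 3, '2022-02-02');
    INSERT INTO amusementPart VALUES (1), (3);

    -- Materialize only the cages that the later join can actually match,
    -- instead of aggregating every row of bigRegistrations.
    CREATE TEMPORARY TABLE animalRegistrations AS
    SELECT cageID, MAX(dateOfEntry) AS entryDate
    FROM bigRegistrations
    WHERE cageID IN (SELECT type FROM amusementPart)
    GROUP BY cageID;
""")

subset = conn.execute(
    "SELECT cageID, entryDate FROM animalRegistrations ORDER BY cageID"
).fetchall()
print(subset)
```

Cage 2 never reaches the temporary table because no amusementPart row refers to it.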

I'd suggest that you don't need a temp table; your first query is quite simple. Rather, you may just need an index. You can determine this manually by studying the estimated execution plan, or by running your query in the Database Tuning Advisor. My guess is that you need to create an index similar to the one below. Notice that I index by cageId first, since that is what you join to amusementPart, so that would help SQL narrow the results down the quickest. But I'm guessing a bit; view the query plan or the tuning advisor to be sure.

CREATE NONCLUSTERED INDEX IX_bigRegistrations ON bigRegistrations
(cageId, name, type, dateOfEntry)

Also, if you want the animal with the latest entry date, I think you want this query instead of the one you're using. I'm assuming the PK is all 4 columns.

SELECT name, type, cageID, dateOfEntry 
FROM bigRegistrations BR
WHERE BR.dateOfEntry =
    (SELECT MAX(BR1.dateOfEntry)
    FROM bigRegistrations BR1
    WHERE BR1.name = BR.name
    AND BR1.type = BR.type
    AND BR1.cageID = BR.cageID)
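To see what that correlated subquery returns, here is a small sketch that runs it on invented rows in an in-memory SQLite database via Python's `sqlite3` module (SQLite is a stand-in, and the `ORDER BY` is added only to make the output stable).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bigRegistrations (name TEXT, type TEXT, cageID INTEGER, dateOfEntry TEXT);
    INSERT INTO bigRegistrations VALUES
        ('Leo',  'lion',  1, '2020-01-05'),
        ('Leo',  'lion',  1, '2021-03-10'),
        ('Nala', 'lion',  1, '2020-06-01'),
        ('Zed',  'zebra', 2, '2019-07-01');
""")

# Keep a row only if its dateOfEntry is the maximum among rows that share
# the same name, type and cageID.
latest_per_animal = conn.execute("""
    SELECT name, type, cageID, dateOfEntry
    FROM bigRegistrations BR
    WHERE BR.dateOfEntry =
        (SELECT MAX(BR1.dateOfEntry)
         FROM bigRegistrations BR1
         WHERE BR1.name = BR.name
           AND BR1.type = BR.type
           AND BR1.cageID = BR.cageID)
    ORDER BY name
""").fetchall()

print(latest_per_animal)
```

Note that this yields one row per (name, type, cageID) combination, so cage 1 returns both Leo and Nala. If only one row per cage is wanted, correlate on cageID alone.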


 