Displaying a large amount of data in a paging table without heavily impacting the DB

The current implementation is a single complex query with multiple joins and temporary tables, but it puts too much stress on my MySQL server and takes upwards of 30 seconds to load the table. The data is retrieved by PHP via a JavaScript Ajax call and displayed on a webpage. Here are the tables involved:

Table: table_companies
Columns: company_id, ...

Table: table_manufacture_line
Columns: line_id, line_name, ...

Table: table_product_stereos
Columns: product_id, line_id, company_id, assembly_datetime, serial_number, ...

Table: table_product_televisions
Columns: product_id, line_id, company_id, assembly_datetime, serial_number, warranty_expiry, ...

A single company can have 100k+ items split between the two product tables. The product tables are unioned and filtered by line_name, then ordered by assembly_datetime and limited according to the page requested. The datetime value also depends on the company's timezone, which is applied as part of the query (another JOIN plus temp table). line_name is also one of the returned columns.

I was thinking of splitting the line_name filter out of the product union query. Essentially I'd first determine the ids of the lines that match the filter, then run the UNION query with the condition WHERE line_id IN (<results from previous query>). This would remove the need for joins and temp tables, and I could map line_id to line_name and apply the timezone conversion in PHP, but I'm not sure this is the best way to go about things.
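The two-step idea above can be sketched roughly as follows. This is only an illustration, not the asker's code: it uses Python with an in-memory SQLite database as a stand-in for PHP and MySQL, covers only the stereo table, and the helper names lines_matching and newest_products as well as the sample rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_manufacture_line (line_id INTEGER PRIMARY KEY, line_name TEXT);
    CREATE TABLE table_product_stereos (
        product_id INTEGER PRIMARY KEY, line_id INTEGER, company_id INTEGER,
        assembly_datetime TEXT, serial_number TEXT);
    INSERT INTO table_manufacture_line VALUES
        (1, 'alpha test'), (2, 'beta'), (3, 'gamma test');
    INSERT INTO table_product_stereos VALUES
        (1, 1, 7, '2020-01-01 10:00:00', 'S-1'),
        (2, 2, 7, '2020-01-02 10:00:00', 'S-2'),
        (3, 3, 7, '2020-01-03 10:00:00', 'S-3'),
        (4, 1, 8, '2020-01-04 10:00:00', 'S-4');
""")


def lines_matching(conn, pattern):
    """Step 1: resolve the line_name filter to a small {line_id: line_name} map."""
    rows = conn.execute(
        "SELECT line_id, line_name FROM table_manufacture_line WHERE line_name LIKE ?",
        (pattern,))
    return dict(rows.fetchall())


def newest_products(conn, company_id, line_names, limit=100):
    """Step 2: filter by line_id IN (...) and attach line_name in application code."""
    if not line_names:
        return []  # an empty IN () list is not valid SQL, so short-circuit
    placeholders = ",".join("?" * len(line_names))
    sql = ("SELECT line_id, assembly_datetime, serial_number "
           "FROM table_product_stereos "
           f"WHERE company_id = ? AND line_id IN ({placeholders}) "
           "ORDER BY assembly_datetime DESC LIMIT ?")
    params = [company_id, *line_names.keys(), limit]
    return [(line_names[line_id], dt, serial)
            for line_id, dt, serial in conn.execute(sql, params)]


line_names = lines_matching(conn, "%test")
result = newest_products(conn, 7, line_names)
print(result)
```

The id list is bound through placeholders rather than concatenated into the SQL string, so the filter values never reach the query text directly.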

I have also looked at using Redis, but with this many individual products, pushing all of the data to Redis via PHP takes a similarly long time (20-30 seconds), even when it is pulled straight from the product tables.

  • Is it possible to tweak the existing queries to increase their efficiency?
  • Can I push some of the handling to PHP to decrease the load on the SQL server? What about Redis?
  • Is there a way to architect the tables better?
  • What other solution(s) would you suggest?

I appreciate any input you can provide.

Edit:

Existing query:

SELECT line_name,CONVERT_TZ(datetime,'UTC',timezone) datetime,... FROM (SELECT line_name,datetime,... FROM ((SELECT line_id,assembly_datetime datetime,... FROM table_product_stereos WHERE company_id=# ) UNION (SELECT line_id,assembly_datetime datetime,... FROM table_product_televisions WHERE company_id=# )) AS union_products INNER JOIN table_manufacture_line USING (line_id)) AS products INNER JOIN (SELECT timezone FROM table_companies WHERE company_id=# ) AS tz ORDER BY datetime DESC LIMIT 0,100

Here it is formatted for readability.

SELECT line_name, CONVERT_TZ(datetime,'UTC',tz.timezone) datetime, ...
  FROM (SELECT line_name, datetime, ...
          FROM (SELECT line_id, assembly_datetime datetime, ...
                  FROM table_product_stereos
                 WHERE company_id=#
                 UNION
                SELECT line_id, assembly_datetime datetime, ...
                  FROM table_product_televisions
                 WHERE company_id=#
               ) AS union_products
         INNER JOIN table_manufacture_line USING (line_id)
       ) AS products
 INNER JOIN (SELECT timezone
               FROM table_companies
              WHERE company_id=#
            ) AS tz
 ORDER BY datetime DESC LIMIT 0,100

IDs are indexed; primary keys are the first key for each column.

Let's build this query up from its component parts to see what we can optimize.

Observation: you're fetching the 100 most recent rows from the union of two large product tables.

So, let's start by trying to optimize the subqueries that fetch rows from the product tables. Here is one of them.

              SELECT line_id,assembly_datetime datetime,... 
                FROM table_product_stereos 
               WHERE company_id=#

But look, you only need the 100 newest entries here. So, let's add

               ORDER BY assembly_datetime DESC
               LIMIT 100

to this query. Also, you should put a compound index on this table as follows. This allows the index to satisfy both the WHERE lookup and the ORDER BY.

 CREATE INDEX id_date ON table_product_stereos (company_id, assembly_datetime)

All the same considerations apply to the query from table_product_televisions. Order it by the time, limit it to 100, and index it.

If you need to apply other selection criteria, you can put them in these inner queries. For example, in a comment you mentioned a selection based on a substring search. You could do this as follows:
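One way to check that a compound index really serves both the filter and the sort is to look at the query plan. The sketch below does this with Python and an in-memory SQLite database, which is an assumption made only so the example can run anywhere. On MySQL you would run EXPLAIN on the real query instead, and the plan output looks different there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_product_stereos (
        product_id INTEGER PRIMARY KEY, line_id INTEGER, company_id INTEGER,
        assembly_datetime TEXT, serial_number TEXT);
    CREATE INDEX id_date ON table_product_stereos (company_id, assembly_datetime);
""")

query = """
    SELECT line_id, assembly_datetime, serial_number
      FROM table_product_stereos
     WHERE company_id = ?
     ORDER BY assembly_datetime DESC
     LIMIT 100
"""

# The last column of every EXPLAIN QUERY PLAN row is a readable description.
plan = " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, (7,)))
print(plan)

uses_index = "id_date" in plan       # the WHERE lookup goes through the index
needs_sort = "TEMP B-TREE" in plan   # SQLite's marker for a separate sorting pass
```

If the plan mentions the index and no separate sorting step, the engine can walk the index backwards from the newest entry and stop after 100 rows.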

              SELECT t.line_id,t.assembly_datetime datetime,... 
                FROM table_product_stereos AS t
                JOIN table_manufacture_line AS m   ON m.line_id = t.line_id 
                                                  AND m.line_name LIKE '%test'
               WHERE t.company_id=#
               ORDER BY t.assembly_datetime DESC
               LIMIT 100

Next, you are using UNION to combine those two query result sets into one. UNION also eliminates duplicates, which is time-consuming. (You know you don't have duplicates, but MySQL doesn't.) Use UNION ALL instead.

Putting this all together, the innermost subquery becomes this. We have to wrap each subquery in its own derived table because an ORDER BY ... LIMIT written at the same query level as a UNION is not applied to the individual SELECT.
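The difference between the two operators is easy to see on a toy data set. This is a small sketch using Python and in-memory SQLite with made-up rows; the behaviour shown (UNION removes duplicate rows, UNION ALL keeps every row) is standard SQL and applies to MySQL as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_product_stereos (line_id INTEGER, assembly_datetime TEXT);
    CREATE TABLE table_product_televisions (line_id INTEGER, assembly_datetime TEXT);
    INSERT INTO table_product_stereos VALUES
        (1, '2020-01-01 10:00:00'), (2, '2020-01-02 10:00:00');
    -- The first television happens to share line and timestamp with a stereo.
    INSERT INTO table_product_televisions VALUES
        (1, '2020-01-01 10:00:00'), (3, '2020-01-03 10:00:00');
""")

COLUMNS = "line_id, assembly_datetime"

deduplicated = conn.execute(
    f"SELECT {COLUMNS} FROM table_product_stereos "
    f"UNION SELECT {COLUMNS} FROM table_product_televisions").fetchall()

concatenated = conn.execute(
    f"SELECT {COLUMNS} FROM table_product_stereos "
    f"UNION ALL SELECT {COLUMNS} FROM table_product_televisions").fetchall()

print(len(deduplicated), len(concatenated))
```

Besides the extra comparison work, plain UNION would silently drop one of two distinct products whenever their selected columns happen to be identical, so UNION ALL is also the safer choice for correctness here.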

           SELECT * FROM (
              SELECT line_id,assembly_datetime datetime,... 
                FROM table_product_stereos 
               WHERE company_id=#
               ORDER BY assembly_datetime DESC 
               LIMIT 100
                         ) AS st
           UNION ALL 
           SELECT * FROM (
             SELECT line_id,assembly_datetime datetime,... 
               FROM table_product_televisions 
              WHERE company_id=#
              ORDER BY assembly_datetime DESC 
              LIMIT 100
                         ) AS tv

That gets you 200 rows. It should get those rows fairly quickly.

Those 200 rows are guaranteed to contain the 100 most recent items overall, which your outer ORDER BY ... LIMIT operation then picks out. But that operation only has to crunch 200 rows, not 100K+, so it will be far faster.

Finally, wrap this query in your outer query material. Join the table_manufacture_line information, and fix up the timezone.

If you do the indexing and the ORDER BY ... LIMIT operation earlier, this query should become very fast.
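Here is a rough end-to-end check of the idea, again using Python with in-memory SQLite as a stand-in for MySQL and generated sample rows. One detail goes beyond the first page: for later pages each inner query has to return offset + page size rows, otherwise rows needed by the outer LIMIT could be cut off too early. The fetch_page helper and the index names are invented for this sketch.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_product_stereos (company_id INTEGER, line_id INTEGER, assembly_datetime TEXT);
    CREATE TABLE table_product_televisions (company_id INTEGER, line_id INTEGER, assembly_datetime TEXT);
    CREATE INDEX st_id_date ON table_product_stereos (company_id, assembly_datetime);
    CREATE INDEX tv_id_date ON table_product_televisions (company_id, assembly_datetime);
""")

base = datetime(2020, 1, 1)


def stamp(minutes):
    return (base + timedelta(minutes=minutes)).strftime("%Y-%m-%d %H:%M:%S")


# Even minutes go to stereos, odd minutes to televisions, so every timestamp is unique.
conn.executemany("INSERT INTO table_product_stereos VALUES (7, 1, ?)",
                 [(stamp(2 * i),) for i in range(1000)])
conn.executemany("INSERT INTO table_product_televisions VALUES (7, 2, ?)",
                 [(stamp(2 * i + 1),) for i in range(1000)])

PUSHED_DOWN = """
    SELECT * FROM (
        SELECT * FROM (SELECT line_id, assembly_datetime FROM table_product_stereos
                        WHERE company_id = :company
                        ORDER BY assembly_datetime DESC LIMIT :inner_limit) AS st
        UNION ALL
        SELECT * FROM (SELECT line_id, assembly_datetime FROM table_product_televisions
                        WHERE company_id = :company
                        ORDER BY assembly_datetime DESC LIMIT :inner_limit) AS tv
    ) AS newest
    ORDER BY assembly_datetime DESC LIMIT :page_size OFFSET :offset
"""

NAIVE = """
    SELECT * FROM (
        SELECT line_id, assembly_datetime FROM table_product_stereos WHERE company_id = :company
        UNION ALL
        SELECT line_id, assembly_datetime FROM table_product_televisions WHERE company_id = :company
    ) AS everything
    ORDER BY assembly_datetime DESC LIMIT :page_size OFFSET :offset
"""


def fetch_page(sql, company, page, page_size=100):
    offset = page * page_size
    params = {"company": company, "page_size": page_size, "offset": offset,
              # Each table must supply enough rows to cover every earlier page too.
              "inner_limit": offset + page_size}
    return conn.execute(sql, params).fetchall()


pages_match = all(fetch_page(PUSHED_DOWN, 7, p) == fetch_page(NAIVE, 7, p)
                  for p in range(3))
first_page = fetch_page(PUSHED_DOWN, 7, 0)
print(pages_match, first_page[0])
```

Because the inner limit grows with the page number, very deep pages gradually lose the benefit. If users really page that far, a "newer than / older than the last seen timestamp" condition would be worth considering instead of a large offset.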

The comment thread on your question suggests that you may have multiple product types, not just two, and that you have complex selection criteria for your paged display. Using UNION ALL on large numbers of rows hurts performance badly: it converts multiple indexed tables into an internal list of rows that simply can't be searched efficiently.

You really should consider putting your two kinds of product data in a single table instead of having to UNION ALL multiple product tables. The setup you have now is inflexible and won't scale up easily. If you structure your schema with a master product table and perhaps some attribute tables for product-specific information, you will find yourself much happier two years from now. Seriously. Please consider making the change.
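As a rough idea of what such a layout could look like, here is a sketch in Python with in-memory SQLite. Every name in it (table_products, product_type, table_television_attributes, the index name) is hypothetical and only meant to illustrate the master-table-plus-attribute-table shape; the real column set would follow your existing tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical merged layout: one row per product, whatever its type.
    CREATE TABLE table_products (
        product_id        INTEGER PRIMARY KEY,
        product_type      TEXT NOT NULL,      -- 'stereo' or 'television'
        line_id           INTEGER NOT NULL,
        company_id        INTEGER NOT NULL,
        assembly_datetime TEXT NOT NULL,
        serial_number     TEXT NOT NULL);

    -- Type-specific columns live in a side table keyed by product_id.
    CREATE TABLE table_television_attributes (
        product_id      INTEGER PRIMARY KEY REFERENCES table_products (product_id),
        warranty_expiry TEXT);

    CREATE INDEX products_company_date ON table_products (company_id, assembly_datetime);

    INSERT INTO table_products VALUES
        (1, 'stereo',     1, 7, '2020-01-01 10:00:00', 'S-1'),
        (2, 'television', 2, 7, '2020-01-02 10:00:00', 'T-1'),
        (3, 'stereo',     1, 8, '2020-01-03 10:00:00', 'S-2');
    INSERT INTO table_television_attributes VALUES (2, '2022-01-02');
""")

# One indexed query replaces the UNION of per-type tables.
page = conn.execute("""
    SELECT p.product_type, p.serial_number, p.assembly_datetime, a.warranty_expiry
      FROM table_products AS p
      LEFT JOIN table_television_attributes AS a ON a.product_id = p.product_id
     WHERE p.company_id = ?
     ORDER BY p.assembly_datetime DESC
     LIMIT 100
""", (7,)).fetchall()
print(page)
```

With everything in one table, a single compound index on (company_id, assembly_datetime) covers the paging query, and adding a new product type means adding rows (and maybe an attribute table), not another UNION branch.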

Remember: Index fast, data slow. Use joins over nested queries. Nested queries return all of the data fields whereas joins just consider the filters (which should all be indexed; make sure there's an index on table_product_*.line_id). It's been a while, but I'm pretty sure you can join "ON company_id=#", which should cut down the results early on.

In this case, all of the results refer to the same company (or a much smaller subset), so it makes sense to run that query separately (and it makes the query more maintainable).

So your data source would be:

(SELECT prod.line_id, prod.assembly_datetime, ml.line_name, ...
   FROM table_product_stereos AS prod
  INNER JOIN table_manufacture_line AS ml
     ON prod.line_id = ml.line_id AND prod.company_id=#)
UNION
(SELECT prod.line_id, prod.assembly_datetime, ml.line_name, ...
   FROM table_product_televisions AS prod
  INNER JOIN table_manufacture_line AS ml
     ON prod.line_id = ml.line_id AND prod.company_id=#)

Select whichever prod. or ml. fields you need in each branch.

PHP is not a solution at all... Redis can be a solution.

But the main thing I would change is the index creation for the tables (add the missing indexes)... If you're running into temp tables, you didn't create the indexes for those tables well. And 100k rows is not much at all.

But I can't help you without the table creation statements and the queries you run.

Make sure your "where part" matches your btree index from left to right.
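A quick way to see the left-to-right rule in action is to compare query plans for a condition that starts at the first index column with one that skips it. The sketch below uses Python and in-memory SQLite purely as a runnable stand-in; MySQL's EXPLAIN output is formatted differently, but its B-tree indexes follow the same leftmost-prefix principle. The plan_for helper is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_product_stereos (
        product_id INTEGER PRIMARY KEY, line_id INTEGER, company_id INTEGER,
        assembly_datetime TEXT, serial_number TEXT);
    CREATE INDEX id_date ON table_product_stereos (company_id, assembly_datetime);
""")


def plan_for(where_clause):
    """Return the readable part of the query plan for the given WHERE clause."""
    sql = ("EXPLAIN QUERY PLAN SELECT serial_number "
           "FROM table_product_stereos WHERE " + where_clause)
    return " | ".join(row[-1] for row in conn.execute(sql))


# Starts with the leftmost index column: the index can be used.
leading = plan_for("company_id = 7 AND assembly_datetime >= '2020-01-01'")

# Skips company_id: the (company_id, assembly_datetime) index does not help.
skipping = plan_for("assembly_datetime >= '2020-01-01'")

print(leading)
print(skipping)
```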
