Reporting with MySQL - Simplest Query taking too long

I have a MySQL table on an Amazon RDS instance with 250,000 rows. When I try to

SELECT * FROM  tableName 

without any conditions (just for testing; the normal query specifies the columns I need, but I need most of them), the query takes between 20 and 60 seconds to execute. This will be the base query for my report, and the report should run in under 60 seconds, so I think this will not work out (it times out the moment I add the joins). The report runs without any problems in our smaller test environments.

Could it be that the query is taking so long because MySQL is trying to lock the table and waiting for all writes to finish? There might be quite a lot of writes on this table. I am doing the query on a MySQL slave, since I do not want to lock up the production system with my queries.

  • I have no experience with how many rows is a lot for a relational DB. Are 250,000 rows with ~30 columns (varchar, date and integer types) a lot?
  • How can I speed up this query (hardware, software, query optimization, ...)?
  • Can I tell MySQL that I do not care that the data might be inconsistent (it is a snapshot from a reporting database)?
  • Is there a chance that this query will run in under 60 seconds, or do I have to adjust my goals?

A table with 250,000 rows is not too big for MySQL at all.

However, waiting for those rows to be returned to the application does take time. That is network time, and there are probably a lot of hops between you and Amazon.

Unless your report is really going to process all the data, check the performance of the database with a simpler query, such as:

select count(*) from tableName;

EDIT:

Your problem is unlikely to be due to the database. It is probably due to network traffic. As mentioned in another answer, streaming might solve the problem. You might also be able to play with the data formats to get the total size down to something more reasonable.

A last-resort step would be to save the data in a text file, compress the file, move it over, and uncompress it. Although this sounds like a lot of work, you might get 5x - 10x compression on the data, saving oodles of time on the transmission while still getting a large improvement in performance for the rest of the processing.
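As a rough sketch of that export-and-compress step, assuming a Python client with the PyMySQL driver (the host, credentials, column names and file name below are placeholders, not details from the question):

import csv
import gzip

import pymysql

# Placeholder connection details for the reporting replica (not from the question).
conn = pymysql.connect(host="replica.example.com", user="report",
                       password="secret", database="reporting")

with conn.cursor() as cur:
    cur.execute("SELECT col_a, col_b, col_c FROM tableName")  # hypothetical columns
    # Write the rows straight into a gzip-compressed CSV; plain-text CSV usually
    # compresses well, which is where the transfer-time saving comes from.
    with gzip.open("report_export.csv.gz", "wt", newline="") as out:
        writer = csv.writer(out)
        writer.writerow([col[0] for col in cur.description])  # header from column names
        for row in cur:
            writer.writerow(row)

conn.close()

Copy the .csv.gz file to wherever the report runs, decompress it there, and do the rest of the processing locally.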

Remember that MySQL has to prepare your result set and transport it to your client. In your case, this could be 200MB of data (250,000 rows at roughly 800 bytes per row across ~30 columns) that it has to shuttle across the connection, so 20 seconds is not bad at all. Most libraries, by default, wait for the entire result to be received before forwarding it to the application.

To speed it up, fetch only the columns you need, or do it in chunks with LIMIT. SELECT * is usually a sign that someone's being super lazy and not optimizing at all.
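Here is a minimal sketch of the chunked approach, assuming a Python client with PyMySQL and an indexed auto-increment id column; the connection details and column names are placeholders:

import pymysql

conn = pymysql.connect(host="replica.example.com", user="report",
                       password="secret", database="reporting")

CHUNK_SIZE = 10000   # rows per round trip; tune for memory and network
last_id = 0          # assumes an indexed auto-increment `id` column
total = 0

with conn.cursor() as cur:
    while True:
        # Keyset pagination: seeking on id stays fast for every chunk,
        # unlike LIMIT ... OFFSET, which rescans all the skipped rows.
        cur.execute(
            "SELECT id, col_a, col_b FROM tableName"
            " WHERE id > %s ORDER BY id LIMIT %s",
            (last_id, CHUNK_SIZE),
        )
        rows = cur.fetchall()
        if not rows:
            break
        total += len(rows)       # feed each chunk into the report here
        last_id = rows[-1][0]    # resume after the last id seen

conn.close()
print("fetched", total, "rows")

Plain LIMIT ... OFFSET also works for a one-off report, but seeking on the id keeps every chunk roughly the same cost as the offset grows.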

If your library supports streaming resultsets, use that, as then you can start getting data almost immediately. It'll allow you to iterate on rows as they come in without buffering the entire result.
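What that looks like depends on the driver; as one hedged example, PyMySQL's unbuffered SSCursor reads rows off the wire as you iterate instead of buffering the whole result first (connection details and column names are again placeholders):

import pymysql

conn = pymysql.connect(host="replica.example.com", user="report",
                       password="secret", database="reporting")

# SSCursor is unbuffered: rows are read from the server as you iterate, so the
# first rows arrive almost immediately and client memory use stays flat.
with conn.cursor(pymysql.cursors.SSCursor) as cur:
    cur.execute("SELECT col_a, col_b, col_c FROM tableName")
    for row in cur:
        pass  # hand each row to the report as it arrives

conn.close()

One caveat with unbuffered cursors: you have to consume (or discard) the whole result before issuing another query on the same connection.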

I got updated specs from my client and was able to reduce the number of users returned to 250, which gets through (with a lot of JOINs) in under 60 seconds.

So maybe the answer really is: try not to dump a whole table with a query; fetch only the exact data you need. The client has SQL access, and he will have to update his queries so that only relevant users are returned.

You should never really use * as a wildcard. Choose the fields that you actually want, and then create a composite index on those fields.

If you have thousands of rows, another option is to implement pagination. If the result data is used directly for a report, no one is going to look at more than about 100 rows in a single shot.
