
Long query time in PHP-MYSQL SELECT

I have a PHP script that sends queries to an Amazon RDS instance using mysqli. The code below takes about a minute to execute, and I want to see where it is getting hung up.

The table is very large - over 30 million rows. It is about 8GB according to phpMyAdmin. It is running on a db.r3.large RDS instance in the same region and availability zone as the webserver. I figure db.r3.large is overkill for this but wanted to make sure it wasn't an issue.

My script does a search on usernames (whole or partial) and returns matches to a jQuery frontend. Nothing is timing out - the client browser holds on "waiting for [sitename]..." then returns the timing info as well as the result. Results generally range from a dozen to a couple hundred matched rows.

Is the long execution time just due to the size of the database? Am I retrieving and processing the matches correctly?

When I run the query manually, phpMyAdmin makes my browser wait about the same time (a minute or so) with the yellow "Loading" box, then returns the same matches, along with "Showing rows 0 - 8 (9 total, Query took 53.1656 sec)".

Here is my code:

$mysqli = new mysqli($dbhost, $dbuser, $dbpass, $dbname);
$output = array();

if (mysqli_connect_errno()) {
  printf("Connect failed: %s\n", mysqli_connect_error());
  exit();
}

echo "Connected at " . getCurrentTime() . "<br><br>";

if ($result = $mysqli->query("SELECT * FROM tablename WHERE last_name LIKE \"%$query%\"")) {

  echo "Loaded result at " . getCurrentTime() . "<br><br>";

  $selected = $result->num_rows;

  echo "Results ready at " . getCurrentTime() . "<br><br>";

  // Copy every matched row into the output array.
  while ($row = $result->fetch_array(MYSQLI_ASSOC)) {
    $output[] = $row;
  }

  echo "Loaded into array at " . getCurrentTime() . "<br><br>";

  /* close result set */
  $result->close();

  echo "Closed result at " . getCurrentTime() . "<br><br>";

} else {
  echo "No result at " . getCurrentTime() . "<br><br>";
}

/* close connection */
$mysqli->close();

echo "Closed mysqli at " . getCurrentTime() . "<br><br>";

Here is what my script is outputting:

>Started at Thu Aug 20 19:56:08 2015
>
>Connected at Thu Aug 20 19:56:08 2015
>
>Loaded result at Thu Aug 20 19:57:01 2015
>
>Results ready at Thu Aug 20 19:57:01 2015
>
>Loaded into array at Thu Aug 20 19:57:01 2015
>
>Closed result at Thu Aug 20 19:57:01 2015
>
>Closed mysqli at Thu Aug 20 19:57:01 2015

(The script then returns a JSON-encoded object of the results.)

I have access to the RDS console and phpMyAdmin for troubleshooting.

Your query runs long because the leading wildcard in the LIKE comparison prevents MySQL from using an index.

LIKE "%$query%"

Read more here: http://dev.mysql.com/doc/refman/5.6/en/index-btree-hash.html

If it is acceptable, you may change your query to

LIKE "$query%"

Although this will produce different results (it only matches names that start with the search term), it will (at least it should) make for a much quicker query, provided last_name has an index.
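As a rough sketch of the prefix-only search, the helper below builds the LIKE pattern from user input. The function name prefixPattern and the index name idx_last_name are made up for illustration, and it assumes MySQL's default backslash escape character for LIKE.

```php
<?php
// Assumed one-off setup so the prefix search can use an index
// (idx_last_name is a hypothetical name):
//   CREATE INDEX idx_last_name ON tablename (last_name);
//
// The pattern is then meant to be bound to a query such as:
//   SELECT * FROM tablename WHERE last_name LIKE ?

// Hypothetical helper: turn a search term into a prefix-only LIKE pattern.
function prefixPattern(string $term): string
{
    // Escape the LIKE metacharacters so a "%" or "_" typed by the user
    // is matched literally instead of acting as a wildcard.
    $escaped = strtr($term, [
        '\\' => '\\\\',
        '%'  => '\\%',
        '_'  => '\\_',
    ]);

    // Only a trailing wildcard: the fixed prefix is what lets an index help.
    return $escaped . '%';
}

echo prefixPattern('Smi'), "\n";
```

The returned string is best passed as a bound parameter rather than pasted into the SQL text.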

Wildcards are far from ideal!

You cannot use LIKE "%...%" queries in SQL and expect to get good performance from them. A leading wildcard search like that means that the database will have to scan every single record in the table to find matches. If there are a lot of matches, it may also end up having to use swap space to store the results of the query. It will never be quick; it is probably too slow even on a medium-sized DB, and on a large DB like yours it will be painfully slow.

You need a different approach.

There are a number of ways to approach this, and it depends on what you're trying to do. If you're looking for keywords in a string, then you might consider pulling all the words out into their own records in a separate table and searching that. You would effectively end up with a tagging system.
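The keyword-table idea could look roughly like this. The table name keywords, its columns, and the id column on tablename are assumptions for illustration, not something the question confirms.

```php
<?php
// Assumed lookup table (all names hypothetical):
//   CREATE TABLE keywords (
//     keyword   VARCHAR(64) NOT NULL,
//     record_id INT NOT NULL,
//     PRIMARY KEY (keyword, record_id)
//   );
//
// An exact match on keyword can then use the primary key:
//   SELECT t.* FROM keywords k
//   JOIN tablename t ON t.id = k.record_id
//   WHERE k.keyword = ?

// Hypothetical helper: split a text field into distinct lowercase keywords,
// one per row to be inserted into the keywords table.
function extractKeywords(string $text): array
{
    // strtolower() only folds ASCII letters; that is enough for this sketch.
    $lower = strtolower($text);

    // Split on any run of characters that are not letters or digits.
    $words = preg_split('/[^\p{L}\p{N}]+/u', $lower, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    // Drop duplicates and reindex from zero.
    return array_values(array_unique($words));
}

print_r(extractKeywords('Van der Berg-Smith, van'));
```

A background job would call this once per row and insert the (keyword, record_id) pairs, so the search itself never touches the big table with a wildcard.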

But more often than not, searches like this need more power than that. The best solution then is usually to switch to a dedicated data indexing tool like Sphinx or Lucene. These two products work slightly differently from each other, but effectively they do the same job: they make a deep pass through your database and produce a comprehensive index that you can search much more quickly than anything the database itself can offer.

They can be complex to set up and configure, but if you want that kind of flexible searching without the performance problems of a LIKE query, they are really the only way you can go.

If you use LIKE "%..%", it'll do a full comparison on all 30 million rows each time you run the query. Only LIKE "...%" can use an index.

I don't think you can speed up your query if you want to keep LIKE "%..%" in it. However, I have some suggestions:

  • Use WHERE last_name = :query. Are you sure you want a search for Alex to match both Alex and Alexander?
  • Make your own index. Create a table that contains the most common last names and/or parts of them, and their IDs. Instead of reading 30 million rows each time while the user waits a minute, create a script that runs in the background, even for hours, and builds a table with 30,000 rows, where you can use a simple WHERE field = :query, which can be indexed and will be much faster, I guess.
  • Reading a lot of data takes time. Make sure your table doesn't have 100 columns that you don't need, or don't use SELECT *.
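One possible way to build the "own index" table from the second suggestion is to store every fragment of each last name as its own row, so a partial search becomes an indexed equality lookup. The function name nameFragments and the minimum length of 3 are arbitrary choices for this sketch, and it assumes single-byte (ASCII) names.

```php
<?php
// Hypothetical helper: list every distinct substring of a name that is at
// least $minLength characters long. A background script could insert each
// fragment, with the row's ID, into a lookup table that is queried with
// WHERE fragment = ? instead of LIKE "%...%".
function nameFragments(string $name, int $minLength = 3): array
{
    $name = strtolower($name);
    $length = strlen($name); // byte length; assumes ASCII names
    $fragments = [];

    for ($start = 0; $start < $length; $start++) {
        for ($size = $minLength; $start + $size <= $length; $size++) {
            // Using the fragment as an array key removes duplicates.
            $fragments[substr($name, $start, $size)] = true;
        }
    }

    // PHP turns numeric-looking keys into integers; cast them back to strings.
    return array_map('strval', array_keys($fragments));
}

print_r(nameFragments('Alex'));
```

The number of fragments grows roughly quadratically with the length of the name, so the minimum length is a trade-off between table size and how short a search term can be.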

Please don't use ->query("...$query..."). PHP's MySQLi API has a method for binding values: bind_param.
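A minimal sketch of the original search rewritten with a bound value follows. The function name searchLastNames is invented here, and get_result() assumes PHP was built with the mysqlnd driver. Note that mysqli uses ? placeholders, not named ones like :query.

```php
<?php
// Hypothetical wrapper around the question's query, using a prepared statement.
function searchLastNames(mysqli $mysqli, string $query): array
{
    // The placeholder keeps user input out of the SQL text entirely;
    // CONCAT adds the wildcards on the server side.
    $stmt = $mysqli->prepare(
        "SELECT * FROM tablename WHERE last_name LIKE CONCAT('%', ?, '%')"
    );
    if ($stmt === false) {
        throw new RuntimeException('Prepare failed: ' . $mysqli->error);
    }

    $stmt->bind_param('s', $query); // 's' marks the value as a string
    $stmt->execute();

    $result = $stmt->get_result();  // needs the mysqlnd driver
    $rows = $result->fetch_all(MYSQLI_ASSOC);

    $stmt->close();
    return $rows;
}
```

This only addresses the injection risk. The query still has a leading wildcard, so it will be just as slow as before.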


 