How do I optimize this search structure?

I am developing a web app that lets users search for items in a database. The items are grouped into several categories. Each search result is displayed differently depending on its field values. Currently I have one class to handle display and another to handle search. The search class builds the SQL query from several user inputs, queries the database for the IDs of the items that match, and sends those IDs in an array to the display class.

An excerpt of the code that does this:

//the sql query is actually a little more complex than this
$query = "SELECT items.id FROM items, subcategories WHERE {$name} AND items.`base_school` = '{$_SESSION['base_school']}' AND items.subcategory = subcategories.id AND subcategories.parent_category = {$search_category} ORDER BY `time_added` DESC {$limit}";
$result = $DB_CONNECTION->query($query);
$newly_added = array();
// Collect the matching item IDs into the array declared above.
while ($row = $result->fetch_row()) {
    $newly_added[] = $row[0];
}
searchDisplay::print_result($newly_added);

The display class then queries the database for the full details of each item, one after the other, and displays each item the way it should be displayed.

My question is: would it be better/faster to fetch the full details of each item (about 23 fields from 3 different tables) the first time the database is queried, store the data in an array, and pass that array to the display class, rather than have the display class query the database for each item using the item's unique ID? My current solution runs fine for now, but I need to know whether there would be any problem with my approach when the database starts to grow (to about 500,000 rows).

Secondly, data from the database is filtered using several search criteria supplied by the user. Would it be better to build a complex query with a few joins that accommodates all of the user's criteria, or to write a simple query that handles the major filters and then use PHP to filter out the few remaining results that don't match the search criteria?

In my opinion, results need to be filtered at each step; otherwise the query will become slow as the data grows bigger and bigger. Hence the strategy mentioned in the last paragraph is the optimal one.

You should always try to avoid putting a query in a loop. A single query, even if it is complex, is usually faster and scales better. Like all "rules", there are exceptions. If the loop is much faster than the complex query, then you should stick with the loop, since you know it won't grow to tens or hundreds of iterations (right?).
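As a rough illustration of taking a query out of a loop, the sketch below fetches a batch of IDs with one `IN (...)` query instead of one query per ID. It is a minimal sketch under assumptions: an in-memory SQLite database accessed through PDO stands in for the MySQL connection in the question, and the `items` table and its columns are hypothetical.

```php
<?php
// Sketch only: in-memory SQLite via PDO stands in for MySQL; the items table is hypothetical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)');
$pdo->exec("INSERT INTO items VALUES (1, 'pen'), (2, 'desk'), (3, 'lamp')");

$ids = [3, 1]; // IDs produced by the search step

// Query in a loop: one round trip per ID.
$perItem = $pdo->prepare('SELECT id, name FROM items WHERE id = ?');
$loopRows = [];
foreach ($ids as $id) {
    $perItem->execute([$id]);
    $loopRows[] = $perItem->fetch(PDO::FETCH_ASSOC);
}

// Single query: one round trip for the whole batch, one placeholder per ID.
$placeholders = implode(',', array_fill(0, count($ids), '?'));
$batch = $pdo->prepare("SELECT id, name FROM items WHERE id IN ($placeholders)");
$batch->execute($ids);
$batchRows = $batch->fetchAll(PDO::FETCH_ASSOC);

printf("loop: %d queries, batch: 1 query, rows fetched: %d and %d\n",
    count($ids), count($loopRows), count($batchRows));
```

Two caveats with this sketch: an empty `$ids` array would produce an invalid `IN ()` clause and needs a guard, and the batch query does not promise to return rows in the order of `$ids`, so add an `ORDER BY` if order matters.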

As for filtering in the DB versus PHP, it is typically better to do it in the DB and avoid transferring useless data over the network. Using the HAVING clause in MySQL is usually equivalent to what you would do in PHP to filter things.
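The difference in transferred data can be sketched as follows. This is an illustration under assumptions, not the asker's schema: an in-memory SQLite database via PDO stands in for MySQL, and the `items` table, its `price` column, and the `$maxPrice` criterion are all hypothetical.

```php
<?php
// Sketch only: in-memory SQLite via PDO stands in for MySQL.
// The items table, price column, and $maxPrice criterion are hypothetical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, price INTEGER)');
$pdo->exec("INSERT INTO items (name, price) VALUES ('pen', 2), ('desk', 120), ('lamp', 35), ('chair', 80)");

$maxPrice = 50; // a user-supplied search criterion

// Filtering in PHP: every row crosses the connection, then most are thrown away.
$all = $pdo->query('SELECT id, name, price FROM items ORDER BY id')->fetchAll(PDO::FETCH_ASSOC);
$filteredInPhp = array_values(array_filter($all, function ($row) use ($maxPrice) {
    return $row['price'] <= $maxPrice;
}));

// Filtering in the database: only the matching rows are transferred.
$stmt = $pdo->prepare('SELECT id, name, price FROM items WHERE price <= ? ORDER BY id');
$stmt->execute([$maxPrice]);
$filteredInDb = $stmt->fetchAll(PDO::FETCH_ASSOC);

printf("rows transferred: %d (PHP filter) vs %d (SQL filter)\n", count($all), count($filteredInDb));
```

Both approaches end up with the same rows; the SQL version simply never ships the rejected ones, and the gap widens as the table grows.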

You need to keep latency in mind. Often with networked apps that run slowly, latency is the culprit.

Even if each individual query is tiny and can be executed quickly, they all have latency. You say worst case, 100 queries.

Even if there are only 10 milliseconds of latency for each query (keep in mind that each query incurs overhead from network drivers, the actual round-trip time on the wire, etc.), you needlessly add 100 * 10 ms = 1 second (incredibly long in computer terms).
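The arithmetic can be written out as a tiny script. The figures are the ones assumed in the paragraph above (100 queries, 10 ms of latency each); real latency will vary.

```php
<?php
// Assumed figures from the paragraph above: 100 queries, 10 ms latency each.
$queries = 100;
$latencyMs = 10;

$loopOverheadMs = $queries * $latencyMs; // one round trip per item
$singleOverheadMs = 1 * $latencyMs;      // one round trip in total

printf("per-item queries: %d ms of latency\n", $loopOverheadMs);
printf("single query:     %d ms of latency\n", $singleOverheadMs);
```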

Most likely it would take much less than one second to execute a single query that gets all the info in one shot. Then you only incur the latency penalty once.

So I suggest rewriting your approach to use one query, and pass around arrays, as you suggest.
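A minimal sketch of that shape, under stated assumptions: an in-memory SQLite database via PDO stands in for MySQL, the schema is a cut-down, hypothetical version of the `items` and `subcategories` tables from the question, and `search_items()` and `render_results()` are illustrative names rather than the asker's classes.

```php
<?php
// Sketch only: in-memory SQLite via PDO stands in for MySQL; the schema is a
// cut-down, hypothetical version of the items/subcategories tables in the question.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE subcategories (id INTEGER PRIMARY KEY, parent_category INTEGER, title TEXT)');
$pdo->exec('CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, subcategory INTEGER, time_added INTEGER)');
$pdo->exec("INSERT INTO subcategories VALUES (1, 10, 'Textbooks'), (2, 20, 'Furniture')");
$pdo->exec("INSERT INTO items VALUES (1, 'Algebra', 1, 100), (2, 'Desk', 2, 200), (3, 'Physics', 1, 300)");

// Search side: one round trip returns the display fields, not just the IDs.
function search_items(PDO $pdo, int $parentCategory): array
{
    $stmt = $pdo->prepare(
        'SELECT items.id, items.name, subcategories.title AS subcategory_title
           FROM items
           JOIN subcategories ON items.subcategory = subcategories.id
          WHERE subcategories.parent_category = ?
          ORDER BY items.time_added DESC'
    );
    $stmt->execute([$parentCategory]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// Display side: receives complete rows and never touches the database.
function render_results(array $rows): string
{
    $lines = [];
    foreach ($rows as $row) {
        $lines[] = sprintf('#%d %s (%s)', $row['id'], $row['name'], $row['subcategory_title']);
    }
    return implode("\n", $lines);
}

$rows = search_items($pdo, 10);
echo render_results($rows), "\n";
```

The search and display responsibilities stay separate, as in the question; only the hand-off changes from an array of IDs to an array of full rows. Binding the category as a parameter also avoids interpolating user input into the SQL string.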

Things like this slip through testing all the time because people test in an environment where the latency between client and server is very low (for instance, on the same server without much activity). Then the app goes into the real world, and the client and server are both busy and hundreds of miles apart...
