How to combine two search results effectively?

I'm programming a site in PHP/MySQL that gets search results for products via an API from an external site. The site will also have its own products, and the owners want the search results from both sources to be interleaved.

If someone searches for VIDEO, ordered by date, then the results should all be in date order regardless of which source they came from.

For example:

July 31 - Video A - our database
July 30 - Video B - via API
July 29 - Video C - via API
July 28 - Video D - our database
...

The problem I'm having is figuring out how to do this efficiently, especially when viewing multiple pages of results. If someone clicks through to the second page, I need to work out the last item shown on the first page from each source, fetch items from the API starting after the last API item already shown, do the same for our database results, and then merge the two sets again.

To avoid this complex algorithm, another idea I had was to cap the results at a large number, say 500, grab them all at once, and sort them. Then if the user moves forward a few pages, I do not have to re-fetch all the data.

Does anyone have suggestions for good algorithms to combine two search results?

Whether or not you cache anything, you will need to grab at least a full page's worth of results from each source, in case every result on the next page happens to come from a single source.
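A minimal sketch of that merge step, written in Python for brevity (the same logic ports to PHP). It assumes both sources already return rows sorted newest first; the field names, the year, and the `merge_page` helper are illustrative and not part of the original question.

```python
import heapq
from datetime import date
from itertools import islice

def merge_page(api_results, db_results, page_size):
    """Merge two newest-first lists and keep only the first page."""
    # heapq.merge streams the inputs lazily; reverse=True means each input
    # must already be sorted from largest (newest) to smallest (oldest).
    merged = heapq.merge(api_results, db_results,
                         key=lambda row: row["date"], reverse=True)
    return list(islice(merged, page_size))

# Hypothetical sample rows mirroring the example in the question.
api = [
    {"date": date(2010, 7, 30), "title": "Video B", "source": "api"},
    {"date": date(2010, 7, 29), "title": "Video C", "source": "api"},
]
db = [
    {"date": date(2010, 7, 31), "title": "Video A", "source": "db"},
    {"date": date(2010, 7, 28), "title": "Video D", "source": "db"},
]

page = merge_page(api, db, page_size=3)
titles = [row["title"] for row in page]
```

Because either list could supply the whole page, each source is asked for `page_size` rows even though only `page_size` rows are displayed in total.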

Grabbing a lot of results and caching them (in the session) is one solution you could use.

If for some reason you don't want to cache all the results (for example, the operation is expensive and you need it optimized), you could store a simple array in the session that records which source each displayed result came from. From that you know the starting offset into each source for the next page.

For example (pseudocode):

**Request 1**
Get 10 results from API
Get 10 results from Database
Merge the results
Display first 10 and save the order to an array
   (A for API, D for Database, ex: A,A,A,D,A,D,D,A,D,A)

User clicks page 2

**Request 2** (Page 2)
Get 10 results from API starting after the 6 already shown
Get 10 results from Database starting after the 4 already shown
Repeat the merge and display steps above.
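The two requests could be sketched roughly as follows, in Python rather than PHP. Everything here is an assumption for illustration: `fetch_api` and `fetch_db` stand in for whatever calls return `limit` rows starting at `offset`, the integer `date` values stand in for real timestamps, and the `offsets` dict is what would be kept in the session between requests.

```python
import heapq
from itertools import islice

PAGE_SIZE = 10

def next_page(fetch_api, fetch_db, offsets, page_size=PAGE_SIZE):
    """Return one merged page plus the offsets to store for the next request."""
    # Tag each row with its source so we can count how many were displayed.
    api_rows = [dict(r, source="A") for r in fetch_api(offsets["A"], page_size)]
    db_rows = [dict(r, source="D") for r in fetch_db(offsets["D"], page_size)]
    merged = heapq.merge(api_rows, db_rows,
                         key=lambda r: r["date"], reverse=True)
    page = list(islice(merged, page_size))
    new_offsets = dict(offsets)
    for row in page:
        new_offsets[row["source"]] += 1  # only displayed rows advance an offset
    return page, new_offsets

def make_fetcher(rows):
    """Hypothetical stand-in for an API call or a LIMIT/OFFSET query."""
    return lambda offset, limit: rows[offset:offset + limit]

# Fake data, newest first; larger numbers mean more recent.
api_data = [{"date": d} for d in range(100, 0, -2)]
db_data = [{"date": d} for d in range(99, 0, -6)]
fetch_api, fetch_db = make_fetcher(api_data), make_fetcher(db_data)

session = {"offsets": {"A": 0, "D": 0}}
page1, session["offsets"] = next_page(fetch_api, fetch_db, session["offsets"])
offsets_after_page1 = dict(session["offsets"])
page2, session["offsets"] = next_page(fetch_api, fetch_db, session["offsets"])
```

Storing two counters is equivalent to storing the full A/D order array and counting its entries; the array is only needed if you also want to step backwards page by page.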

You could also cache whatever you have retrieved so far (each request leaves you with 10 extra results that were fetched but not displayed). Fetching more up front would make the first request slower, but could make the second request much faster.

If the user jumps forward several pages, you would need to fetch, from each source, the largest number of results that could have been displayed on the preceding, unseen pages, plus one more page.
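One way to read this: with no stored offsets, the worst case is that every earlier page came from a single source, so each source is asked for `page_number * page_size` rows. The sketch below (Python, with hypothetical `fetch_api` / `fetch_db` callables and integer stand-ins for dates) merges those rows and slices out the requested page.

```python
import heapq
from itertools import islice

def jump_to_page(fetch_api, fetch_db, page_number, page_size):
    """Return page `page_number` (1-based) without any stored offsets."""
    # Worst case: all earlier pages plus this one come from one source.
    needed = page_number * page_size
    api_rows = fetch_api(0, needed)
    db_rows = fetch_db(0, needed)
    merged = heapq.merge(api_rows, db_rows,
                         key=lambda r: r["date"], reverse=True)
    start = (page_number - 1) * page_size
    return list(islice(merged, start, start + page_size))

def make_fetcher(rows):
    """Hypothetical stand-in for an API call or a LIMIT/OFFSET query."""
    return lambda offset, limit: rows[offset:offset + limit]

# Fake data, newest first; larger numbers mean more recent.
api_data = [{"date": d} for d in range(100, 0, -2)]
db_data = [{"date": d} for d in range(99, 0, -6)]

page3 = jump_to_page(make_fetcher(api_data), make_fetcher(db_data),
                     page_number=3, page_size=5)
```

The cost grows with the page number, which is one reason caching a capped result set can be simpler when users jump around.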

If you are not too worried about performance from either source, I would retrieve up to a large number of results, as you suggested, and cache them all temporarily. As soon as a new search is executed, dump the old results.
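A rough sketch of that cache-and-dump approach, in Python. The `session` dict stands in for a per-user session store, and `fetch_all` is a hypothetical callable that returns the already merged and sorted list for a query; neither name comes from the original post.

```python
def cached_results(session, query, fetch_all, limit=500):
    """Return the merged results for `query`, refetching only on a new search."""
    cache = session.get("search_cache")
    if cache is None or cache["query"] != query:
        # Overwriting the single cache slot dumps the previous search.
        cache = {"query": query, "results": fetch_all(query, limit)}
        session["search_cache"] = cache
    return cache["results"]

def get_page(results, page_number, page_size):
    """Slice one 1-based page out of the cached list."""
    start = (page_number - 1) * page_size
    return results[start:start + page_size]

# Demo with a fake fetcher that records how often it is called.
calls = []

def fake_fetch_all(query, limit):
    calls.append(query)
    return [f"{query}-{i}" for i in range(limit)]

session = {}
first = get_page(cached_results(session, "video", fake_fetch_all), 1, 10)
second = get_page(cached_results(session, "video", fake_fetch_all), 2, 10)
other = get_page(cached_results(session, "audio", fake_fetch_all), 1, 10)
```

Paging within the same search reuses the cached list, while a different query triggers one new fetch and replaces the old results.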
