Is there a better way to get data from two tables at once with Sphinx/MySQL?

Before asking this question, it is important to understand what I am actually doing.

The best comparison to the feature I am implementing would be Facebook's search feature. When you begin typing, a drop-down list appears with various search results. At the top you will find your friends whose names match your search, then other people who match, then pages, events, etc.

My situation is similar; however, I only want to search for two things: users and documents (named ripples in the code below).

I have this working fine. Please bear with me while I talk through the logic of this feature in my case:

  1. User focuses on the search input.
  2. An Ajax request retrieves the logged-in user's friends/followers/following and caches them client side (this only occurs the first time a user focuses on the search input).
  3. As the user types, a highly optimized function performs a regex against the array of usernames and builds an autocomplete list complete with avatars etc.
  4. At the same time, and for every keypress, an Ajax request is fired to the script below, which does the following:

    • Performs two separate Sphinx searches on two separate indexes: one to collect user IDs and the other to collect document IDs (ripple IDs).
    • The results of the users query are looped through and checked against an array of user IDs sent in the Ajax request, to avoid duplicating users that were already displayed during the initial high-speed friends/followers check.
    • Next we query the actual database to get the user data for the remaining user IDs.
    • The same process is then repeated, but this time for the documents (ripples).

And finally, any returned results are appended to the autocomplete list.

This is an example of the PHP function that performs the Sphinx lookups and gets the data from the database.

    public function search()
    {
        $this->disableLayout();
        $request = new Request();
        $params = $request->getParams(GET);

        // Perform the Sphinx text searches (one per index)
        include('/usr/local/lib/php/sphinxapi.php');
        $sphinx = new \SphinxClient();
        $sphinx->SetMatchMode(SPH_MATCH_ANY);
        $sphinx->SetLimits(0, 4);
        // SphinxQL connection on port 9306; not used by the API calls below
        $mysqlconn = mysql_connect("127.0.0.1:9306") or die("Couldn't connect to MySQL.");
        $users = $sphinx->Query($params['data']['q'], "users");
        $ripples = $sphinx->Query($params['data']['q'], "ripples");

        /*
         * USERS
         */
        // Loop through users and only collect IDs that are not already present.
        // Collecting into an array avoids the stray commas that string
        // concatenation produced whenever an ID was skipped.
        if (!empty($users["matches"])) {
            $ids = array();
            foreach ($users['matches'] as $id => $data) {
                if (!isset($params['data']['e'][$id])) {
                    $ids[] = (int) $id;
                }
            }
            // If there are any remaining IDs, collect the data from the database
            if (!empty($ids)) {
                $userdataquery = "select users.userid, users.firstname, users.lastname
                    from tellycards_user_data users
                    where userid IN(" . implode(",", $ids) . ")";
                $query = new Query($userdataquery);
                $usersoutput = $query->fetchAll();
            }
        }

        /*
         * RIPPLES
         */
        // Loop through ripples and collect IDs
        if (!empty($ripples["matches"])) {
            $rippleids = array();
            foreach ($ripples['matches'] as $id => $data) {
                $rippleids[] = (int) $id;
            }
            // If there are any IDs, collect the data from the database
            if (!empty($rippleids)) {
                $rippledataquery = "select ripples.id, ripples.name, ripples.screenshot
                    from tellycards_ripples ripples
                    where id IN(" . implode(",", $rippleids) . ")";
                $query = new Query($rippledataquery);
                $ripplesoutput = $query->fetchAll();
            }
        }

        // Return both result sets as JSON
        header('Content-Type: application/json');
        echo json_encode(array(
            'users'   => (!empty($usersoutput)) ? $usersoutput : null,
            'ripples' => (!empty($ripplesoutput)) ? $ripplesoutput : null
        ));
    }
    } // closes the enclosing class (not shown)

You might ask why we are doing the initial friends lookup and not just using Sphinx for everything. Well, by implementing the method above, the user gets instant feedback as they type because the array of friends is stored client side; despite the fantastic speed of Sphinx, there will inevitably be some lag due to the HTTP request. In practice it works fantastically, and incidentally it appears to be the method that Facebook uses too.

There is also a lot of JavaScript code preventing unnecessary lookups; the returned data gets added to the cache and so on, so that future searches do not require hitting Sphinx or the database.

Now, finally, on to my actual question.

This current server-side function bothers me a lot. Right now there are two searches being performed by Sphinx and two queries being performed by MySQL. How can I collate all this into one Sphinx query and one MySQL query? Is there any way at all? (Please bear in mind that documents and users may share the same primary key IDs, as they are in two completely different tables in MySQL and are (currently) spread across two separate indexes.) Or is there any way to combine the two MySQL queries to make them more efficient than having two separate selects?

Or alternatively, given the simplicity of the queries, am I best keeping them separate as above? (Both are indexed primary key queries.)

I guess what I am asking for is any recommendations or advice.

Any commentary is very welcome.

You can't really get away with not having two MySQL queries. Well, you could, either by just combining them into one with UNION, or by creating a new combined 'table' (either a view or a materialized view), but I really don't think it's worth the effort. Two queries are perfectly fine; as you say, they are indexed.
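For completeness, here is a minimal sketch of what the UNION variant could look like, using the table and column names from the question. The helper name `buildCombinedQuery` and the `kind`, `label` and `extra` column aliases are assumptions made for illustration; they are not part of the original code.

```php
<?php
// Hypothetical helper: builds one UNION ALL statement that fetches users and
// ripples in a single round trip. The 'kind' column is an assumed
// discriminator so the caller can tell the two row types apart.
function buildCombinedQuery(array $userIds, array $rippleIds): ?string
{
    // Cast to int so the IN() lists can only ever contain numbers.
    $userIds   = array_map('intval', $userIds);
    $rippleIds = array_map('intval', $rippleIds);

    $parts = [];
    if ($userIds) {
        $parts[] = "SELECT 'user' AS kind, userid AS id, firstname AS label, lastname AS extra"
                 . " FROM tellycards_user_data WHERE userid IN (" . implode(',', $userIds) . ")";
    }
    if ($rippleIds) {
        $parts[] = "SELECT 'ripple' AS kind, id, name AS label, screenshot AS extra"
                 . " FROM tellycards_ripples WHERE id IN (" . implode(',', $rippleIds) . ")";
    }

    // No IDs at all means there is nothing to query.
    return $parts ? implode(' UNION ALL ', $parts) : null;
}

echo buildCombinedQuery([3, 7], [7]), "\n";
```

Both branches are still primary key lookups, so this mainly saves one round trip to MySQL rather than any real query work.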

You could use one Sphinx index (and hence one search query) by creating a new combined index. Because you say your keys are not unique, you would have to create a new synthetic key.

e.g.

sql_query = SELECT userid*2 AS id, 1 AS table_id, firstname AS one, lastname AS two FROM tellycards_user_data \
              UNION \
            SELECT (id*2)+1 AS id, 2 AS table_id, name AS one, screenshot AS two FROM tellycards_ripples
sql_attr_uint = table_id

This gives you a fake key and an attribute identifying which table the result came from. User IDs map to even keys and ripple IDs to odd keys, so the two never collide. You can use this to get back to the original table. (There are many other ways of doing the same thing.)
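As a rough sketch of the reverse mapping, assuming the `userid*2` / `(id*2)+1` scheme from the `sql_query` above, the original table and primary key can be recovered from the document ID alone. The function names here are hypothetical.

```php
<?php
// Hypothetical helper: maps a synthetic document ID from the combined index
// back to its source table and original primary key.
// Assumes users were indexed as userid*2 and ripples as (id*2)+1.
function decodeSyntheticId(int $docId): array
{
    return [
        'table' => ($docId % 2 === 0) ? 'tellycards_user_data' : 'tellycards_ripples',
        'pk'    => intdiv($docId, 2), // integer division drops the +1 for ripples
    ];
}

// Splits a list of matched document IDs into per-table primary keys,
// ready to be used in one IN() list per table.
function groupByTable(array $docIds): array
{
    $grouped = ['tellycards_user_data' => [], 'tellycards_ripples' => []];
    foreach ($docIds as $docId) {
        $decoded = decodeSyntheticId((int) $docId);
        $grouped[$decoded['table']][] = $decoded['pk'];
    }
    return $grouped;
}

print_r(groupByTable([14, 15, 8]));
```

Here user 7 and ripple 7 arrive as document IDs 14 and 15, so the shared primary key no longer causes a clash. The `table_id` attribute returned with each match carries the same information and could be used instead of the parity check.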

This allows you to run one query and get combined results.

... BUT I am not convinced it's a good idea, because if the results are asymmetric you may miss results. Say there are 20 matching results from one table and 10 from another, and you show the top 10 results. Because of the limit, the results from the second table could well be hidden below those from the first table (an extreme example; in reality, hopefully they are intermingled). Two separate queries allow you to guarantee that you get SOME results from each table.
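The extreme case can be reproduced with a small simulation. This is only a toy sketch: the weights are invented so that every user outranks every ripple, and the helper names are hypothetical.

```php
<?php
// Toy data only: the weights are invented to reproduce the extreme case.
function makeMatches(string $source, int $count, int $baseWeight): array
{
    $rows = [];
    for ($i = 0; $i < $count; $i++) {
        $rows[] = ['source' => $source, 'weight' => $baseWeight + $i];
    }
    return $rows;
}

// Sort by weight (highest first) and keep the first $limit rows,
// which is what a single limit does to a combined result set.
function topN(array $rows, int $limit): array
{
    usort($rows, function (array $a, array $b): int {
        return $b['weight'] <=> $a['weight'];
    });
    return array_slice($rows, 0, $limit);
}

// Counts how many rows came from each source.
function countBySource(array $rows): array
{
    return array_count_values(array_column($rows, 'source'));
}

$users   = makeMatches('user', 20, 100);  // 20 strong matches
$ripples = makeMatches('ripple', 10, 50); // 10 weaker matches

$combined = topN(array_merge($users, $ripples), 10);
$separate = array_merge(topN($users, 4), topN($ripples, 4));

print_r(countBySource($combined)); // only 'user' rows survive the limit
print_r(countBySource($separate)); // 4 'user' rows and 4 'ripple' rows
```

With a single limit of 10 the ripples disappear entirely, while two searches limited to 4 each (the limit used in the question) always leave room for both kinds of result.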

... so after all that: stick with what you've got. It's fine.

You can store and retrieve all data about users and documents in Sphinx, so no MySQL is needed.

Use SphinxQL rather than the API (much better and easier to get stuff done: http://sphinxsearch.com/docs/current.html#sphinxql-reference).

Note: don't forget to declare every textual field that you want to retrieve data from as sql_field_string in the source section of sphinx.conf.
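A minimal sketch of what that could look like from PHP follows. It assumes that `firstname` and `lastname` have been declared as `sql_field_string` in the `users` source, that searchd accepts SphinxQL on 127.0.0.1:9306 as in the question's code, and that PDO is left on its default emulated prepares (the placeholder is substituted client side). The helper name `buildSphinxQl` is hypothetical.

```php
<?php
// Hypothetical helper: builds a SphinxQL statement that returns stored
// columns directly from the index, so no follow-up MySQL query is needed.
// The columns must be stored in the index (e.g. declared as sql_field_string).
function buildSphinxQl(string $index, array $columns, int $limit): string
{
    // Index and column names cannot be bound as parameters,
    // so only plain identifiers are allowed through.
    foreach (array_merge([$index], $columns) as $name) {
        if (!preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $name)) {
            throw new InvalidArgumentException("Unsafe identifier: $name");
        }
    }
    return sprintf(
        'SELECT id, %s FROM %s WHERE MATCH(?) LIMIT %d',
        implode(', ', $columns),
        $index,
        $limit
    );
}

echo buildSphinxQl('users', ['firstname', 'lastname'], 4), "\n";

// Usage against a running searchd (an assumption, not executed here):
// $sphinx = new PDO('mysql:host=127.0.0.1;port=9306');
// $stmt = $sphinx->prepare(buildSphinxQl('users', ['firstname', 'lastname'], 4));
// $stmt->execute([$params['data']['q']]);
// $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
```

Note that SphinxQL's MATCH() uses the extended query syntax, so the results may differ from the SPH_MATCH_ANY mode used with the API in the question.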
