Read multiple records given array of record IDs from the database efficiently

If you have an array of record IDs in your application code, what is the best way to read the records from the database?

$idNumsIWant = array(2,4,5,7,9,23,56);

Obviously, looping over each ID is bad because you do n queries:

foreach ($idNumsIWant as $memID) {
    $DBinfo = mysql_fetch_assoc(mysql_query("SELECT * FROM members WHERE mem_id = '$memID'"));
    echo "{$DBinfo['fname']}\n";
}

So, perhaps it is better to use a single query?

$sqlResult = mysql_query("SELECT * FROM members WHERE mem_id IN (".join(",",$idNumsIWant).")");
while ($DBinfo = mysql_fetch_assoc($sqlResult))
  echo "{$DBinfo['fname']}\n";

But does this method scale when the array has 30,000 elements?

How do you tackle this problem efficiently?

The best approach ultimately depends on the number of IDs you have in your array (you obviously don't want to send a 50MB SQL query to your server, even though technically it might be capable of dealing with it without too much trouble), but mostly on how you're going to deal with the resulting rows.

  • If the number of IDs is fairly low (let's say a few thousand at most), a single query with a WHERE clause using the IN syntax will be perfect. Your SQL query will be short enough to be transferred reliably, efficiently and quickly to the DB server. This method is perfect for a single thread looping through the resulting records.

  • If the number of IDs is really big, I would suggest you split the ID array into several groups and run more than one query, each with one group of IDs. It may be a little heavier for the DB server, but on the application side you can spawn several threads and deal with the multiple recordsets as soon as they arrive, in parallel.

Both methods will work.
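
As a minimal sketch of the single-query case, the IN list can be built from bound placeholders rather than by joining the IDs into the SQL string. This assumes PDO is available; the in-memory SQLite database and the demo rows are stand-ins so the example runs on its own, and only the members table, mem_id and fname come from the question.

```php
<?php
// Sketch only: one IN (...) query with bound placeholders.
// Assumption: PDO with in-memory SQLite stands in for the real database server.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE members (mem_id INTEGER PRIMARY KEY, fname TEXT)');

// Demo rows (made up) so the example is self-contained.
$insert = $pdo->prepare('INSERT INTO members (mem_id, fname) VALUES (?, ?)');
foreach ([2 => 'Ann', 4 => 'Bob', 5 => 'Cy', 8 => 'Dee'] as $id => $name) {
    $insert->execute([$id, $name]);
}

$idNumsIWant = [2, 4, 5, 7];

// One "?" per ID. An empty ID list must be handled separately,
// because "IN ()" is a syntax error on some servers.
$placeholders = implode(',', array_fill(0, count($idNumsIWant), '?'));
$stmt = $pdo->prepare(
    "SELECT mem_id, fname FROM members WHERE mem_id IN ($placeholders) ORDER BY mem_id"
);
$stmt->execute($idNumsIWant);

// IDs with no matching row (7 here) are simply absent from the result.
$names = [];
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    $names[] = $row['fname'];
    echo "{$row['fname']}\n";
}
```

Binding the values keeps the query text free of user-supplied data; the number of placeholders a server accepts is limited, so very long lists still need to be split.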

Cliff notes: For this kind of situation, focus on data usage, as long as data extraction isn't too big of a bottleneck. And profile your app!

My thoughts:

The first method is too costly in terms of processing and disk reads.

The second method is more efficient, and you don't have to worry much about the query size limit (but check it anyway).

When I have to deal with that kind of situation, I see at least three or four possible solutions:

  • one request per id; as you said, this is not really good: lots of requests; I generally don't do that
  • use the solution you proposed: one request for many ids
    • but you can't do that with a very long list of ids: some database engines have a limit on the number of values you can pass in an IN()
    • a very big list in IN() might not be good performance-wise
    • So I generally do something like one request for X ids, and repeat this. For instance, to fetch data corresponding to 1000 ids, I could do 20 requests, each getting data for 50 ids (that's just an example: benchmarking your DB/table could be interesting for your particular case, as it might depend on several factors)
  • in some cases, you could also re-think your requests: maybe you could avoid passing such a list of ids by using some kind of join? (this really depends on what you need, your tables' schema, ...)

Also, to facilitate modifications of the fetching logic, I would write a function that takes the list of ids and returns the list of data corresponding to them.

This way, you just call this function the same way, and you always get the same data, not having to worry about how that data is fetched; this will allow you to change the fetching method if needed (if you find a better way some day), without breaking anything: HOW the function works will change, but as its interface (input/output) will remain the same, it will not change a thing for the rest of your code :-)
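
A rough sketch of such a function, combining it with the "X ids per request" idea. The name fetchMembersByIds and the $chunkSize parameter are hypothetical, PDO is an assumption, and in-memory SQLite with generated rows only stands in for a real members table.

```php
<?php
// Sketch only: callers pass IDs and get rows back; how the rows are fetched stays hidden.
// Hypothetical names: fetchMembersByIds, $chunkSize.

/**
 * Returns member rows keyed by mem_id for the given IDs.
 * Internally runs one IN (...) query per chunk of at most $chunkSize IDs.
 */
function fetchMembersByIds(PDO $pdo, array $ids, int $chunkSize = 50): array
{
    $rows = [];
    // Normalise to integers and drop duplicates before chunking.
    $ids = array_values(array_unique(array_map('intval', $ids)));

    foreach (array_chunk($ids, max(1, $chunkSize)) as $chunk) {
        $placeholders = implode(',', array_fill(0, count($chunk), '?'));
        $stmt = $pdo->prepare("SELECT * FROM members WHERE mem_id IN ($placeholders)");
        $stmt->execute($chunk);
        foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
            $rows[(int) $row['mem_id']] = $row;
        }
    }
    return $rows;
}

// Demo setup (assumption: in-memory SQLite with 200 generated rows).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE members (mem_id INTEGER PRIMARY KEY, fname TEXT)');
$insert = $pdo->prepare('INSERT INTO members (mem_id, fname) VALUES (?, ?)');
for ($i = 1; $i <= 200; $i++) {
    $insert->execute([$i, "member$i"]);
}

// 120 IDs with a chunk size of 50 means three queries: 50 + 50 + 20.
$members = fetchMembersByIds($pdo, range(1, 120), 50);
echo count($members), "\n";
```

The chunk size of 50 is only the example figure from above; the right value is something to benchmark against your own table.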

If it were me and I had that large a list of values for the IN clause, I would use a stored procedure with a variable containing the values I wanted, use a function in it to load them into a temp table, and then join to it. Depending on the size of the values you want to send, you might need to split them up into multiple input variables to process. Is there any way the values could be permanently stored in the database (if users often query on them)? And how is the user going to pick out 30,000 values? Surely he or she isn't going to type them all in. So there is probably a better way to query the table, based on a join and a WHERE clause.
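
The temp-table-and-join idea can be sketched roughly as follows. Note that this swaps the stored procedure for plain inserts from application code, which is a simplification and not the exact approach described above; the table name wanted_ids is hypothetical, and in-memory SQLite via PDO is only a stand-in for the real server.

```php
<?php
// Sketch only: load the wanted IDs into a temporary table, then JOIN against it.
// Assumptions: application-side inserts replace the stored procedure;
// wanted_ids is a hypothetical table name; SQLite in memory is a stand-in.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE members (mem_id INTEGER PRIMARY KEY, fname TEXT)');
$insert = $pdo->prepare('INSERT INTO members (mem_id, fname) VALUES (?, ?)');
for ($i = 1; $i <= 100; $i++) {
    $insert->execute([$i, "member$i"]);
}

$idNumsIWant = [2, 4, 5, 7, 9, 23, 56];

// The primary key assumes the ID list holds no duplicates.
$pdo->exec('CREATE TEMPORARY TABLE wanted_ids (mem_id INTEGER PRIMARY KEY)');

// One transaction for all inserts, so each ID does not pay for its own commit.
$pdo->beginTransaction();
$addId = $pdo->prepare('INSERT INTO wanted_ids (mem_id) VALUES (?)');
foreach ($idNumsIWant as $id) {
    $addId->execute([(int) $id]);
}
$pdo->commit();

// The ID list never appears in the SELECT itself.
$stmt = $pdo->query(
    'SELECT m.mem_id, m.fname
       FROM members AS m
       JOIN wanted_ids AS w ON w.mem_id = m.mem_id
      ORDER BY m.mem_id'
);
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
foreach ($rows as $row) {
    echo "{$row['fname']}\n";
}
```

Whether this beats a chunked IN list depends on the engine and on how the IDs reach the database, so it is worth measuring both.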

Use a StringTokenizer to split the string into tokens; that will make it easier for you to process data that holds multiple values.
