
Efficiently read multiple records from the database given an array of record IDs

If you have an array of record IDs in your application code, what is the best way to read those records from the database?

$idNumsIWant = array(2, 4, 5, 7, 9, 23, 56);

Obviously, looping over each ID is bad because it issues n separate queries:

foreach ($idNumsIWant as $memID) {
    $DBinfo = mysql_fetch_assoc(mysql_query("SELECT * FROM members WHERE mem_id = '$memID'"));
    echo "{$DBinfo['fname']}\n";
}

So, perhaps it is better to use a single query?

$sqlResult = mysql_query("SELECT * FROM members WHERE mem_id IN (".join(",",$idNumsIWant).")");
while ($DBinfo = mysql_fetch_assoc($sqlResult))
  echo "{$DBinfo['fname']}\n";
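The single-query version above interpolates the IDs straight into the SQL string. A safer sketch of the same IN() query uses one bound placeholder per ID; the `$pdo` connection is an assumption (commented out here so the snippet runs standalone), and `buildInPlaceholders` is a hypothetical helper:

```php
<?php
// Build one "?" placeholder per ID so values are bound, not interpolated.
function buildInPlaceholders(array $ids): string {
    return implode(',', array_fill(0, count($ids), '?'));
}

$idNumsIWant = array(2, 4, 5, 7, 9, 23, 56);

$sql = 'SELECT * FROM members WHERE mem_id IN ('
     . buildInPlaceholders($idNumsIWant) . ')';

// Hypothetical PDO connection; adjust DSN/credentials for your setup.
// $pdo  = new PDO('mysql:host=localhost;dbname=app', $user, $pass);
// $stmt = $pdo->prepare($sql);
// $stmt->execute($idNumsIWant);
// foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
//     echo "{$row['fname']}\n";
// }
```

Binding also sidesteps the quoting/injection issues of string interpolation, independently of the performance question.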

But does this method scale when the array has 30,000 elements?

How do you tackle this problem efficiently?

The best approach ultimately depends on the number of IDs in your array (you obviously don't want to send a 50 MB SQL query to your server, even though it might technically be able to handle it without too much trouble), but mostly on how you're going to deal with the resulting rows.

  • If the number of IDs is fairly low (say, a few thousand at most), a single query with a WHERE clause using the IN syntax will be perfect. Your SQL query will be short enough for it to be transferred reliably, efficiently and quickly to the DB server. This method is perfect for a single thread looping through the resulting records.

  • If the number of IDs is really big, I would suggest you split the ID array into several groups and run more than one query, each with one group of IDs. It may be a little heavier for the DB server, but on the application side you can spawn several threads and deal with the multiple recordsets as soon as they arrive, in parallel.
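The splitting described in the second bullet can be sketched with `array_chunk()`. The batch size of 500 is an illustrative assumption, and the fetcher is injected as a callable (stubbed here) so the snippet runs without a database:

```php
<?php
// Split a large ID list into fixed-size batches and fetch each batch
// with its own IN() query (the fetcher is injected so this is testable).
function fetchByIdBatches(array $ids, int $batchSize, callable $fetchBatch): array {
    $rows = array();
    foreach (array_chunk($ids, $batchSize) as $batch) {
        // A real fetcher would run:
        //   SELECT * FROM members WHERE mem_id IN (...this batch...)
        $rows = array_merge($rows, $fetchBatch($batch));
    }
    return $rows;
}

// Stub fetcher that echoes the batch back, standing in for a real query.
$stubFetcher = function (array $batch) { return $batch; };

$ids    = range(1, 1200);                          // 1200 IDs
$result = fetchByIdBatches($ids, 500, $stubFetcher); // 3 queries: 500+500+200
```

In a real application, each batch could also be handed to a worker thread or process, which is how the answer's parallel variant would plug in.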

Both methods will work.

Cliff notes: for this kind of situation, focus on data usage, as long as data extraction isn't too big of a bottleneck. And profile your app!

My thoughts:

The first method is too costly in terms of processing and disk reads.

The second method is more efficient, and you don't have to worry much about the query size limit (but check it anyway).

When I have to deal with that kind of situation, I see at least three or four possible solutions :

  • one request per ID; as you said, this is not really good: lots of requests; I generally don't do that
  • use the solution you proposed: one request for many IDs
    • but you can't do that with a very long list of IDs: some database engines have a limit on the number of values you can pass in an IN()
    • a very big list in IN() might not be good performance-wise
    • So I generally do something like one request for X IDs, and repeat. For instance, to fetch the data corresponding to 1,000 IDs, I could do 20 requests, each getting data for 50 IDs (that's just an example: benchmarking your DB/table for your particular case could be interesting, as it might depend on several factors)
  • in some cases, you could also rethink your requests: maybe you could avoid passing such a list of IDs by using some kind of join? (this really depends on what you need, your tables' schema, ...)

Also, to make it easier to modify the fetching logic later, I would write a function that takes the list of IDs and returns the corresponding data.

This way, you always call this function the same way and always get the same data back, without having to worry about how that data is fetched; this will allow you to change the fetching method if needed (if you find a better way some day) without breaking anything: HOW the function works will change, but as its interface (input/output) remains the same, nothing changes for the rest of your code :-)
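A minimal sketch of such a wrapper. The name `getMembersByIds` and the injected query runner are assumptions (stubbed here so it runs without a database); callers only ever see IDs in and rows out, so the body can later switch from one IN() query to batching or a join without touching the call sites:

```php
<?php
// Stable interface: list of IDs in, rows keyed by ID out.
// How the rows are fetched (one IN() query, several batches, a join, ...)
// stays hidden behind this function.
function getMembersByIds(array $ids, callable $runQuery): array {
    if (count($ids) === 0) {
        return array();
    }
    // Current strategy: a single IN() query; swap this body later if needed.
    $placeholders = implode(',', array_fill(0, count($ids), '?'));
    $rows = $runQuery(
        "SELECT * FROM members WHERE mem_id IN ($placeholders)",
        $ids
    );
    $byId = array();
    foreach ($rows as $row) {
        $byId[$row['mem_id']] = $row;
    }
    return $byId;
}

// Stub query runner standing in for a real prepared-statement call.
$stubRunner = function (string $sql, array $params) {
    $rows = array();
    foreach ($params as $id) {
        $rows[] = array('mem_id' => $id, 'fname' => "member$id");
    }
    return $rows;
};

$members = getMembersByIds(array(2, 4, 7), $stubRunner);
```

Keying the result by ID also makes it cheap for callers to look up individual records afterwards.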

If it were me and I had that large a list of values for the IN clause, I would use a stored procedure with a variable containing the values I wanted, use a function in it to load them into a temp table, and then join to it. Depending on the size of the values you want to send, you might need to split them across multiple input variables. Is there any way the values could be stored permanently in the database (if they are often queried on)? And how is the user going to pick out 30,000 values? Surely he or she isn't going to type them all in. So there is probably a better way to query the table, based on a join and a WHERE clause.

Use a StringTokenizer to split the string into tokens; that would make it easier for you to handle data with multiple values.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. For any questions please contact: yoyou2525@163.com.
