
Applying an iterative algorithm to a set of rows from a database

This question may seem too basic to some, but please bear with me; it's been a while since I last did any serious database programming.

I have an algorithm that I need to implement in PHP/MySQL for a website. It performs some computations iteratively on an array of objects (it ranks the objects based on their properties). In each iteration the algorithm runs through the whole collection a couple of times, accessing data from various parts of it. The algorithm needs several hundred iterations to complete. The array comes from a database.

The straightforward solution that I see is to take the results of a database query, create an object for each row, put the objects into an array, and pass the array to my algorithm.

However, I'm concerned about the efficiency of such a solution when I have to work with an array of several thousand items, because what I'd be doing is essentially mirroring the results of a query into memory.

On the other hand, running database queries a couple of times on each iteration of the algorithm also seems wrong.

So, my question is: what is the correct architectural solution for a problem like this? Is it OK to mirror the query results into memory? If not, what is the best way to work with query results in such an algorithm?

Thanks!

UPDATE: The closest problem I can think of is the ranking of search results by a search engine - I need to do something similar. Each result is represented as a row in a database, and all results in the set are taken into account when the rank is computed.

Don't forget, premature optimization is the root of all evil. Give copying everything into memory a shot. If that uses too much memory, then optimize for memory.

It really depends on the situation at hand. It's probably rarely necessary to do such a thing, but it's hard to tell based on the information you've given.

Try to isolate the data as much as possible. For instance, if you need to perform some action on the data that has no dependencies between iterations of the loop, you can write a query that updates the affected rows in place rather than loading them all into memory only to write them back - see the sketch below.
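A minimal sketch of that idea in PHP/MySQL - the items table, its columns, and the connection details are all made up for illustration:

    <?php
    // One UPDATE lets MySQL modify every affected row in place;
    // nothing is pulled into PHP memory just to be written back.
    $db = new mysqli('localhost', 'user', 'password', 'mydb');
    $db->query('UPDATE items SET score = score * 0.9 WHERE views < 100');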

In short, it is probably avoidable but it's hard to tell until you give us more information :)

Memory seems like the best way to go - if you can scale up to meet it. Otherwise you'll have to revise your algorithm, maybe toward a divide-and-conquer approach - something like a merge sort.

If you run a query against the database, then when the results come back they are already "mirrored to memory". When you fetch your results using mysql_fetch_assoc (or its equivalent) you have your copy. Just use that as the cache.
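A minimal sketch of that approach, using the mysqli equivalent of mysql_fetch_assoc (the connection details and the items table are assumptions for illustration):

    <?php
    $db = new mysqli('localhost', 'user', 'password', 'mydb');

    // Run the query once and copy the result set into a plain array.
    $result = $db->query('SELECT id, score, views FROM items');
    $items = [];
    while ($row = $result->fetch_assoc()) {
        $items[$row['id']] = $row; // keyed by primary key for fast lookups
    }

    // The iterative algorithm now reads and writes $items only,
    // never touching the database inside its loops.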

Is the computation for one object dependent on the others, or are they all independent? If they are independent, you could load just a small number of rows from the database, converting them to objects as you describe. Then run your hundreds of iterations on those, output the results for that block, and proceed to the next block of items.

This keeps memory usage down, since you are only dealing with a small number of items at a time rather than the whole data set, and it avoids querying the database repeatedly inside the algorithm's iterations.

The SQL keywords LIMIT and OFFSET can help you step through the data block by block.
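A rough sketch of that block-by-block loop, assuming the blocks really are independent (the table, the columns, and the rankBlock() helper are hypothetical):

    <?php
    $db = new mysqli('localhost', 'user', 'password', 'mydb');

    $blockSize = 500;
    $offset = 0;
    do {
        $result = $db->query(
            "SELECT id, score FROM items ORDER BY id LIMIT $blockSize OFFSET $offset"
        );
        $block = [];
        while ($row = $result->fetch_assoc()) {
            $block[] = $row;
        }

        // Run the hundreds of iterations on this small block only,
        // then write its results back before moving on.
        // rankBlock($db, $block); // hypothetical ranking routine

        $offset += $blockSize;
    } while (count($block) === $blockSize);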

Writing ranking queries in MySQL is possible as well; you just need to play with user-defined variables a bit. If you provide some input data and the result you want to achieve, the replies will be more detailed.
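As a rough illustration of the user-variable idiom (the items table and score column are assumptions; on MySQL 8+ the built-in RANK() window function does the same job more cleanly):

    <?php
    $db = new mysqli('localhost', 'user', 'password', 'mydb');

    // Classic MySQL trick: a user variable is incremented once per row
    // as the ordered result set is scanned, producing a rank column.
    $db->query('SET @rank := 0');
    $result = $db->query(
        'SELECT id, score, (@rank := @rank + 1) AS ranking
         FROM items
         ORDER BY score DESC'
    );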

Can you use a cron job to do your ranking, say once per day, per hour, or whatever you need, and then save each item's ranking to a field in its row?

That way, when you fetch your rows, you can just order them by the ranking field.
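A sketch of that setup, with a hypothetical rank.php script and schema - the crontab line in the comment would run it hourly:

    <?php
    // rank.php - scheduled via crontab, e.g.:
    //   0 * * * * php /path/to/rank.php
    $db = new mysqli('localhost', 'user', 'password', 'mydb');

    // The iterative ranking algorithm (omitted) fills this map: id => rank.
    $ranks = [];

    // Persist the ranks so page loads can simply ORDER BY ranking
    // instead of recomputing anything.
    $stmt = $db->prepare('UPDATE items SET ranking = ? WHERE id = ?');
    $stmt->bind_param('ii', $rank, $id);
    foreach ($ranks as $id => $rank) {
        $stmt->execute();
    }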
