简体繁体 English

将迭代算法应用于数据库中的一组行

[英]Applying iterative algorithm to a set of rows from database

原文 2010-05-17 18:00:35 4 7 php/ mysql

this question may seem too basic to some, but please bear with be, it's been a while since I dealt with decent database programming. 这个问题对某些人来说似乎太基本了，但请耐心等待，因为我处理了不错的数据库编程已经有一段时间了。

I have an algorithm that I need to program in PHP/MySQL to work on a website. 我有一个算法，我需要在PHP / MySQL中编程才能在网站上工作。 It performs some computations iteratively on an array of objects (it ranks the objects based on their properties). 它在一个对象数组上迭代地执行一些计算（它根据对象的属性对对象进行排名）。 In each iteration the algorithm runs through all collection a couple of times, accessing various data from different places of the whole collection. 在每次迭代中，算法会在所有集合中运行几次，从整个集合的不同位置访问各种数据。 The algorithm needs several hundred iterations to complete. 该算法需要几百次迭代才能完成。 The array comes from a database. 该数组来自数据库。

The straightforward solution that I see is to take the results of a database query and create an object for each row of the query, put the objects to an array and pass the array to my algorithm. 我看到的直接解决方案是获取数据库查询的结果并为查询的每一行创建一个对象，将对象放入一个数组并将数组传递给我的算法。

However, I'm concerned with efficacy of such solution when I have to work with an array of several thousand of items because what I do is essentially mirror the results of a query to memory. 但是，当我必须使用数千个项目的数组时，我关注这种解决方案的功效，因为我所做的实际上是将查询结果镜像到内存中。

On the other hand, making database query a couple of times on each iteration of the algorithm also seems wrong. 另一方面，在算法的每次迭代中多次进行数据库查询似乎也是错误的。

So, my question is - what is the correct architectural solution for a problem like this? 所以，我的问题是 - 对于像这样的问题，正确的架构解决方案是什么？ Is it OK to mirror the query results to memory? 将查询结果镜像到内存是否可以？ If not, which is the best way to work with query results in such an algorithm? 如果没有，这是在这种算法中使用查询结果的最佳方法？

Thanks! 谢谢！

UPDATE : The closest problem that I can think of is ranking of search results by a search engine - I need to do something similar to that. 更新：我能想到的最接近的问题是搜索引擎对搜索结果的排名 - 我需要做类似的事情。 Each result is represented as a row of a database and all results of the set are regarded when the rank is computed. 每个结果表示为数据库的一行，并且在计算排名时考虑该组的所有结果。

7 个解决方案

Don't forget, premature optimization is the root of all evil. 不要忘记，过早优化是万恶之源。 Give it a shot copying everything to memory. 试一试将所有内容复制到内存中。 If that uses too much mem, then optimize for memory. 如果它使用了太多内存，那么优化内存。

It really depends on the situation at hand. 这实际上取决于手头的情况。 It's probably rarely required to do such a thing, but it's very difficult to tell based off of the information you've given. 这可能很少需要做这样的事情，但很难根据你给出的信息来判断。

Try to isolate the data as much as possible. 尝试尽可能地隔离数据。 For instance, if you need to perform some independent action on the data that doesn't have data dependencies amongst iterations of the loop, you can write a query to update the affected rows rather than loading them all into memory, only to write them back. 例如，如果您需要对循环迭代中没有数据依赖性的数据执行一些独立操作，您可以编写一个查询来更新受影响的行，而不是将它们全部加载到内存中，只是将它们写回来。

In short, it is probably avoidable but it's hard to tell until you give us more information :) 简而言之，这可能是可以避免的，但在你给我们更多信息之前很难说清楚:)

Memory seems like the best way to go - iff you can scale up to meet it. 记忆似乎是最好的方式 - 如果你可以扩大规模来满足它。 Otherwise you'll have to revise your algorithm to maybe use a divide and conquer type of approach - do something like a merge sort. 否则，您将不得不修改您的算法，以便使用分而治之的方法 - 执行类似合并排序的操作。

If you are doing a query to the database, when the results come back, they are already "mirrored to memory". 如果您正在对数据库进行查询，那么当结果返回时，它们已经“镜像到内存”。 When you get your results using mysql_fetch_assoc (or equiv) you have your copy. 当您使用mysql_fetch_assoc（或equiv）获得结果时，您将获得副本。 Just use that as the cache. 只需将其用作缓存即可。

Is the computation of one object dependent on another, or are they all independent? 一个对象的计算是依赖于另一个对象，还是它们都是独立的？ If they are independent, you could load just a small number of rows from the database, converting them to objects as you describe. 如果它们是独立的，您可以从数据库中加载少量行，并在描述时将它们转换为对象。 Then run your hundreds of iterations on these, and then output the result for that block. 然后对这些迭代运行数百次迭代，然后输出该块的结果。 You then proceed to the next block of items. 然后，您将继续下一个项目块。

This keeps memory usage down, since you are only dealing with a small number of items rather than the whole data set, and avoids running multiple queries on the database. 这样可以减少内存使用量，因为您只处理少量项目而不是整个数据集，并避免在数据库上运行多个查询。

The SQL keywords LIMIT and OFFSET can help you step through the data block by block. SQL关键字LIMIT和OFFSET可以帮助您逐块逐步执行数据。

Writing ranking queries with MySQL is possible as well, you just need to play with user-defined variables a bit. 使用MySQL编写排名查询也是可能的，您只需要稍微使用用户定义的变量。 If you will provide some input data and the result you are going to achieve, the replies will be more detailed 如果您将提供一些输入数据以及您将要实现的结果，则回复将更加详细

can you use a cron job to do your ranking, say once per day, hour, or whatever you need, and then save the items ranking to a field in its row? 你可以使用一个cron作业来做你的排名，比如每天，每小时或者你需要的任何一个，然后将项目排名保存到其行中的一个字段中吗？

that way when you call your rows up you could just order them by the ranking field. 这样当您调用行时，您可以按排名字段对它们进行排序。