
Date-wise sorting and fetching of records from a large MySQL database

Original post: 2009-06-12 20:31:57. Tags: php, mysql

I have a separate table for each day's data, which is basically web-stats data: keywords, visits, duration, IP, sale, etc. (maybe 100 bytes total per record). Each table will have around a couple of million records.

What I need is a web admin interface so that the user/admin can view reports for different date periods, sorted by certain calculated values. For example, the user may want the results from the 15th of last month to the 12th of this month, sorted by SALE/VISIT in descending order.

The admin/user only needs to view (say) the top 200 records at a time and will probably not view more than a few hundred in total in any one session.

Because the date period is arbitrary, I need to sum the relevant columns for each record across the whole period, and only then can the selection be done.

My question is whether it will be possible to produce the reports in real time, or whether they would be too slow. (The tables are rarely, if ever, updated after the day's data has been inserted.)

Is such a scenario better suited to indexes or to table scans?

Also, would one massive table for all dates be better than a separate table for each date? (There are almost no joins.)

Thanks in advance!

2 answers

With a separate table for each day's data, summarizing across a month is going to involve doing the same analysis on each of 30-odd tables. Over a year, you will have to do the analysis on 365 or so tables. That's going to be a nightmare.
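To make that cost concrete, here is a rough sketch of what a report over per-day tables could look like. The table naming scheme (`stats_YYYYMMDD`), the column names (`keyword`, `visits`, `sale`) and the helper names are assumptions made for illustration, not taken from the question, and Python's built-in `sqlite3` stands in for MySQL so the snippet runs on its own.

```python
import sqlite3
from datetime import date, timedelta


def daily_tables(start, end):
    """Yield hypothetical per-day table names such as stats_20090515."""
    day = start
    while day <= end:
        yield f"stats_{day:%Y%m%d}"
        day += timedelta(days=1)


def build_report_sql(start, end, limit=200):
    """Build one query that UNION ALLs every daily table in the period."""
    union = "\n  UNION ALL\n  ".join(
        f"SELECT keyword, visits, sale FROM {name}"
        for name in daily_tables(start, end)
    )
    return (
        "SELECT keyword, SUM(sale) AS sale, SUM(visits) AS visits\n"
        f"FROM (\n  {union}\n) AS all_days\n"
        "GROUP BY keyword\n"
        "ORDER BY SUM(sale) * 1.0 / SUM(visits) DESC\n"
        f"LIMIT {int(limit)}"
    )


def demo():
    """Create three tiny daily tables in memory and run the report."""
    conn = sqlite3.connect(":memory:")
    start, end = date(2009, 6, 10), date(2009, 6, 12)
    for name in daily_tables(start, end):
        conn.execute(
            f"CREATE TABLE {name} (keyword TEXT, visits INTEGER, sale REAL)"
        )
        conn.executemany(
            f"INSERT INTO {name} VALUES (?, ?, ?)",
            [("shoes", 10, 2.0), ("hats", 5, 3.0)],
        )
    return conn.execute(build_report_sql(start, end)).fetchall()
```

The query text grows by one `SELECT` per day in the period, and every report has to be assembled dynamically from table names, which is the maintenance burden described above.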

It would almost certainly be better to have a single, soundly indexed table than a huge number of tables. Some DBMSs support fragmented (partitioned) tables; if MySQL does, fragment the single big table by date. I would be inclined to fragment by month, especially if the normal queries cover one month or less and do not cross month boundaries. (Even if a query spans two months, with decent fragment elimination the query engine won't have to read most of the data, just the two fragments for those two months. It might even be able to do those scans in parallel, again depending on the DBMS.)
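As a minimal sketch of the single-table approach, the report for an arbitrary period becomes one fixed query with a date range filter. The schema, the index name `idx_stats_day` and the function names are hypothetical, and `sqlite3` is again used only so the example is self-contained; in MySQL (5.1 and later) the same table could additionally be declared with `PARTITION BY RANGE` on the date column, which is not shown here.

```python
import sqlite3

REPORT_SQL = """
SELECT keyword,
       SUM(sale)   AS sale,
       SUM(visits) AS visits,
       SUM(sale) * 1.0 / SUM(visits) AS sale_per_visit
FROM stats
WHERE day BETWEEN ? AND ?
GROUP BY keyword
ORDER BY sale_per_visit DESC
LIMIT ?
"""


def make_db():
    """Create one table for all dates, indexed on the date column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stats ("
        " day TEXT NOT NULL,"       # ISO dates sort correctly as text
        " keyword TEXT NOT NULL,"
        " visits INTEGER NOT NULL,"
        " sale REAL NOT NULL)"
    )
    conn.execute("CREATE INDEX idx_stats_day ON stats (day)")
    return conn


def top_keywords(conn, start, end, limit=200):
    """Sum sale and visits per keyword over [start, end], best ratio first."""
    return conn.execute(REPORT_SQL, (start, end, limit)).fetchall()


def demo():
    conn = make_db()
    conn.executemany(
        "INSERT INTO stats VALUES (?, ?, ?, ?)",
        [
            ("2009-05-14", "old", 1, 100.0),    # before the period
            ("2009-05-15", "shoes", 10, 2.0),
            ("2009-06-01", "hats", 5, 5.0),
            ("2009-06-12", "shoes", 10, 4.0),
            ("2009-06-13", "late", 1, 100.0),   # after the period
        ],
    )
    return top_keywords(conn, "2009-05-15", "2009-06-12")
```

Only the two date parameters change between reports, so the statement can be prepared once and reused for any period the admin picks.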

Sometimes it is quicker to do a sequential scan of a table than to do indexed lookups. Don't simply assume that a query plan involving a table scan will automatically perform badly.
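One way to find out which strategy the engine actually picks is to ask it for the query plan and then time both variants on realistic data. The sketch below does the first half using SQLite's `EXPLAIN QUERY PLAN`; the table and index names are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable 'detail' text of each plan step."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # detail is the last column


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stats (day TEXT, keyword TEXT, visits INTEGER, sale REAL)"
    )
    conn.execute("CREATE INDEX idx_stats_day ON stats (day)")
    return conn


def compare_plans():
    """Show the plan for a date-range query and for a whole-table query."""
    conn = make_db()
    ranged = query_plan(
        conn,
        "SELECT SUM(visits) FROM stats WHERE day BETWEEN ? AND ?",
        ("2009-05-15", "2009-06-12"),
    )
    full = query_plan(conn, "SELECT SUM(visits) FROM stats")
    return ranged, full
```

In MySQL the analogous check is to prefix the statement with `EXPLAIN`. Whether the indexed plan is really faster for a report that touches most rows in the period is something only a measurement on the real tables can settle.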

You may want to try a different approach. I think Splunk will work for you. It was designed for this; they even advertise on this site. They have a free version you can try.