
Creating a report from 1 million+ records in MySQL and displaying it in a Java JSP page

I am working on a MySQL database with three tables: workout_data, excercises and sets. I'm having trouble generating reports based on these three tables.

To add more information: a number of sets make up an exercise, and a number of exercises make up a workout. I already have the metrics for which a report is to be generated from the data in these tables. I have to generate reports for the past 42 days, including this week. The queries that join these tables run for a long time before I get the report.

For example, the sets table has more than 1 million records just for the past 42 days. The id in this table is the excercise_id in the excercise table. The id of the excercise table is the workout_id in the workout_data table.

I'm running the query below, and it takes more than 10 minutes to return the data. I have to prepare a report and show it to the user in the browser, but because of this long-running query the web page times out and the user never sees the report.

Any advice on how to achieve this?

        SELECT REPORTSETS.USER_ID,REPORTSETS.WORKOUT_LOG_ID,
               REPORTSETS.SET_DATE,REPORTSETS.EXCERCISE_ID,REPORTSETS.SET_NUMBER 
          FROM EXCERCISES 
    INNER JOIN REPORTSETS ON EXCERCISES.ID=REPORTSETS.EXCERCISE_ID 
         where user_id=(select id from users where email='testuser1@gmail.com') 
           and substr(set_date,1,10)='2013-10-29' 
      GROUP BY REPORTSETS.USER_ID,REPORTSETS.WORKOUT_LOG_ID,
               REPORTSETS.SET_DATE,REPORTSETS.EXCERCISE_ID,REPORTSETS.SET_NUMBER

Comments on your SQL that you might want to look into:

1) Do you have an index on USER_ID and SET_DATE?

2) Your datatype for SET_DATE looks wrong. Is it a varchar? Storing it as a date means the database can optimise your search much more efficiently. At the moment the substring function has to be run for every row returned by the first part of your WHERE clause, so it is called a huge number of times per query.

3) Is the GROUP BY really required? Unless I'm missing something, the GROUP BY part of the statement brings nothing to the table ;)

Two things:

First, you have the following WHERE clause item to pull out a single day's data.

  AND substr(set_date,1,10)='2013-10-29'

This definitely defeats the use of an index on the date. If your set_date column has a DATETIME datatype, what you want is

  AND set_date >= '2013-10-29'
  AND set_date <  '2013-10-29' + INTERVAL 1 DAY

This will allow the use of a range scan on an index on set_date. It looks to me like you might want a compound index on (user_id, set_date). But you should experiment with EXPLAIN to figure out whether that's right.
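From the Java side, the same half-open range can be bound as parameters so that no function is applied to the set_date column. The following is only a sketch under assumptions: the class name SetsForDayQuery is hypothetical, set_date is assumed to be a DATETIME column, the user id is assumed to have been looked up already (the question resolves it from an email address), and the join to EXCERCISES is left out because the question's query selects no columns from that table.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.time.LocalDate;
import java.time.LocalDateTime;

/** Hypothetical helper: selects one user's sets for one calendar day. */
class SetsForDayQuery {

    // Column names come from the question; the query shape is an assumption.
    static final String SQL =
        "SELECT user_id, workout_log_id, set_date, excercise_id, set_number "
      + "FROM reportsets "
      + "WHERE user_id = ? AND set_date >= ? AND set_date < ? "
      + "ORDER BY set_date, set_number";

    /** Inclusive lower bound: midnight at the start of the day. */
    static LocalDateTime lowerBound(LocalDate day) {
        return day.atStartOfDay();
    }

    /** Exclusive upper bound: midnight at the start of the next day. */
    static LocalDateTime upperBound(LocalDate day) {
        return day.plusDays(1).atStartOfDay();
    }

    /** Prepares and binds the statement; the caller is responsible for closing it. */
    static PreparedStatement prepare(Connection conn, long userId, LocalDate day)
            throws SQLException {
        PreparedStatement ps = conn.prepareStatement(SQL);
        try {
            ps.setLong(1, userId);
            ps.setTimestamp(2, Timestamp.valueOf(lowerBound(day)));
            ps.setTimestamp(3, Timestamp.valueOf(upperBound(day)));
            return ps;
        } catch (SQLException e) {
            ps.close(); // do not leak the statement if binding fails
            throw e;
        }
    }

    public static void main(String[] args) {
        // Prints the bounds only; no database connection is needed for this demo.
        LocalDate day = LocalDate.of(2013, 10, 29);
        System.out.println(lowerBound(day) + " <= set_date < " + upperBound(day));
    }
}
```

Because the comparison is on the bare column, an index that starts with (user_id, set_date) can in principle serve both predicates; EXPLAIN on the real table is still the way to confirm what the optimizer actually chooses.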

Second, you're misusing GROUP BY. That clause is pointless unless you have some kind of summary function like SUM() or GROUP_CONCAT() in your query. Do you want ORDER BY?

It should make a significant difference if you store the date either as a date or in the format you need for the comparison. Performing a substr() call on every date must be time-consuming.

The suggestions for tuning the query will surely help to improve its speed. But I think the main point here is what can be done with more than 1 million records before the session times out. What if you have 2 or 3 million records? Will some performance tuning solve the problem? I don't think so. So:

1) If you want to display the data in the browser, use pagination and query (for example) the first 100 records.
2) If you want to generate a report (such as a PDF), use an asynchronous method (JMS).
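To make the asynchronous option concrete, here is a minimal in-process sketch. It does not use JMS: a plain ExecutorService stands in for the message queue, and the class name ReportJobs and its methods are hypothetical. The idea is that the first request only submits the job and gets an id back, and the page then polls with that id until the report is ready, so no single HTTP request has to wait for the slow query.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical in-process job registry; a JMS queue could replace the executor. */
class ReportJobs {
    private final ExecutorService pool = Executors.newFixedThreadPool(2);
    private final Map<String, Future<String>> jobs = new ConcurrentHashMap<>();

    /** Starts the slow work in the background and returns a job id immediately. */
    String submit(Callable<String> reportBuilder) {
        String id = UUID.randomUUID().toString();
        jobs.put(id, pool.submit(reportBuilder));
        return id;
    }

    /** True once the job has finished, whether it succeeded or failed. */
    boolean isDone(String id) {
        Future<String> f = jobs.get(id);
        return f != null && f.isDone();
    }

    /** Waits up to the timeout for the report; throws if the job failed or is unknown. */
    String result(String id, long timeout, TimeUnit unit) throws Exception {
        Future<String> f = jobs.get(id);
        if (f == null) {
            throw new IllegalArgumentException("unknown job " + id);
        }
        return f.get(timeout, unit);
    }

    /** Stops accepting new jobs; already submitted jobs still run to completion. */
    void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) throws Exception {
        ReportJobs jobs = new ReportJobs();
        String id = jobs.submit(() -> {
            Thread.sleep(200);            // stands in for the long-running query
            return "report for 42 days";  // stands in for the rendered report
        });
        System.out.println("submitted " + id + ", done=" + jobs.isDone(id));
        System.out.println(jobs.result(id, 5, TimeUnit.SECONDS));
        jobs.shutdown();
    }
}
```

In a real web application the registry would need to evict finished jobs and survive restarts, which is part of what a message queue with persistent storage would provide.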

