SQL query with calculating algorithm

I am having a problem writing a SQL statement to get certain results. Here is my sample data:

[image: sample data]

For the same ID, if the second start time minus the first end time is less than 45 minutes, the result should show the first start_loc and the second end_loc. Currently my SQL is:

SELECT start_loc, end_loc FROM Table WHERE end_time - start_time <= 45 GROUP BY ID;

It returns two rows: the first row is 202,208 and the second row is 112,102.

The desired outcome is 65,102 for the first row and 229,208 for the second row.

Any guidance? Thanks in advance.

EDIT

[image]

Note, this was even more complicated than I initially thought. I solved it with SQL for the fun of it. If performance is an issue, consider solving it at the application level rather than at the database level.

Here it is. First I created a table that helps to simplify the final query:

create table tmp_foo as
  select 
  sq.*,
  @rn := @rn + 1 as row_number,
  @gn := if(@prevless != less45, @gn + 1, @gn) as gn,
  @prevless := less45
  from (
    select
    t.*,
    if(time_to_sec(timediff(start_time, @prevtime)) <= 45 * 60, 1, 0) as less45,
    @prevtime := end_time
    from
    transaction t
    , (select @prevtime := (select min(start_time) from transaction)) inner_var_init
    order by start_time, end_time
  ) sq
  , (select @gn := 0, @prevless := null, @rn := 0) outer_var_init
  order by start_time, end_time;

Note that this table has no indexes whatsoever. You might want to create some if performance becomes an issue, and on the original table as well :)

A little explanation:

First we initialize our variables:

    , (select @prevtime := (select min(start_time) from transaction)) inner_var_init

With the @prevtime variable we access the previous row. That's why the order of the expressions in the select clause is important. Here

    if(time_to_sec(timediff(start_time, @prevtime)) <= 45 * 60, 1, 0) as less45,
    @prevtime := end_time

in the first line @prevtime still holds the end_time of the previous row. In the second line the end_time of the current row is assigned to the @prevtime variable. In the first line we check your condition, that is, whether the gap between the previous row's end and the current row's start is at most 45 minutes. If yes, it returns 1, else 0. We need this so that we can later recognize which rows belong together. Note that the order by clause in the subquery is important as well. Don't "optimize" it away.

Now that we have this, we use the same logic in the outer query.

  @rn := @rn + 1 as row_number,
  @gn := if(@prevless != less45, @gn + 1, @gn) as gn,
  @prevless := less45

In the first line, we simply implement an ever-increasing row number. We need this so that we later know which of the rows belonging together holds the minimum and which holds the maximum values.
The second line is a "group" number (gn). Every time the value of less45 changes from one row to the next, the number is increased. We need this so that we can later join the table to itself and get the minimum and maximum values.
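To make the two passes concrete, here is a small Python trace of the same bookkeeping, run on the sample rows shown near the end of this answer. It is only an illustration, not part of the SQL solution: the function name `flag_and_group` is made up, and it assumes the rows are processed strictly in `start_time, end_time` order, as the queries intend.

```python
from datetime import datetime


def parse(t):
    """Parse an 'HH:MM:SS' string so that two times can be subtracted."""
    return datetime.strptime(t, "%H:%M:%S")


def flag_and_group(rows):
    """Mimic the user-variable logic; rows are (start_time, end_time) strings."""
    rows = sorted(rows)                            # order by start_time, end_time
    prevtime = min(start for start, _ in rows)     # @prevtime := min(start_time)
    prevless = None                                # @prevless := null
    gn = 0                                         # @gn := 0
    out = []
    for rn, (start, end) in enumerate(rows, start=1):   # @rn := @rn + 1
        gap = (parse(start) - parse(prevtime)).total_seconds()
        less45 = 1 if gap <= 45 * 60 else 0
        prevtime = end                             # @prevtime := end_time
        # In MySQL, NULL != less45 is NULL, which IF() treats as false,
        # so the very first row never bumps the group number.
        if prevless is not None and prevless != less45:
            gn += 1
        prevless = less45                          # @prevless := less45
        out.append((rn, start, end, less45, gn))
    return out


rows = [
    ("09:30:45", "09:40:45"),
    ("09:50:45", "09:55:45"),
    ("10:55:45", "11:20:45"),
    ("11:30:45", "11:40:45"),
]
for row in flag_and_group(rows):
    print(row)
# (1, '09:30:45', '09:40:45', 1, 0)
# (2, '09:50:45', '09:55:45', 1, 0)
# (3, '10:55:45', '11:20:45', 0, 1)
# (4, '11:30:45', '11:40:45', 1, 2)
```

The last two columns correspond to less45 and gn. Rows sharing a gn value are the ones the final query collapses into a single result row.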

I created a table for all this because we would have to use it 4 times in the final query. I'm not sure whether the optimizer recognizes that it has to execute it only once. Since variables are used, I doubt it. You can check this by replacing tmp_foo in the final query with (<the_whole_select_to_create_tmp_foo>). Then put EXPLAIN EXTENDED in front of the very first SELECT of the final query. Execute it and issue a SHOW WARNINGS; afterwards. This will show you the real query executed by MySQL.

If you want to read more about user-defined variables, here's the manual entry.

Anyway, here is the final query:

select 
tmin.start_time,
tmax.end_time,
tmin.start_loc,
tmax.end_loc
from tmp_foo tmin 
inner join tmp_foo tmax ON tmin.gn = tmax.gn
where tmin.row_number = (select min(row_number) from tmp_foo t where tmin.gn = t.gn)
and tmax.row_number = (select max(row_number) from tmp_foo t where tmin.gn = t.gn)
;

This is rather self-explanatory. Join the table to itself and get the minimum and maximum values. In case you wonder why we are not using group by and aggregate functions, here's an excellent entry from the manual: The Rows Holding the Group-wise Maximum of a Certain Column

And finally...

Based on this sample data:

+------+------------+----------+-----------+---------+
| id   | start_time | end_time | start_loc | end_loc |
+------+------------+----------+-----------+---------+
|    1 | 09:30:45   | 09:40:45 |        11 |      12 |
|    1 | 09:50:45   | 09:55:45 |        15 |      13 |
|    1 | 10:55:45   | 11:20:45 |        16 |      19 |
|    1 | 11:30:45   | 11:40:45 |         8 |       7 |
+------+------------+----------+-----------+---------+

The result is:

+------------+----------+-----------+---------+
| start_time | end_time | start_loc | end_loc |
+------------+----------+-----------+---------+
| 09:30:45   | 09:55:45 |        11 |      13 |
| 10:55:45   | 11:20:45 |        16 |      19 |
| 11:30:45   | 11:40:45 |         8 |       7 |
+------------+----------+-----------+---------+

See it working live in an sqlfiddle
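The application-level route mentioned at the start of this answer could look roughly like the following Python sketch. It is not the query above translated line by line. It applies the rule from the question literally: within one id, a row is merged into the previous one when it starts at most 45 minutes after the previous row ends. The function name `merge_trips` and the tuple layout are hypothetical, and the `<=` comparison is borrowed from the less45 expression (the question itself says "less than").

```python
from datetime import datetime, timedelta

MAX_GAP = timedelta(minutes=45)


def parse(t):
    """Parse an 'HH:MM:SS' string so that two times can be subtracted."""
    return datetime.strptime(t, "%H:%M:%S")


def merge_trips(rows, max_gap=MAX_GAP):
    """rows: (id, start_time, end_time, start_loc, end_loc) tuples."""
    ordered = sorted(rows, key=lambda r: (r[0], parse(r[1]), parse(r[2])))
    merged = []
    for id_, start, end, start_loc, end_loc in ordered:
        if merged:
            last = merged[-1]
            same_id = last[0] == id_
            if same_id and parse(start) - parse(last[2]) <= max_gap:
                # Keep the first start_time/start_loc of the chain and
                # take end_time/end_loc from the current row.
                merged[-1] = (id_, last[1], end, last[3], end_loc)
                continue
        merged.append((id_, start, end, start_loc, end_loc))
    return merged


rows = [
    (1, "09:30:45", "09:40:45", 11, 12),
    (1, "09:50:45", "09:55:45", 15, 13),
    (1, "10:55:45", "11:20:45", 16, 19),
    (1, "11:30:45", "11:40:45", 8, 7),
]
for row in merge_trips(rows):
    print(row)
# (1, '09:30:45', '09:55:45', 11, 13)
# (1, '10:55:45', '11:40:45', 16, 7)
```

Under this literal reading the last two sample rows, which are 10 minutes apart, are merged as well, so the sketch returns two rows where the SQL result above has three. The SQL starts a new group whenever the less45 flag changes, which is a different grouping rule.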

Use the MySQL query below to get the desired result.

SELECT start_loc, end_loc FROM Table WHERE TIME_TO_SEC(TIMEDIFF(end_time,start_time))/60 <= 45 GROUP BY ID;
