
SQL query with calculating algorithm

I am having trouble writing a SQL statement to get certain results. Here is my sample data:

[image: sample data]

For the same ID, if the second row's start time minus the first row's end time is less than 45 minutes, the result should show the first row's start_loc and the second row's end_loc. Currently my SQL is:

SELECT start_loc, end_loc FROM Table WHERE end_time - start_time <= 45 GROUP BY ID;

It returns two rows: the first is 202,208 and the second is 112,102.

The desired outcome is 65,102 for the first row and 229,208 for the second.

Any guidance? Thanks in advance.

EDIT

[image]

Note that this was even more complicated than I initially thought. I solved it with SQL for the fun of it. If performance is an issue, consider solving it at the application level, not at the database level.

Here it is. First I created a table that helps to simplify the final query:

create table tmp_foo as
  select 
  sq.*,
  @rn := @rn + 1 as row_number,
  @gn := if(@prevless != less45, @gn + 1, @gn) as gn,
  @prevless := less45
  from (
    select
    t.*,
    if(time_to_sec(timediff(start_time, @prevtime)) <= 45 * 60, 1, 0) as less45,
    @prevtime := end_time
    from
    transaction t
    , (select @prevtime := (select min(start_time) from transaction)) inner_var_init
    order by start_time, end_time
  ) sq
  , (select @gn := 0, @prevless := null, @rn := 0) outer_var_init
  order by start_time, end_time;

Note that this table has no indexes whatsoever. You might want to create some if performance becomes an issue. And on the original table as well :)

A little explanation:

First we initialize our variables

    , (select @prevtime := (select min(start_time) from transaction)) inner_var_init

With the @prevtime variable we access the previous row. That's why the order in the select clause is important. Here

    if(time_to_sec(timediff(start_time, @prevtime)) <= 45 * 60, 1, 0) as less45,
    @prevtime := end_time

in the first line, @prevtime still holds the end_time of the previous row. In the second line, the end_time of the current row is assigned to @prevtime. In the first line we check your condition: whether at most 45 minutes lie between the previous row's end and the current row's start. If so, it returns 1, otherwise 0. We need this so that we can later recognize which rows belong together. Note that the order by clause in the subquery is also important. Don't "optimize" it away.
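To make the flag logic concrete, here is a minimal Python sketch of the same idea, not the MySQL code itself. The function name `flag_less45` and the helper `parse` are made up for illustration; the rows are the start and end times of the sample data shown further down in this answer.

```python
from datetime import datetime, timedelta

# (start_time, end_time) pairs from the sample data in this answer.
rows = [
    ("09:30:45", "09:40:45"),
    ("09:50:45", "09:55:45"),
    ("10:55:45", "11:20:45"),
    ("11:30:45", "11:40:45"),
]

def parse(t):
    """Parse an HH:MM:SS string into a datetime (date part is irrelevant)."""
    return datetime.strptime(t, "%H:%M:%S")

def flag_less45(rows, limit=timedelta(minutes=45)):
    """Return 1 for each row that starts at most `limit` after the
    previous row's end, else 0 (hypothetical helper)."""
    ordered = sorted(rows)  # plays the role of ORDER BY start_time, end_time
    # Same initial value as @prevtime: the smallest start_time.
    prev_end = min(parse(start) for start, _ in ordered)
    flags = []
    for start, end in ordered:
        # Read the "previous" value first, then overwrite it,
        # just like the two lines in the select clause.
        flags.append(1 if parse(start) - prev_end <= limit else 0)
        prev_end = parse(end)
    return flags

print(flag_less45(rows))
```

Swapping the two statements inside the loop would compare each row with its own end time, which is why the order of the expressions matters in the SQL version too.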

Now that we have this, we use the same logic on the outer query.

  @rn := @rn + 1 as row_number,
  @gn := if(@prevless != less45, @gn + 1, @gn) as gn,
  @prevless := less45

In the first line, we simply implement an ever-increasing row number. We need this so that we later know which of the rows belonging together holds the minimum and which holds the maximum values.
The second line is a "group" number (gn). Every time the value of less45 changes from one row to the next, the number is increased. We need this so that we can later join the table to itself and get the minimum and maximum values.
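The numbering step can be sketched the same way in Python. This is an illustration under the assumption that the less45 flags have already been computed; `number_rows` is a hypothetical name, not part of the SQL.

```python
def number_rows(flags):
    """Return (row_number, gn) pairs for a list of less45 flags
    (hypothetical helper mirroring the outer query)."""
    rn, gn, prev = 0, 0, None
    numbered = []
    for less45 in flags:
        rn += 1
        # In MySQL, NULL != x evaluates to NULL, which IF() treats as
        # false, so the very first row never bumps the group number.
        if prev is not None and prev != less45:
            gn += 1
        numbered.append((rn, gn))
        prev = less45
    return numbered

print(number_rows([1, 1, 0, 1]))
```

The group number only moves when the flag differs from the one on the previous row, so a run of equal flags shares one gn.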

I created a table for all this because we would have to use it 4 times in the final query. I'm not sure whether the optimizer recognizes that it only has to execute it once. Since variables are used, I doubt it. You can check this by replacing tmp_foo in the final query with (<the_whole_select_to_create_tmp_foo>). Then put EXPLAIN EXTENDED in front of the very first SELECT of the final query, execute it, and issue SHOW WARNINGS; afterwards. This will show you the real query executed by MySQL. (The EXTENDED keyword was removed in MySQL 8.0; there, plain EXPLAIN followed by SHOW WARNINGS; does the same.)

If you want to read more about user-defined variables, here's the manual entry.

Anyway, here is the final query:

select 
tmin.start_time,
tmax.end_time,
tmin.start_loc,
tmax.end_loc
from tmp_foo tmin 
inner join tmp_foo tmax ON tmin.gn = tmax.gn
where tmin.row_number = (select min(row_number) from tmp_foo t where tmin.gn = t.gn)
and tmax.row_number = (select max(row_number) from tmp_foo t where tmin.gn = t.gn)
;

This is rather self-explanatory. Join the table to itself and get the minimum and maximum values. In case you wonder why we are not using group by and aggregate functions, here's an excellent entry from the manual: The Rows Holding the Group-wise Maximum of a Certain Column
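As a rough illustration of what the self-join does, the following Python sketch picks the row with the lowest and the row with the highest row number in each group and combines them. The list `tmp_foo` is a hypothetical in-memory stand-in for the table, with row_number and gn values worked out by hand from the sample data in this answer; `collapse` is a made-up name.

```python
# Hypothetical stand-in for tmp_foo:
# (row_number, gn, start_time, end_time, start_loc, end_loc)
tmp_foo = [
    (1, 0, "09:30:45", "09:40:45", 11, 12),
    (2, 0, "09:50:45", "09:55:45", 15, 13),
    (3, 1, "10:55:45", "11:20:45", 16, 19),
    (4, 2, "11:30:45", "11:40:45", 8, 7),
]

def collapse(rows):
    """For each gn, take start_time/start_loc from the row with the
    smallest row_number and end_time/end_loc from the largest."""
    first, last = {}, {}
    for row in rows:
        rn, gn = row[0], row[1]
        if gn not in first or rn < first[gn][0]:
            first[gn] = row  # plays the role of tmin
        if gn not in last or rn > last[gn][0]:
            last[gn] = row   # plays the role of tmax
    return [
        (first[gn][2], last[gn][3], first[gn][4], last[gn][5])
        for gn in sorted(first)
    ]

for line in collapse(tmp_foo):
    print(line)
```

A plain GROUP BY with MIN() and MAX() would return the right times but could not tell you which start_loc and end_loc belong to those rows, which is the point of the manual entry mentioned above.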

And finally...

Based on this sample data:

+------+------------+----------+-----------+---------+
| id   | start_time | end_time | start_loc | end_loc |
+------+------------+----------+-----------+---------+
|    1 | 09:30:45   | 09:40:45 |        11 |      12 |
|    1 | 09:50:45   | 09:55:45 |        15 |      13 |
|    1 | 10:55:45   | 11:20:45 |        16 |      19 |
|    1 | 11:30:45   | 11:40:45 |         8 |       7 |
+------+------------+----------+-----------+---------+

The result is:

+------------+----------+-----------+---------+
| start_time | end_time | start_loc | end_loc |
+------------+----------+-----------+---------+
| 09:30:45   | 09:55:45 |        11 |      13 |
| 10:55:45   | 11:20:45 |        16 |      19 |
| 11:30:45   | 11:40:45 |         8 |       7 |
+------------+----------+-----------+---------+

See it working live in an sqlfiddle
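For the application-level route mentioned at the top of this answer, a minimal Python sketch could look like the following. It is an assumption-laden illustration, not a translation of the SQL: `merge_trips` and `parse` are hypothetical names, rows are assumed to be (id, start_time, end_time, start_loc, end_loc) tuples with HH:MM:SS strings, and it handles each id separately and chains every consecutive pair whose gap is at most 45 minutes, following the rule as worded in the question.

```python
from datetime import datetime, timedelta
from itertools import groupby

def parse(t):
    """Parse an HH:MM:SS string into a datetime (date part is irrelevant)."""
    return datetime.strptime(t, "%H:%M:%S")

def merge_trips(rows, limit=timedelta(minutes=45)):
    """Merge consecutive rows of the same id whose gap is at most `limit`.

    rows: (id, start_time, end_time, start_loc, end_loc) tuples.
    """
    merged = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1], r[2]))
    for _, trips in groupby(ordered, key=lambda r: r[0]):
        current = None
        for id_, start, end, start_loc, end_loc in trips:
            if current is not None and parse(start) - parse(current[2]) <= limit:
                # Extend the open trip: keep its start, take the new end.
                current = (current[0], current[1], end, current[3], end_loc)
            else:
                if current is not None:
                    merged.append(current)
                current = (id_, start, end, start_loc, end_loc)
        merged.append(current)  # groupby never yields an empty group
    return merged

sample = [
    (1, "09:30:45", "09:40:45", 11, 12),
    (1, "09:50:45", "09:55:45", 15, 13),
    (1, "10:55:45", "11:20:45", 16, 19),
    (1, "11:30:45", "11:40:45", 8, 7),
]
for trip in merge_trips(sample):
    print(trip)
```

Because this sketch chains every close pair, it also merges the last two sample rows (their gap is 10 minutes) and so returns two rows for the sample data rather than the three shown in the result table above.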

Use the MySQL query below to get the desired result.

SELECT start_loc, end_loc FROM Table WHERE TIME_TO_SEC(TIMEDIFF(end_time,start_time))/60 <= 45 GROUP BY ID;


 