I am having trouble writing an SQL statement to get a certain result. Here is my sample data:
For rows with the same ID, if the second row's start time minus the first row's end time is less than 45 minutes, the result should show the first row's start_loc and the second row's end_loc. Currently my SQL is:
SELECT start_loc, end_loc FROM Table WHERE end_time - start_time <= 45 GROUP BY ID;
It returns two rows: the first is 202,208 and the second is 112,102.
The desired outcome is 65,102 for the first row and 229,208 for the second.
Any guidance? Thanks in advance.
EDIT
Note that this was even more complicated than I initially thought. I solved it in SQL for the fun of it. If performance is an issue, consider solving it at the application level rather than at the database level.
Here it is. First I created a table that helps to simplify the final query:
create table tmp_foo as
select
    sq.*,
    @rn := @rn + 1 as row_number,
    @gn := if(@prevless != less45, @gn + 1, @gn) as gn,
    @prevless := less45
from (
    select
        t.*,
        if(time_to_sec(timediff(start_time, @prevtime)) <= 45 * 60, 1, 0) as less45,
        @prevtime := end_time
    from
        transaction t
        , (select @prevtime := (select min(start_time) from transaction)) inner_var_init
    order by start_time, end_time
) sq
, (select @gn := 0, @prevless := null, @rn := 0) outer_var_init
order by start_time, end_time;
Note that this table has no indexes whatsoever. You might want to create some if performance becomes an issue, and on the original table as well :)
A little explanation:
First we initialize our variables:
, (select @prevtime := (select min(start_time) from transaction)) inner_var_init
With the @prevtime variable we access the previous row. That's why the order in the select clause is important. Here
if(time_to_sec(timediff(start_time, @prevtime)) <= 45 * 60, 1, 0) as less45,
@prevtime := end_time
in the first line, @prevtime still holds the end_time of the previous row. In the second line, the end_time of the current row is assigned to @prevtime. In the first line we check your condition: whether the gap between the previous row's end and the current row's start is at most 45 minutes. If yes, return 1, else return 0. We need this so we can later recognize which rows belong together. Note that the order by clause in the subquery is also important. Don't "optimize" it away.
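To make the flag logic concrete, here is a minimal Python sketch that mirrors the inner subquery on the sample rows shown further down. The helper names (to_seconds, less45_flags) are made up for illustration and are not part of the SQL. Like the subquery, the sketch ignores the id column and simply walks the rows in start_time, end_time order.

```python
def to_seconds(hms):
    """Convert an 'HH:MM:SS' string to seconds since midnight."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s

# (id, start_time, end_time, start_loc, end_loc), copied from the sample data
rows = [
    (1, "09:30:45", "09:40:45", 11, 12),
    (1, "09:50:45", "09:55:45", 15, 13),
    (1, "10:55:45", "11:20:45", 16, 19),
    (1, "11:30:45", "11:40:45", 8, 7),
]

def less45_flags(rows, limit_seconds=45 * 60):
    """Return one 0/1 flag per row, like the less45 column in the subquery."""
    if not rows:
        return []
    # Same ordering as "order by start_time, end_time"
    ordered = sorted(rows, key=lambda r: (r[1], r[2]))
    # Same initial value as @prevtime := min(start_time)
    prev_time = min(to_seconds(r[1]) for r in ordered)
    flags = []
    for r in ordered:
        gap = to_seconds(r[1]) - prev_time       # start_time minus previous end_time
        flags.append(1 if gap <= limit_seconds else 0)
        prev_time = to_seconds(r[2])             # @prevtime := end_time
    return flags

print(less45_flags(rows))
# prints [1, 1, 0, 1]
```

The first row always gets a 1 because @prevtime starts at the smallest start_time, so its gap is zero.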
Now that we have this, we use the same logic on the outer query.
@rn := @rn + 1 as row_number,
@gn := if(@prevless != less45, @gn + 1, @gn) as gn,
@prevless := less45
In the first line, we simply implement an ever-increasing (row) number. We need this so we later know which of the rows belonging together holds the minimum and which holds the maximum values.
The second line is a "group" number (gn). Every time the value of less45 changes from one row to the next, the number is increased. We need this so we can later join the table to itself and get the minimum and maximum values.
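The numbering step can be sketched the same way in Python. This is only an illustration of the variable logic, not something MySQL runs: number_rows is a hypothetical helper, and the input flags [1, 1, 0, 1] are the less45 values I worked out by hand for the sample data used later in this answer.

```python
def number_rows(flags):
    """Return (row_number, gn) pairs for a list of less45 flags in row order."""
    rn = 0
    gn = 0
    prev_less = None  # mirrors @prevless := null
    numbered = []
    for less45 in flags:
        rn += 1  # @rn := @rn + 1
        # In SQL, NULL != x evaluates to NULL, which if() treats as false,
        # so the very first row never increases gn.
        if prev_less is not None and prev_less != less45:
            gn += 1
        numbered.append((rn, gn))
        prev_less = less45  # @prevless := less45
    return numbered

print(number_rows([1, 1, 0, 1]))
# prints [(1, 0), (2, 0), (3, 1), (4, 2)]
```

Note that gn increases whenever the flag flips in either direction, so the fourth row starts its own group even though its flag is 1.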
I created a table for all this because we would have to use it 4 times in the final query. I'm not sure whether the optimizer recognizes that it has to execute it only once. Since variables are used, I doubt it. You can check this by replacing tmp_foo in the final query with (<the_whole_select_to_create_tmp_foo>). Then put EXPLAIN EXTENDED in front of the very first SELECT of the final query. Execute it and issue a SHOW WARNINGS; afterwards. This will show you the real query executed by MySQL.
If you want to read more about user-defined variables, here's the manual entry.
Anyway, here is the final query:
select
    tmin.start_time,
    tmax.end_time,
    tmin.start_loc,
    tmax.end_loc
from tmp_foo tmin
inner join tmp_foo tmax ON tmin.gn = tmax.gn
where tmin.row_number = (select min(row_number) from tmp_foo t where tmin.gn = t.gn)
  and tmax.row_number = (select max(row_number) from tmp_foo t where tmin.gn = t.gn)
;
This is rather self-explanatory. Join the table to itself and get the minimum and maximum values. In case you wonder why we are not using group by and aggregate functions, here's an excellent entry from the manual: The Rows Holding the Group-wise Maximum of a Certain Column
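If you would rather do this last step at the application level, the self-join boils down to "per gn, take the start values from the row with the smallest row_number and the end values from the row with the largest". Below is a minimal Python sketch of that idea. The tmp_foo list is a hand-built stand-in for the table (the row_number and gn values are what I worked out by hand for the sample rows), and collapse_groups is a hypothetical helper name.

```python
# Hand-built stand-in for tmp_foo: sample rows plus row_number and gn.
tmp_foo = [
    {"row_number": 1, "gn": 0, "start_time": "09:30:45", "end_time": "09:40:45", "start_loc": 11, "end_loc": 12},
    {"row_number": 2, "gn": 0, "start_time": "09:50:45", "end_time": "09:55:45", "start_loc": 15, "end_loc": 13},
    {"row_number": 3, "gn": 1, "start_time": "10:55:45", "end_time": "11:20:45", "start_loc": 16, "end_loc": 19},
    {"row_number": 4, "gn": 2, "start_time": "11:30:45", "end_time": "11:40:45", "start_loc": 8, "end_loc": 7},
]

def collapse_groups(table):
    """Return (start_time, end_time, start_loc, end_loc) per group number."""
    first = {}  # gn -> row with the smallest row_number (the tmin side)
    last = {}   # gn -> row with the largest row_number (the tmax side)
    for row in table:
        gn = row["gn"]
        if gn not in first or row["row_number"] < first[gn]["row_number"]:
            first[gn] = row
        if gn not in last or row["row_number"] > last[gn]["row_number"]:
            last[gn] = row
    # Sorting by gn is a choice made here; the SQL query has no ORDER BY.
    return [
        (first[gn]["start_time"], last[gn]["end_time"],
         first[gn]["start_loc"], last[gn]["end_loc"])
        for gn in sorted(first)
    ]

for line in collapse_groups(tmp_foo):
    print(line)
# prints:
# ('09:30:45', '09:55:45', 11, 13)
# ('10:55:45', '11:20:45', 16, 19)
# ('11:30:45', '11:40:45', 8, 7)
```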
And finally...
Based on this sample data:
+------+------------+----------+-----------+---------+
| id | start_time | end_time | start_loc | end_loc |
+------+------------+----------+-----------+---------+
| 1 | 09:30:45 | 09:40:45 | 11 | 12 |
| 1 | 09:50:45 | 09:55:45 | 15 | 13 |
| 1 | 10:55:45 | 11:20:45 | 16 | 19 |
| 1 | 11:30:45 | 11:40:45 | 8 | 7 |
+------+------------+----------+-----------+---------+
The result is:
+------------+----------+-----------+---------+
| start_time | end_time | start_loc | end_loc |
+------------+----------+-----------+---------+
| 09:30:45 | 09:55:45 | 11 | 13 |
| 10:55:45 | 11:20:45 | 16 | 19 |
| 11:30:45 | 11:40:45 | 8 | 7 |
+------------+----------+-----------+---------+
See it working live in an sqlfiddle
Use the MySQL query below to get the desired result.
SELECT start_loc, end_loc FROM Table WHERE TIME_TO_SEC(TIMEDIFF(end_time,start_time))/60 <= 45 GROUP BY ID;