
Fetch range from days

I have this table structure:

EDIT: more complex example, with a hidden (missing) range added

category|   day      |   a   |
--------|------------|-------|
1       | 2012-01-01 |   4   |
1       | 2012-01-02 |   4   |
1       | 2012-01-03 |   4   |
1       | 2012-01-04 |   4   |
1       | 2012-01-05 |   5   |
1       | 2012-01-06 |   5   |
1       | 2012-01-07 |   5   |
1       | 2012-01-08 |   4   |
1       | 2012-01-09 |   4   |
1       | 2012-01-10 |   4   |
1       | 2012-01-11 |   5   |
1       | 2012-01-12 |   5   |
1       | 2012-01-16 |   5   |
1       | 2012-01-17 |   5   |
1       | 2012-01-18 |   5   |
1       | 2012-01-19 |   5   |
...

with (category, day) as a unique key. For each category, I would like to extract the date ranges over which column "a" keeps the same value, within a given date limit, like so:

1,2012-01-01|2012-01-04,4
1,2012-01-05|2012-01-07,5
1,2012-01-08|2012-01-10,4
1,2012-01-11|2012-01-12,5
1,2012-01-13|2012-01-15,0
1,2012-01-16|2012-01-19,5

or similar.

I'm looking for the best way to do it, preferably using only MySQL, but a little bit of PHP is fine too.

NOTE 1: not every day is stored: between two non-contiguous days there may be no rows at all. In that case I would like the output to include the missing range with column "a" = 0.

NOTE 2: I did it with a simple query and a few lines of PHP, but I don't like it, because my simple algorithm needs a loop over every day in the range, multiplied by every category found. If the range is too big and there are too many categories, that's not good.
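For comparison, here is a minimal sketch of the single pass that NOTE 1 and NOTE 2 ask for: sort the rows once by (category, day), extend the current range while `a` stays the same on consecutive days, and emit each hole as one range with a = 0, so the work grows with the number of stored rows rather than days × categories. It is written in Python only to illustrate the logic (it ports directly to PHP); `fetch_ranges` and the `(category, day, a)` tuple layout are hypothetical.

```python
from datetime import date, timedelta
from itertools import groupby

ONE_DAY = timedelta(days=1)


def fetch_ranges(rows):
    """rows: (category, day, a) tuples -> list of (category, first_day, last_day, a).

    Consecutive days with the same `a` form one range; each hole between
    stored days becomes a single range with a = 0.
    """
    result = []
    for category, group in groupby(sorted(rows), key=lambda r: r[0]):
        current = None  # [first_day, last_day, a] of the range being built
        for _, day, a in group:
            if current is not None and day - current[1] > ONE_DAY:
                # close the open range and emit the whole hole in one step
                result.append((category, *current))
                result.append((category, current[1] + ONE_DAY, day - ONE_DAY, 0))
                current = None
            if current is not None and current[2] == a:
                current[1] = day  # same value on the next day: extend the range
            else:
                if current is not None:
                    result.append((category, *current))
                current = [day, day, a]
        if current is not None:
            result.append((category, *current))
    return result


# sample data from the question (category 1, no rows for 2012-01-13 to 2012-01-15)
A_BY_DAY = {1: 4, 2: 4, 3: 4, 4: 4, 5: 5, 6: 5, 7: 5, 8: 4, 9: 4, 10: 4,
            11: 5, 12: 5, 16: 5, 17: 5, 18: 5, 19: 5}
rows = [(1, date(2012, 1, d), a) for d, a in A_BY_DAY.items()]

for category, first, last, a in fetch_ranges(rows):
    print(f"{category},{first}|{last},{a}")
```

On the sample data this prints the six lines listed in the question, including `1,2012-01-13|2012-01-15,0`.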

FINAL EDIT: OK! After reading all the comments and answers, I think there is no solution that is valid, efficient and readable at the same time. Mosty Mostacho's answer is not a 100% valid solution, but it has 100% valid suggestions. Thank you all.

New edit:

As I said in a comment, I strongly recommend using the quick query and then filling in the missing dates in PHP, as that will be faster and more readable:

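select
  concat(@category := category, ',', min(day)) col1,
  concat(max(day), ',', @a := a) col2
from t, (select @category := '', @a := '', @counter := 0) init
-- @counter grows by 1 whenever category or a differs from the previous row,
-- so each run of equal values gets its own group number
-- (this assumes rows are read in category, day order, e.g. via the unique key)
where @counter := @counter + (category != @category or a != @a)
group by @counter, category, a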
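To see what the `@counter` trick does, here is a Python emulation of it (a sketch; `counter_groups` is a hypothetical helper, and rows are assumed to arrive in (category, day) order, which the MySQL version also silently relies on). It also shows why the missing dates still need post-processing: the counter only looks at category and a, so 2012-01-11 to 2012-01-12 and 2012-01-16 to 2012-01-19 (both a = 5) collapse into one range across the gap. Keep in mind that the MySQL manual describes the evaluation order of expressions involving user variables as undefined, so the query depends on behavior that isn't guaranteed.

```python
from datetime import date

# sample data from the question, in (category, day) order
A_BY_DAY = {1: 4, 2: 4, 3: 4, 4: 4, 5: 5, 6: 5, 7: 5, 8: 4, 9: 4, 10: 4,
            11: 5, 12: 5, 16: 5, 17: 5, 18: 5, 19: 5}
rows = [(1, date(2012, 1, d), a) for d, a in sorted(A_BY_DAY.items())]


def counter_groups(rows):
    """Emulates the @counter query: one (category, min(day), max(day), a) per group."""
    counter, prev_category, prev_a = 0, "", ""  # like the `init` derived table
    groups = {}  # counter -> [category, min(day), max(day), a]
    for category, day, a in rows:
        # @counter := @counter + (category != @category or a != @a)
        counter += int(category != prev_category or a != prev_a)
        prev_category, prev_a = category, a
        if counter in groups:
            groups[counter][1] = min(groups[counter][1], day)
            groups[counter][2] = max(groups[counter][2], day)
        else:
            groups[counter] = [category, day, day, a]
    return [tuple(g) for g in groups.values()]


for group in counter_groups(rows):
    print(group)
```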

However, if you still want a pure-SQL version, try this:

select
  @counter := @counter + (category != @category or a != @a) counter,
  concat(@category := category, ',', min(day)) col1,
  concat(max(day), ',', @a := a) col2
from (
  select distinct s.day, s.category, coalesce(t1.a, 0) a
  from (
    select (select min(day) from t) + interval val - 1 day day, c.category
    from seq s, (select distinct category from t) c
    having day <= (select max(day) from t)
  ) s
  left join t t1 on s.day = t1.day and s.category = t1.category
  where s.day between (
    select min(day) from t t2
    where s.category = t2.category) and (
    select max(day) from t t2
    where s.category = t2.category)
  order by s.category, s.day
) t, (select @category := '', @a := '', @counter := 0) init
group by counter, category, a
order by category, min(day)

Note that MySQL can't generate rows on the fly unless you hardcode UNIONs, for example. That is expensive, which is why I strongly suggest creating a table with a single integer column holding the values 1 to X, where X is at least the number of days from min(day) to max(day) in your table. If you're not sure about that number, just add 100,000 rows and you'll be able to generate ranges spanning over 200 years. In the previous query, this table is seq and its column is val.
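A sketch of how the seq table turns into a calendar, mirroring `(select min(day) from t) + interval val - 1 day` together with the `having day <= (select max(day) from t)` filter. `calendar_from_seq` is a hypothetical name, and `range(1, 100_001)` stands in for a 100,000-row seq table; the last line checks the "over 200 years" figure.

```python
from datetime import date, timedelta


def calendar_from_seq(first_day, last_day, seq):
    """Each val in seq (1..X) becomes first_day + (val - 1) days, kept while <= last_day."""
    days = (first_day + timedelta(days=val - 1) for val in seq)
    return [d for d in days if d <= last_day]


seq = range(1, 100_001)  # stand-in for a seq table holding 1..100,000
cal = calendar_from_seq(date(2012, 1, 1), date(2012, 1, 19), seq)
print(len(cal), cal[0], cal[-1])
print(f"100,000 days is about {100_000 / 365.2425:.1f} years")
```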

This results in:

+--------------+--------------+
|     COL1     |     COL2     |
+--------------+--------------+
| 1,2012-01-01 | 2012-01-04,4 |
| 1,2012-01-05 | 2012-01-07,5 |
| 1,2012-01-08 | 2012-01-10,4 |
| 1,2012-01-11 | 2012-01-12,5 |
| 1,2012-01-13 | 2012-01-15,0 |
| 1,2012-01-16 | 2012-01-19,5 |
+--------------+--------------+

OK, I'm lying: the result actually also contains a counter column. Just ignore it, since removing it (with another derived table) would make the query even slower!

And here's a one-liner brutality for you :) (Note: change the "datt" table name to your own.)

select dd.category,
dd.day as start_day,
(select dp.day from 
    (
        select 1 as n,d1.category,d1.day,d1.a from datt d1 where not exists (
            select * from datt where day = d1.day - INTERVAL 1 DAY and a=d1.a
        )
        union
        select 2 as n,d1.category,d1.day,d1.a from datt d1 where not exists (
            select * from datt where day = d1.day + INTERVAL 1 DAY and a=d1.a
        )
    ) dp where dp.day >= dd.day - INTERVAL (n-2) DAY order by day asc limit 0,1) 
as end_day,
dd.a from (
    select 1 as n,d1.category,d1.day,d1.a from datt d1 where not exists (
        select * from datt where day = d1.day - INTERVAL 1 DAY and a=d1.a
    )
    union
    select 2 as n,d1.category,d1.day,d1.a from datt d1 where not exists (
        select * from datt where day = d1.day + INTERVAL 1 DAY and a=d1.a
    )
) dd
where n=1

and its output is:

|| 1 || 2012-01-01 || 2012-01-01 || 4 ||
|| 1 || 2012-01-03 || 2012-01-04 || 4 ||
|| 1 || 2012-01-05 || 2012-01-07 || 5 ||
|| 1 || 2012-01-08 || 2012-01-10 || 4 ||
|| 1 || 2012-01-11 || 2012-01-12 || 5 ||

Note: that's the result for a table holding days 01 to 12, with 2012-01-02 missing.

No need for PHP or temporary tables or anything.

DISCLAIMER: I did this just for fun. This stunt may be too crazy to use in a production environment, so I'm not posting it as a "real" solution. Also, I'm not willing to explain how it works :) and I didn't rethink or refactor it. There might be more elegant ways, and the names/aliases could be more informative. So please, no flames.
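The idea the NOT EXISTS subqueries appear to encode can be sketched like this (my reading, not the author's exact logic): a row starts a range when there is no row for the previous day with the same a, and ends one when there is no such row for the next day. Unlike the query above, this sketch also matches on category, which the NOT EXISTS conditions leave out. Because ranges within one category never overlap, the k-th start pairs with the k-th end once both lists are sorted. `island_bounds` is a hypothetical name, and like the query it produces no a = 0 rows for missing days.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def island_bounds(rows):
    """rows: (category, day, a) tuples -> (category, first_day, last_day, a) per range."""
    present = set(rows)
    # start marker: no row for the previous day with the same category and a
    starts = sorted(r for r in present if (r[0], r[1] - ONE_DAY, r[2]) not in present)
    # end marker: no row for the next day with the same category and a
    ends = sorted(r for r in present if (r[0], r[1] + ONE_DAY, r[2]) not in present)
    return [(c, s, e, a) for (c, s, a), (_, e, _) in zip(starts, ends)]


A_BY_DAY = {1: 4, 2: 4, 3: 4, 4: 4, 5: 5, 6: 5, 7: 5, 8: 4, 9: 4, 10: 4,
            11: 5, 12: 5, 16: 5, 17: 5, 18: 5, 19: 5}
rows = [(1, date(2012, 1, d), a) for d, a in A_BY_DAY.items()]

for r in island_bounds(rows):
    print(r)
```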

Here's my solution. It looks more complicated than it is; I think it may be easier to understand than the other answers, no offense :)

Setting up test data:

drop table if exists test;
create table test(category int, day date, a int);
insert into test values
(1       , '2012-01-01' ,   4   ),
(1       , '2012-01-02' ,   4   ),
(1       , '2012-01-03' ,   4   ),
(1       , '2012-01-04' ,   4   ),
(1       , '2012-01-05' ,   5   ),
(1       , '2012-01-06' ,   5   ),
(1       , '2012-01-07' ,   5   ),
(1       , '2012-01-08' ,   4   ),
(1       , '2012-01-09' ,   4   ),
(1       , '2012-01-10' ,   4   ),
(1       , '2012-01-11' ,   5   ),
(1       , '2012-01-12' ,   5   ),
(1       , '2012-01-16' ,   5   ),
(1       , '2012-01-17' ,   5   ),
(1       , '2012-01-18' ,   5   ),
(1       , '2012-01-19' ,   5   );

And here it comes:

SELECT category, MIN(`day`) AS firstDayInRange, max(`day`) AS lastDayInRange, a
, COUNT(*) as howMuchDaysInThisRange /*<-- as a little extra*/
FROM
(
SELECT 
IF(@prev != qr.a, @is_a_changing:=@is_a_changing+1, @is_a_changing) AS is_a_changing, @prev:=qr.a, qr.* /*See if column a has changed. If yes, increment, so we can GROUP BY it later*/
FROM
(
SELECT 
test.category, q.`day`, COALESCE(test.a, 0) AS a /*When there is no a, replace NULL with 0*/
FROM
test
RIGHT JOIN
(
SELECT
DATE_SUB(CURDATE(), INTERVAL number_days DAY) AS `day` /*<-- Create dates from today back 999 days. This query is surprisingly fast, and adding more numbers to create more dates, e.g. 10000 dates, is also no problem. Therefore a temporary dates table might not be necessary?*/
FROM
(
SELECT (a + 10*b + 100*c) AS number_days FROM
  (SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) aa
, (SELECT 0 AS b UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) bb
, (SELECT 0 AS c UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) cc
)sq /*<-- This generates numbers 0 to 999*/
)q USING(`day`) 
, (SELECT @is_a_changing:=0, @prev:=0) r
/*This WHERE clause is just to beautify. It may not be necessary*/
WHERE q.`day` >= (SELECT MIN(test.`day`) FROM test) AND q.`day` <= (SELECT MAX(test.`day`) FROM test) 
)qr
)asdf
GROUP BY is_a_changing
ORDER BY 2

Result looks like this:

category    firstDayInRange     lastDayInRange      a   howMuchDaysInThisRange
--------------------------------------------------------------------------
1           2012-01-01          2012-01-04          4   4
1           2012-01-05          2012-01-07          5   3
1           2012-01-08          2012-01-10          4   3
1           2012-01-11          2012-01-12          5   2
            2012-01-13          2012-01-15          0   3
1           2012-01-16          2012-01-19          5   4
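The digit cross join feeding the `q` derived table works because every number from 0 to 999 has exactly one (units, tens, hundreds) decomposition; here is a quick Python check of that (a sketch, not part of the answer). One consequence worth noting: since the dates are counted back from `CURDATE()`, only the 1,000 days up to today are generated, so data older than that (such as the 2012 sample) would need a fourth digit table or a different anchor date.

```python
from itertools import product

DIGITS = range(10)

# a + 10*b + 100*c over every combination of the aa, bb and cc digit tables
numbers = sorted(a + 10 * b + 100 * c for a, b, c in product(DIGITS, repeat=3))
print(numbers[0], numbers[-1], len(numbers))
```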

Firstly, this is an extension of @Mosty's solution.

To let Mosty's solution include category/date combinations that do not exist in the table, I took the following approach.

Start by getting a distinct list of categories and then join it to the entire date range:

SELECT category, `start` + INTERVAL id DAY AS `day`
FROM dummy,(SELECT DISTINCT category FROM t) cats, (SELECT MIN(day) `start`, MAX(day) `end` FROM t) tmp
WHERE id <= DATEDIFF(`end`, `start`)
ORDER BY category, `day`

The above query builds the full date range using the table dummy, which has a single field id. The id field contains 0, 1, 2, 3, ... and needs enough values to cover every day in the required date range. The result can then be joined back to the original table to create a complete list of all categories for all dates, with the appropriate value for a:

SELECT cj.category, cj.`day`, IFNULL(t.a, 0) AS a
FROM (
    SELECT category, `start` + INTERVAL id DAY AS `day`
    FROM dummy,(SELECT DISTINCT category FROM t) cats, (SELECT MIN(day) `start`, MAX(day) `end` FROM t) tmp
    WHERE id <= DATEDIFF(`end`, `start`)
    ORDER BY category, `day`
) AS cj
LEFT JOIN t
    ON cj.category = t.category
    AND cj.`day` = t.`day`
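What the two queries above compute can be sketched in Python as a cross join of every category with every day from MIN(day) to MAX(day) (the `dummy` ids 0 to DATEDIFF(`end`, `start`)), followed by a lookup that falls back to 0 like `IFNULL(t.a, 0)`. `complete_grid` is a hypothetical name, and the rows are assumed to be `(category, day, a)` tuples.

```python
from datetime import date, timedelta


def complete_grid(rows):
    """Every (category, day) pair between MIN(day) and MAX(day), with a = 0 where no row exists."""
    a_by_key = {(c, d): a for c, d, a in rows}           # the LEFT JOIN target
    start = min(d for _, d, _ in rows)                   # MIN(day)
    end = max(d for _, d, _ in rows)                     # MAX(day)
    categories = sorted({c for c, _, _ in rows})         # SELECT DISTINCT category
    span = (end - start).days                            # WHERE id <= DATEDIFF(`end`, `start`)
    grid = []
    for c in categories:
        for i in range(span + 1):                        # dummy.id = 0, 1, 2, ...
            day = start + timedelta(days=i)
            grid.append((c, day, a_by_key.get((c, day), 0)))  # IFNULL(t.a, 0)
    return grid


A_BY_DAY = {1: 4, 2: 4, 3: 4, 4: 4, 5: 5, 6: 5, 7: 5, 8: 4, 9: 4, 10: 4,
            11: 5, 12: 5, 16: 5, 17: 5, 18: 5, 19: 5}
rows = [(1, date(2012, 1, d), a) for d, a in A_BY_DAY.items()]
grid = complete_grid(rows)
print(len(grid), grid[12])
```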

This can then be used in Mosty's query in place of table t:

SELECT
    CONCAT(@category := category, ',', MIN(`day`)) col1,
    CONCAT(MAX(`day`), ',', @a := a) col2
FROM (
    SELECT cj.category, cj.day, IFNULL(t.a, 0) AS a
    FROM (
        SELECT category, `start` + INTERVAL id DAY AS `day`
        FROM dummy,(SELECT DISTINCT category FROM t) cats, (SELECT MIN(day) `start`, MAX(day) `end` FROM t) tmp
        WHERE id <= DATEDIFF(`end`, `start`)
        ORDER BY category, `day`
    ) AS cj
    LEFT JOIN t
        ON cj.category = t.category
        AND cj.`day` = t.day) AS t, (select @category := '', @a := '', @counter := 0) init
WHERE @counter := @counter + (category != @category OR a != @a)
GROUP BY @counter, category, a

Doing it completely on the MySQL side has a performance advantage: once the procedure has been created, it runs in 0.35 to 0.37 seconds.

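DELIMITER //

create procedure fetch_range()
begin
declare min_day date;
declare max_day date;

-- calendar table holding every day between the first and last day in the data
drop table if exists testdate;
create table testdate(
    date1 date
);

select min(day) into min_day
from category;

select max(day) into max_day
from category;

while min_day <= max_day do
insert into testdate values(min_day);
set min_day = adddate(min_day, 1);
end while;

-- session variables used below; reset them so every call starts fresh
-- (with @grp left NULL, @grp:=@grp+1 stays NULL and everything lands in one group)
set @a = null, @grp = 0, @category = null;

select concat(category,',',min(day)),concat(max(day),',',a)
from(
SELECT if(isNull(category),@category,category) category,
       if(isNull(day),date1,day) day,
       @a,
       -- sor_col: new group when a changes, or when switching between stored and missing days
       if(isNull(a) || isNull(@a),if(isNull(a) && isNull(@a),@grp,@grp:=@grp+1),if(@a!=a,@grp:=@grp+1,@grp)) as sor_col,
       if(isNull(a),0,a) as a,
       @a:=a,
       -- keep the last known category across consecutive missing days
       @category:=coalesce(category,@category)
FROM  `category`
RIGHT JOIN testdate ON date1 = category.day) as table1
group by sor_col;

drop table testdate;

end //

DELIMITER ;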

Output:

1,2012-01-01|2012-01-04,4
1,2012-01-05|2012-01-07,5
1,2012-01-08|2012-01-10,4
1,2012-01-11|2012-01-12,5
1,2012-01-13|2012-01-15,0
1,2012-01-16|2012-01-19,5
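The procedure's `sor_col` expression can be emulated in Python to check its grouping (a sketch; the function name `sor_col` is hypothetical, `None` stands for a calendar day with no matching row, and @a/@grp are assumed to start as NULL and 0). A new group starts when a changes, or when switching between stored and missing days, so a run of missing days forms a single group.

```python
def sor_col(a_values):
    """Group number for each day-ordered a value (None = day missing from the data)."""
    grp, prev = 0, None  # @grp := 0, @a := NULL
    out = []
    for a in a_values:
        if a is None or prev is None:
            # if(isNull(a) && isNull(@a), @grp, @grp := @grp + 1)
            if not (a is None and prev is None):
                grp += 1
        elif prev != a:
            # if(@a != a, @grp := @grp + 1, @grp)
            grp += 1
        out.append(grp)
        prev = a  # @a := a (NULL for missing days)
    return out


A_BY_DAY = {1: 4, 2: 4, 3: 4, 4: 4, 5: 5, 6: 5, 7: 5, 8: 4, 9: 4, 10: 4,
            11: 5, 12: 5, 16: 5, 17: 5, 18: 5, 19: 5}
groups = sor_col([A_BY_DAY.get(d) for d in range(1, 20)])
print(groups)
```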

Here is a MySQL solution that gives the desired result, except for the missing ranges.

The missing ranges can then be added in PHP:

// note: the mysql_* functions were removed in PHP 7; use the equivalent mysqli or PDO calls there
$sql = "set @a=0,@grp=0,@datediff=0,@category=0,@day='';";
mysql_query($sql);

// grp_datediff changes whenever a changes or a gap of more than one day appears;
// the '-' separator keeps e.g. (1, 10) and (11, 0) from both becoming '110'
$sql= "select category,min(day) min,max(day) max,a
from(
select category,day,a,concat(if(@a!=a,@grp:=@grp+1,@grp),'-',if(datediff(@day,day) < -1,@datediff:=@datediff+1,@datediff)) as grp_datediff,datediff(@day,day) diff, @day:= day,@a:=a
FROM  category
order by day) as t
group by grp_datediff";

$result = mysql_query($sql);

$data = array();
$indx = 0;
while ($row = mysql_fetch_object($result)) {
    $gapDays = 0;
    if (isset($data[$indx - 1]['max'])) {
        // days between the end of the previous range and the start of this one
        $date1 = new DateTime($data[$indx - 1]['max']);
        $date2 = new DateTime($row->min);
        $gapDays = $date1->diff($date2)->days;
    }
    if ($gapDays > 1) {
        // missing range: from (previous max + 1 day) to (this min - 1 day), with a = 0
        $min = new DateTime($data[$indx - 1]['max']);
        $min->add(new DateInterval("P1D"));
        $max = new DateTime($row->min);
        $max->sub(new DateInterval("P1D"));

        $data[$indx]['category'] = $data[$indx - 1]['category'];
        $data[$indx]['min'] = $min->format('Y-m-d');
        $data[$indx]['max'] = $max->format('Y-m-d');
        $data[$indx]['a'] = 0;
        $indx++;
    }
    // the range returned by the query
    $data[$indx]['category'] = $row->category;
    $data[$indx]['min'] = $row->min;
    $data[$indx]['max'] = $row->max;
    $data[$indx]['a'] = $row->a;
    $indx++;
}
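The gap-filling step of the PHP loop boils down to this: between two consecutive ranges separated by more than one day, insert a range from (previous max + 1 day) to (next min - 1 day) with a = 0. Sketched in Python for brevity; `fill_gaps` is hypothetical, and its input is assumed to be the day-ordered rows returned by the query.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def fill_gaps(ranges):
    """ranges: day-ordered (category, min_day, max_day, a) tuples from the SQL query."""
    out = []
    for category, lo, hi, a in ranges:
        if out and (lo - out[-1][2]).days > 1:
            prev_category, _, prev_hi, _ = out[-1]
            # the hole ends the day before this range starts
            out.append((prev_category, prev_hi + ONE_DAY, lo - ONE_DAY, 0))
        out.append((category, lo, hi, a))
    return out


d = lambda n: date(2012, 1, n)
query_rows = [(1, d(1), d(4), 4), (1, d(5), d(7), 5), (1, d(8), d(10), 4),
              (1, d(11), d(12), 5), (1, d(16), d(19), 5)]
for r in fill_gaps(query_rows):
    print(r)
```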

To make this work the way you want, you should have two tables:

  1. one for periods
  2. one for days

where each period can have many days related to it through a FOREIGN KEY. With the current table structure, the best you can do is detect the continuous periods on the PHP side.

Is this what you mean?

SELECT
    category,
    MIN(t1.day),
    MAX(t2.day),
    a
FROM
    `table` AS t1
INNER JOIN `table` AS t2 USING (category, a)
GROUP BY category, a
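Read as MIN/MAX per (category, a), which is what this query computes once it has a `GROUP BY category, a`, the sample data shows why that isn't quite what the question asks: separate runs with the same a merge into one span that overlaps the other values. A small Python sketch of that aggregation (`min_max_by_key` is a hypothetical name):

```python
from datetime import date

A_BY_DAY = {1: 4, 2: 4, 3: 4, 4: 4, 5: 5, 6: 5, 7: 5, 8: 4, 9: 4, 10: 4,
            11: 5, 12: 5, 16: 5, 17: 5, 18: 5, 19: 5}
rows = [(1, date(2012, 1, d), a) for d, a in A_BY_DAY.items()]


def min_max_by_key(rows):
    """(category, a) -> (MIN(day), MAX(day)), i.e. GROUP BY category, a."""
    spans = {}
    for category, day, a in rows:
        lo, hi = spans.get((category, a), (day, day))
        spans[(category, a)] = (min(lo, day), max(hi, day))
    return spans


print(min_max_by_key(rows))
```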

If I understand your question correctly, I would use something to the effect of:

SELECT MAX(day), MIN(day) FROM `YourTable` WHERE `category`= $cat AND `A`= $increment;

... and ...

$dateRange = $cat.","."$min"."|"."$max".",".$increment;

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 