How to update fields based on value of other record

I have a table resembling the following structure:

City        start_date             end_date
Paris       1995-01-01 00:00:00    1997-10-01 23:59:59
Paris       1997-10-02 00:00:00    0001-01-01 00:00:00
Paris       2013-01-25 00:00:00    0001-01-01 00:00:00
Paris       2015-04-25 00:00:00    0001-01-01 00:00:00
Berlin      2014-11-01 00:00:00    0001-01-01 00:00:00
Berlin      2014-06-01 00:00:00    0001-01-01 00:00:00
Berlin      2015-09-11 00:00:00    0001-01-01 00:00:00
Berlin      2015-10-01 00:00:00    0001-01-01 00:00:00
Milan       2001-01-01 00:00:00    0001-01-01 00:00:00
Milan       2005-10-02 00:00:00    2006-10-02 23:59:59
Milan       2006-10-03 00:00:00    2015-04-24 23:59:59
Milan       2015-04-25 00:00:00    0001-01-01 00:00:00

The data contains a historical view of start and end dates based on cities. The most recent record for a city should be the one which has the highest start date, and an end date of '0001-01-01 00:00:00', indicating that there is no end date yet.

I need to clean this data and make sure that historical records for each city all have end dates one second before the next record's start date, but only in cases where the end_date is set to '0001-01-01 00:00:00'. Records whose end_date holds an actual date should be left alone. Also, the record with the most recent start_date for a city does not need to have its end_date modified.

The resulting table should look like this:

City        start_date             end_date
Paris       1995-01-01 00:00:00    1997-10-01 23:59:59
Paris       1997-10-02 00:00:00    2013-01-24 23:59:59
Paris       2013-01-25 00:00:00    2015-04-24 23:59:59
Paris       2015-04-25 00:00:00    0001-01-01 00:00:00
Berlin      2014-11-01 00:00:00    2014-05-31 23:59:59
Berlin      2014-06-01 00:00:00    2015-09-10 23:59:59
Berlin      2015-09-11 00:00:00    2015-09-30 23:59:59
Berlin      2015-10-01 00:00:00    0001-01-01 00:00:00
Milan       2001-01-01 00:00:00    2005-10-01 23:59:59
Milan       2005-10-02 00:00:00    2006-10-02 23:59:59
Milan       2006-10-03 00:00:00    2015-04-24 23:59:59
Milan       2015-04-25 00:00:00    0001-01-01 00:00:00

I have thought of many ways to achieve this programmatically, but I would love a solution that handles this completely through an SQL query. I have found a similar question with an answer here; however, it does not handle my particular conditions. How can I modify it to satisfy my criteria?

EDIT:

I have tried the suggested answer below, based on this statement:

update test join
       (select t.*,
               (select min(start_date)
                from test t2
                where t2.city = t.city and
                      t2.start_date > t.start_date
                order by t2.start_date
                limit 1
               ) as next_start_date
        from test t
       ) tt
       on tt.city = test.city and tt.start_date = test.start_date
    set test.end_date = date_sub(tt.next_start_date, interval 1 second)
where test.end_date = '0001-01-01' and
      next_start_date is not null;

Unfortunately, some end_dates are not as intended (for example id number 5 and 6), starting from the Berlin records. This is shown below:

[Screenshot of the table after the update; image not included]

Here are the create and insert statements to be able to replicate:

CREATE TABLE `test` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `city` varchar(50) DEFAULT NULL,
  `start_date` datetime DEFAULT NULL,
  `end_date` datetime DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=13 DEFAULT CHARSET=utf8;

INSERT INTO test (city, start_date, end_date) VALUES ('Paris','1995-01-01 00:00:00','1997-10-01 23:59:59');
INSERT INTO test (city, start_date, end_date) VALUES ('Paris','1997-10-02 00:00:00','0001-01-01 00:00:00');
INSERT INTO test (city, start_date, end_date) VALUES ('Paris','2013-01-25 00:00:00','0001-01-01 00:00:00');
INSERT INTO test (city, start_date, end_date) VALUES ('Paris','2015-04-25 00:00:00','0001-01-01 00:00:00');
INSERT INTO test (city, start_date, end_date) VALUES ('Berlin','2014-11-01 00:00:00','0001-01-01 00:00:00');
INSERT INTO test (city, start_date, end_date) VALUES ('Berlin','2014-06-01 00:00:00','0001-01-01 00:00:00');
INSERT INTO test (city, start_date, end_date) VALUES ('Berlin','2015-09-11 00:00:00','0001-01-01 00:00:00');
INSERT INTO test (city, start_date, end_date) VALUES ('Berlin','2015-10-01 00:00:00','0001-01-01 00:00:00');
INSERT INTO test (city, start_date, end_date) VALUES ('Milan','2001-01-01 00:00:00','0001-01-01 00:00:00');
INSERT INTO test (city, start_date, end_date) VALUES ('Milan','2005-10-02 00:00:00','2006-10-02 23:59:59');
INSERT INTO test (city, start_date, end_date) VALUES ('Milan','2006-10-03 00:00:00','2015-04-24 23:59:59');
INSERT INTO test (city, start_date, end_date) VALUES ('Milan','2015-04-25 00:00:00','0001-01-01 00:00:00');

You simply need the lead() function, which is not available in MySQL before version 8.0. Using variables in an update is challenging, so here is a method with correlated subqueries.

To get the next start date:

-- min() already returns a single row, so no order by / limit is needed
select t.*,
       (select min(t2.start_date)
        from t t2
        where t2.city = t.city and
              t2.start_date > t.start_date
       ) as next_start_date
from t;
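Before changing any data, it can help to preview which rows would be touched and what they would receive. The following is only a sketch: it assumes the `test` table and `id` column from the question's CREATE statement, and the alias `p` and the column name `new_end_date` are made up for illustration.

-- Preview only: lists the rows an update would change, without modifying anything
SELECT p.id,
       p.city,
       p.start_date,
       p.end_date,
       DATE_SUB(p.next_start_date, INTERVAL 1 SECOND) AS new_end_date
FROM (
    SELECT t.id, t.city, t.start_date, t.end_date,
           -- earliest later start_date within the same city
           (SELECT MIN(t2.start_date)
            FROM test t2
            WHERE t2.city = t.city
              AND t2.start_date > t.start_date
           ) AS next_start_date
    FROM test t
) p
WHERE p.end_date = '0001-01-01 00:00:00'   -- only rows with the placeholder end date
  AND p.next_start_date IS NOT NULL        -- skip the latest row per city
ORDER BY p.city, p.start_date;

With the sample data, the Berlin row starting 2014-06-01 is paired with 2014-10-31 23:59:59 and the one starting 2014-11-01 with 2015-09-10 23:59:59, because rows are matched by start_date rather than by insertion order. That may be why the rows the question calls ids 5 and 6 look unexpected.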

You can now use this in an update using join:

update t join
       (select t.*,
               (select min(t2.start_date)
                from t t2
                where t2.city = t.city and
                      t2.start_date > t.start_date
               ) as next_start_date
        from t
       ) tt
       on tt.city = t.city and tt.start_date = t.start_date
    set t.end_date = date_sub(tt.next_start_date, interval 1 second)
where t.end_date = '0001-01-01' and
      -- next_start_date exists only in the derived table tt, not in t
      tt.next_start_date is not null;
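
On MySQL 8.0 or later, where window functions exist, the same idea could be written with LEAD() instead of a correlated subquery. This is a sketch rather than a tested drop-in: it assumes the `test` table and its `id` primary key from the question, and the alias `nxt` is made up for illustration.

-- Sketch for MySQL 8.0+ only (relies on window functions)
UPDATE test
JOIN (
    SELECT id,
           -- start_date of the following row within the same city
           LEAD(start_date) OVER (PARTITION BY city ORDER BY start_date) AS next_start_date
    FROM test
) nxt ON nxt.id = test.id
SET test.end_date = DATE_SUB(nxt.next_start_date, INTERVAL 1 SECOND)
WHERE test.end_date = '0001-01-01 00:00:00'   -- leave real end dates untouched
  AND nxt.next_start_date IS NOT NULL;        -- latest row per city keeps its placeholder

Joining on `id` avoids matching more than one row if a city ever has two records with the same start_date. If "next record" is meant in insertion order, as the expected output for Berlin in the question suggests, ordering the window by `id` instead of `start_date` would be the change to try; that reading of the intent is an assumption.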
