How to update fields based on value of other record
I have a table resembling the following structure:
City start_date end_date
Paris 1995-01-01 00:00:00 1997-10-01 23:59:59
Paris 1997-10-02 00:00:00 0001-01-01 00:00:00
Paris 2013-01-25 00:00:00 0001-01-01 00:00:00
Paris 2015-04-25 00:00:00 0001-01-01 00:00:00
Berlin 2014-11-01 00:00:00 0001-01-01 00:00:00
Berlin 2014-06-01 00:00:00 0001-01-01 00:00:00
Berlin 2015-09-11 00:00:00 0001-01-01 00:00:00
Berlin 2015-10-01 00:00:00 0001-01-01 00:00:00
Milan 2001-01-01 00:00:00 0001-01-01 00:00:00
Milan 2005-10-02 00:00:00 2006-10-02 23:59:59
Milan 2006-10-03 00:00:00 2015-04-24 23:59:59
Milan 2015-04-25 00:00:00 0001-01-01 00:00:00
The data contains a historical view of start and end dates per city. The most recent record for a city should be the one with the latest start date and an end date of '0001-01-01 00:00:00', indicating that there is no end date yet.
I need to clean this data so that historical records for each city have an end date one second before the next record's start date, but only where the end_date is set to '0001-01-01 00:00:00'. Records whose end_date holds an actual date should be left alone. Also, the record with the most recent start_date for a city does not need its end_date modified.
The resulting table should look like this:
City start_date end_date
Paris 1995-01-01 00:00:00 1997-10-01 23:59:59
Paris 1997-10-02 00:00:00 2013-01-24 23:59:59
Paris 2013-01-25 00:00:00 2015-04-24 23:59:59
Paris 2015-04-25 00:00:00 0001-01-01 00:00:00
Berlin 2014-11-01 00:00:00 2014-05-31 23:59:59
Berlin 2014-06-01 00:00:00 2015-09-10 23:59:59
Berlin 2015-09-11 00:00:00 2015-09-30 23:59:59
Berlin 2015-10-01 00:00:00 0001-01-01 00:00:00
Milan 2001-01-01 00:00:00 2005-10-01 23:59:59
Milan 2005-10-02 00:00:00 2006-10-02 23:59:59
Milan 2006-10-03 00:00:00 2015-04-24 23:59:59
Milan 2015-04-25 00:00:00 0001-01-01 00:00:00
I have thought of many ways to achieve this programmatically; however, I would prefer a solution that handles this entirely in an SQL query. I have found a similar question with an answer here, but it does not handle my particular conditions. How can I modify it to satisfy my criteria?
EDIT:
I have tried the suggested answer below, based on this statement:
update test join
(select t.*,
(select min(start_date)
from test t2
where t2.city = t.city and
t2.start_date > t.start_date
order by t2.start_date
limit 1
) as next_start_date
from test t
) tt
on tt.city = test.city and tt.start_date = test.start_date
set test.end_date = date_sub(tt.next_start_date, interval 1 second)
where test.end_date = '0001-01-01' and
next_start_date is not null;
Unfortunately, some end_dates are not as intended (for example id numbers 5 and 6), starting from the Berlin records.
Here are the CREATE and INSERT statements needed to reproduce this:
CREATE TABLE `test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`city` varchar(50) DEFAULT NULL,
`start_date` datetime DEFAULT NULL,
`end_date` datetime DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=13 DEFAULT CHARSET=utf8;
INSERT INTO test (city, start_date, end_date) VALUES ('Paris','1995-01-01 00:00:00','1997-10-01 23:59:59');
INSERT INTO test (city, start_date, end_date) VALUES ('Paris','1997-10-02 00:00:00','0001-01-01 00:00:00');
INSERT INTO test (city, start_date, end_date) VALUES ('Paris','2013-01-25 00:00:00','0001-01-01 00:00:00');
INSERT INTO test (city, start_date, end_date) VALUES ('Paris','2015-04-25 00:00:00','0001-01-01 00:00:00');
INSERT INTO test (city, start_date, end_date) VALUES ('Berlin','2014-11-01 00:00:00','0001-01-01 00:00:00');
INSERT INTO test (city, start_date, end_date) VALUES ('Berlin','2014-06-01 00:00:00','0001-01-01 00:00:00');
INSERT INTO test (city, start_date, end_date) VALUES ('Berlin','2015-09-11 00:00:00','0001-01-01 00:00:00');
INSERT INTO test (city, start_date, end_date) VALUES ('Berlin','2015-10-01 00:00:00','0001-01-01 00:00:00');
INSERT INTO test (city, start_date, end_date) VALUES ('Milan','2001-01-01 00:00:00','0001-01-01 00:00:00');
INSERT INTO test (city, start_date, end_date) VALUES ('Milan','2005-10-02 00:00:00','2006-10-02 23:59:59');
INSERT INTO test (city, start_date, end_date) VALUES ('Milan','2006-10-03 00:00:00','2015-04-24 23:59:59');
INSERT INTO test (city, start_date, end_date) VALUES ('Milan','2015-04-25 00:00:00','0001-01-01 00:00:00');
You simply need the lead() function, which is not available in MySQL before version 8.0. Using variables in an update is challenging, so here is a method with correlated subqueries.
To get the next start date:
select t.*,
       -- earliest start_date in the same city that is later than this row's
       (select min(t2.start_date)
        from t t2
        where t2.city = t.city and
              t2.start_date > t.start_date
       ) as next_start_date
from t;
You can now use this in an update using join:
update t join
       (select t1.*,
               -- earliest later start_date within the same city
               (select min(t2.start_date)
                from t t2
                where t2.city = t1.city and
                      t2.start_date > t1.start_date
               ) as next_start_date
        from t t1
       ) tt
       on tt.city = t.city and tt.start_date = t.start_date
    set t.end_date = date_sub(tt.next_start_date, interval 1 second)
    -- only touch placeholder end dates, and skip the latest row per city
    where t.end_date = '0001-01-01' and
          tt.next_start_date is not null;
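On MySQL 8.0 or later, where window functions exist, the same idea can be written with lead(). The following is only a sketch, assuming the `test` table and `id` primary key from the question's CREATE TABLE statement; the alias `nxt` is made up for illustration. The first statement previews the computed values so they can be checked before anything is changed.

```sql
-- Preview: next start_date per city, in chronological order (MySQL 8.0+).
SELECT id, city, start_date, end_date,
       LEAD(start_date) OVER (PARTITION BY city ORDER BY start_date) AS next_start_date
FROM test
ORDER BY city, start_date;

-- Apply: join back on the primary key and close only the placeholder end dates.
UPDATE test
JOIN (
    SELECT id,
           LEAD(start_date) OVER (PARTITION BY city ORDER BY start_date) AS next_start_date
    FROM test
) AS nxt ON nxt.id = test.id
SET test.end_date = DATE_SUB(nxt.next_start_date, INTERVAL 1 SECOND)
WHERE test.end_date = '0001-01-01 00:00:00'
  AND nxt.next_start_date IS NOT NULL;  -- latest row per city has no successor
```

Both this version and the correlated-subquery version define the "next" record by start_date. In the sample data, Berlin's id 5 (2014-11-01) is chronologically later than id 6 (2014-06-01), so id 5 gets 2015-09-10 23:59:59 and id 6 gets 2014-10-31 23:59:59. The expected table in the question appears to follow insertion order instead; if that is really what is wanted, changing the window to ORDER BY id should reproduce it, at the cost of an end date that precedes the start date for id 5.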