How do I return the max-date row based on criteria in a min-date row?
I have a list of data like so:
ID  step            date
1   SECOND_ATTEMPT  03/19/2018
1   QC_READY        03/23/2018
1   QC_REJECTS      03/26/2018
2   SCHEDULED       02/01/2018
2   FINISHED        02/04/2018
3   SECOND_ATTEMPT  04/02/2018
3   QC_READY        04/03/2018
4   SECOND_ATTEMPT  01/15/2018
4   FINISHED        01/25/2018
My query, stripped down, looks like this:
select
j.id,
wfh.step,
wfh.date
from Job j
join work_flow_history wfh on j.id = wfh.job_id
The wfh IDs are unique for each step in each job, so if you view the step IDs, they look like this:
wfh_ID step
001 SECOND_ATTEMPT
002 QC_READY
etc...
For each ID whose step column contains SECOND_ATTEMPT and does not contain FINISHED, I would like to return all of the information in the row with the most recent date. The correct result set would look like this:
ID  step        date
1   QC_REJECTS  03/26/2018
3   QC_READY    04/03/2018
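To make the requirement concrete before turning to SQL, here is a minimal plain-Python sketch of the same logic, assuming the sample rows above are available as (ID, step, date) tuples. The key point is that the SECOND_ATTEMPT and FINISHED tests apply to each ID as a whole, while the returned row is that ID's latest one.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the question as (ID, step, date) tuples.
ROWS = [
    (1, "SECOND_ATTEMPT", "03/19/2018"),
    (1, "QC_READY", "03/23/2018"),
    (1, "QC_REJECTS", "03/26/2018"),
    (2, "SCHEDULED", "02/01/2018"),
    (2, "FINISHED", "02/04/2018"),
    (3, "SECOND_ATTEMPT", "04/02/2018"),
    (3, "QC_READY", "04/03/2018"),
    (4, "SECOND_ATTEMPT", "01/15/2018"),
    (4, "FINISHED", "01/25/2018"),
]


def latest_row_per_qualifying_id(rows):
    """Return the latest row for each ID that has a SECOND_ATTEMPT step
    and no FINISHED step, ordered by ID."""
    by_id = defaultdict(list)
    for row in rows:
        by_id[row[0]].append(row)

    result = []
    for job_id in sorted(by_id):
        job_rows = by_id[job_id]
        steps = {step for _, step, _ in job_rows}
        # The conditions look at all of the ID's rows, not one row at a time.
        if "SECOND_ATTEMPT" in steps and "FINISHED" not in steps:
            # Parse MM/DD/YYYY so dates compare chronologically, not as text.
            latest = max(job_rows, key=lambda r: datetime.strptime(r[2], "%m/%d/%Y"))
            result.append(latest)
    return result


print(latest_row_per_qualifying_id(ROWS))
# [(1, 'QC_REJECTS', '03/26/2018'), (3, 'QC_READY', '04/03/2018')]
```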
This is a slightly unusual requirement. Here is one method, treating your data (for example, the result of your join) as a table t:
select t.*
from t
where not exists (select 1 from t t2 where t2.id = t.id and t2.step = 'FINISHED') and
exists (select 1 from t t2 where t2.id = t.id and t2.step = 'SECOND_ATTEMPT') and
t.date = (select max(t2.date) from t t2 where t2.id = t.id);
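To check the query above, here is a small sketch that runs it against the sample data with Python's built-in sqlite3 module, unchanged apart from a final ORDER BY added for a stable result. It assumes a single table t(id, step, date) and stores dates as ISO YYYY-MM-DD strings, because MM/DD/YYYY text does not compare chronologically across years.

```python
import sqlite3

# Sample data from the question; dates rewritten as ISO strings (an assumption)
# so that max() on text compares them chronologically.
ROWS = [
    (1, "SECOND_ATTEMPT", "2018-03-19"),
    (1, "QC_READY", "2018-03-23"),
    (1, "QC_REJECTS", "2018-03-26"),
    (2, "SCHEDULED", "2018-02-01"),
    (2, "FINISHED", "2018-02-04"),
    (3, "SECOND_ATTEMPT", "2018-04-02"),
    (3, "QC_READY", "2018-04-03"),
    (4, "SECOND_ATTEMPT", "2018-01-15"),
    (4, "FINISHED", "2018-01-25"),
]

# NOT EXISTS / EXISTS test the whole ID; the correlated max() picks its latest row.
QUERY = """
select t.*
from t
where not exists (select 1 from t t2 where t2.id = t.id and t2.step = 'FINISHED') and
      exists (select 1 from t t2 where t2.id = t.id and t2.step = 'SECOND_ATTEMPT') and
      t.date = (select max(t2.date) from t t2 where t2.id = t.id)
order by t.id
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, step text, date text)")
conn.executemany("insert into t values (?, ?, ?)", ROWS)
result = conn.execute(QUERY).fetchall()
print(result)
# [(1, 'QC_REJECTS', '2018-03-26'), (3, 'QC_READY', '2018-04-03')]
```

If an ID had two rows on its latest date, this form would return both of them.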
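Another way, with just one subquery. In MySQL a comparison such as t.step = 'FINISHED' evaluates to 1 or 0, so SUM() over it counts the matching rows: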
select t.*
from t join
(select t.id, max(t.date) as maxd
from t
group by t.id
having sum(t.step = 'SECOND_ATTEMPT') > 0 and
sum(t.step = 'FINISHED') = 0
) tt
on tt.id = t.id and tt.maxd = t.date;
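The single-subquery version can be checked the same way. This sketch runs it through Python's sqlite3 under the same assumptions (one table t(id, step, date), ISO date strings, an ORDER BY added for a stable result); SQLite, like MySQL, evaluates a comparison to 1 or 0, so the HAVING clause works unchanged.

```python
import sqlite3

# Sample data from the question; dates rewritten as ISO strings (an assumption).
ROWS = [
    (1, "SECOND_ATTEMPT", "2018-03-19"),
    (1, "QC_READY", "2018-03-23"),
    (1, "QC_REJECTS", "2018-03-26"),
    (2, "SCHEDULED", "2018-02-01"),
    (2, "FINISHED", "2018-02-04"),
    (3, "SECOND_ATTEMPT", "2018-04-02"),
    (3, "QC_READY", "2018-04-03"),
    (4, "SECOND_ATTEMPT", "2018-01-15"),
    (4, "FINISHED", "2018-01-25"),
]

# The derived table tt keeps one row per qualifying ID with its latest date;
# joining back on (id, date) recovers the full row.
QUERY = """
select t.*
from t join
     (select t.id, max(t.date) as maxd
      from t
      group by t.id
      having sum(t.step = 'SECOND_ATTEMPT') > 0 and
             sum(t.step = 'FINISHED') = 0
     ) tt
     on tt.id = t.id and tt.maxd = t.date
order by t.id
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, step text, date text)")
conn.executemany("insert into t values (?, ?, ?)", ROWS)
result = conn.execute(QUERY).fetchall()
print(result)
# [(1, 'QC_REJECTS', '2018-03-26'), (3, 'QC_READY', '2018-04-03')]
```

Compared with the previous query, the table is scanned once for the aggregate instead of once per correlated subquery per row.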
SELECT *
FROM
(
SELECT
j.id,
wfh.step,
wfh.date
FROM Job j
JOIN work_flow_history wfh on j.id = wfh.job_id
WHERE wfh.step NOT LIKE '%FINISHED%'
AND wfh.step LIKE '%SECOND_ATTEMPT%'
ORDER BY DATE_FORMAT(wfh.date, "%Y-%m-%d") DESC
) t
GROUP BY t.id
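The idea is that GROUP BY takes the first row of each group, and the subquery sorts the rows beforehand. Since no SUM() or other aggregate is involved, no more complexity is needed. Note, however, that MySQL documents the values of non-aggregated columns in such a GROUP BY as indeterminate, so the "first row" is not guaranteed, and with ONLY_FULL_GROUP_BY enabled (the default since MySQL 5.7) the query is rejected.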
Storing the date in a proper DATE format in the database would avoid the need for DATE_FORMAT(), though.
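One thing to watch in the query above: the WHERE clause filters individual rows, so only the SECOND_ATTEMPT rows survive, and IDs 1, 3 and 4 all come back with their SECOND_ATTEMPT row instead of the expected result. If the intent is "the first row per ID after sorting", a window function does that deterministically (MySQL 8.0+, SQLite 3.25+). The sketch below swaps in ROW_NUMBER() for the GROUP BY trick and moves the SECOND_ATTEMPT and FINISHED tests to the ID level with windowed MAX(). It runs through Python's sqlite3 and assumes a single table t(id, step, date) with ISO date strings.

```python
import sqlite3

# Sample data from the question; dates rewritten as ISO strings (an assumption).
ROWS = [
    (1, "SECOND_ATTEMPT", "2018-03-19"),
    (1, "QC_READY", "2018-03-23"),
    (1, "QC_REJECTS", "2018-03-26"),
    (2, "SCHEDULED", "2018-02-01"),
    (2, "FINISHED", "2018-02-04"),
    (3, "SECOND_ATTEMPT", "2018-04-02"),
    (3, "QC_READY", "2018-04-03"),
    (4, "SECOND_ATTEMPT", "2018-01-15"),
    (4, "FINISHED", "2018-01-25"),
]

# ROW_NUMBER() ranks each ID's rows newest first; the windowed MAX() flags are 1
# if any row of that ID has the given step, else 0.
QUERY = """
select id, step, date
from (
    select t.*,
           row_number() over (partition by id order by date desc) as rn,
           max(step = 'SECOND_ATTEMPT') over (partition by id) as has_second,
           max(step = 'FINISHED') over (partition by id) as has_finished
    from t
) ranked
where rn = 1 and has_second = 1 and has_finished = 0
order by id
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, step text, date text)")
conn.executemany("insert into t values (?, ?, ?)", ROWS)
result = conn.execute(QUERY).fetchall()
print(result)
# [(1, 'QC_REJECTS', '2018-03-26'), (3, 'QC_READY', '2018-04-03')]
```

Unlike the max-date join, ROW_NUMBER() returns exactly one row per ID even when two rows share the latest date.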
Thanks to the answers posted in previous comments, I got some ideas and figured out how to make it work.
select
j.id
,sum(case when wfh.step = 'SECOND_ATTEMPT' then 1 else 0 end) sum_
,(select max(wfh1.date)
from work_flow_history wfh1
join job j1 on j1.id = wfh1.job_id
where j1.id = j.id
order by wfh1.date desc
limit 1) date_
,(select wfh1.step
from work_flow_history wfh1
join job j1 on j1.id = wfh1.job_id
where j1.id = j.id
and wfh1.step <> 'FINISHED'
order by wfh1.date desc
limit 1) step_
from job j
join work_flow_history wfh on j.id = wfh.job_id
group by j.id
This query looks at all wfh steps per job ID and returns the most recent wfh step if SECOND_ATTEMPT is in any of the rows associated with that job ID.
I know it's really messy. I'm not sure how to turn the subquery columns into proper joins to make it look nicer, and I also need to add logic to include only instances where sum_ = 1. But I am very glad to have moved past this roadblock. Thanks everyone for your help!
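One way to replace the correlated subquery columns with a join and apply the sum_ filter is to aggregate once per job in a derived table and join back on the latest date. The sketch below assumes a minimal hypothetical schema, job(id) and work_flow_history(job_id, step, date) with ISO date strings, and runs through Python's sqlite3. The FINISHED exclusion goes in HAVING, and the sum_ = 1 test goes in the outer WHERE.

```python
import sqlite3

# Hypothetical minimal schema inferred from the question; dates are ISO
# strings so max() compares them chronologically.
JOBS = [(1,), (2,), (3,), (4,)]
HISTORY = [
    (1, "SECOND_ATTEMPT", "2018-03-19"),
    (1, "QC_READY", "2018-03-23"),
    (1, "QC_REJECTS", "2018-03-26"),
    (2, "SCHEDULED", "2018-02-01"),
    (2, "FINISHED", "2018-02-04"),
    (3, "SECOND_ATTEMPT", "2018-04-02"),
    (3, "QC_READY", "2018-04-03"),
    (4, "SECOND_ATTEMPT", "2018-01-15"),
    (4, "FINISHED", "2018-01-25"),
]

QUERY = """
select j.id, wfh.step, wfh.date
from job j
join work_flow_history wfh on wfh.job_id = j.id
join (
    -- one row per job: latest date and number of SECOND_ATTEMPT steps,
    -- keeping only jobs with no FINISHED step
    select wfh1.job_id,
           max(wfh1.date) as max_date,
           sum(case when wfh1.step = 'SECOND_ATTEMPT' then 1 else 0 end) as sum_
    from work_flow_history wfh1
    group by wfh1.job_id
    having sum(case when wfh1.step = 'FINISHED' then 1 else 0 end) = 0
) latest on latest.job_id = j.id and latest.max_date = wfh.date
where latest.sum_ = 1
order by j.id
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table job (id integer primary key)")
conn.execute("create table work_flow_history (job_id integer, step text, date text)")
conn.executemany("insert into job values (?)", JOBS)
conn.executemany("insert into work_flow_history values (?, ?, ?)", HISTORY)
result = conn.execute(QUERY).fetchall()
print(result)
# [(1, 'QC_REJECTS', '2018-03-26'), (3, 'QC_READY', '2018-04-03')]
```

Using sum_ > 0 instead of sum_ = 1 would also keep jobs that have more than one SECOND_ATTEMPT step.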
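Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.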