
How can I count duplicates that fall within a date range? (SQL)

I have a table that contains Applicant ID, Application Date and Job Description.

I am trying to identify duplicates, defined as when the same Applicant ID applies for the same Job Description within 3 days of their other application.

I have already done this for the same date, this way:

CREATE TABLE Duplicates AS
SELECT
  COUNT(ApplicantID) AS ApplicantCount,
  ApplicantID,
  ApplicationDate,
  JobDescription
FROM Applications
GROUP BY ApplicantID, ApplicationDate, JobDescription;

DELETE FROM Duplicates WHERE ApplicantCount < 2;

SELECT COUNT(*) FROM Duplicates;

I now want to relax this so that the ApplicationDate does not have to match exactly, but only has to fall within a range (3 days). How can I do this?

You can use lead() / lag(). Here is an example that returns the earlier application whenever a duplicate follows it within 3 days:

SELECT a.*
FROM (SELECT a.*,
             LEAD(ApplicationDate) OVER (PARTITION BY ApplicantID, JobDescription ORDER BY ApplicationDate) as next_ad
      FROM Applications a
     ) a
WHERE next_ad <= ApplicationDate + INTERVAL 3 DAY;

You can also phrase this using exists:

select a.*
from applications a
where exists (select 1
              from applications a2
              where a2.ApplicantID = a.ApplicantID and
                    a2.JobDescription = a.JobDescription and
                    a2.ApplicationDate > a.ApplicationDate and
                    a2.ApplicationDate <= a.ApplicationDate + interval 3 day 
             );
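
Since the question asks for a count rather than the rows themselves, the exists version can be wrapped in COUNT(*). This is only a sketch: it assumes MySQL (because of the INTERVAL 3 DAY syntax) and the Applications table from the question, and it assumes that "a duplicate" means an application that is followed by another one from the same applicant for the same job within 3 days. The alias DuplicateCount is made up for illustration.

```sql
-- Assumption: MySQL, and the Applications table from the question.
-- Counts every application that is followed by another application
-- from the same applicant for the same job within the next 3 days.
SELECT COUNT(*) AS DuplicateCount
FROM Applications a
WHERE EXISTS (SELECT 1
              FROM Applications a2
              WHERE a2.ApplicantID = a.ApplicantID
                AND a2.JobDescription = a.JobDescription
                AND a2.ApplicationDate > a.ApplicationDate
                AND a2.ApplicationDate <= a.ApplicationDate + INTERVAL 3 DAY);
```

Note that the strict `>` comparison means two rows with exactly the same ApplicationDate do not match each other, so same-date duplicates would still need the GROUP BY approach from the question, or a unique key to tell the two rows apart.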
