How can I count duplicates that fall within a date range? (SQL)

Question

I have a table that contains Applicant ID, Application Date and Job Description.

I am trying to identify duplicates, defined as when the same Applicant ID applies for the same Job Description within 3 days of their other application.

I have already done this for the same date, this way:

CREATE TABLE Duplicates AS
SELECT
  COUNT(ApplicantID) AS ApplicantCount,
  ApplicantID,
  ApplicationDate,
  JobDescription
FROM Applications
GROUP BY ApplicantID, ApplicationDate, JobDescription;

DELETE FROM Duplicates WHERE ApplicantCount < 2;
SELECT COUNT(*) FROM Duplicates;

I'm now trying to make it so it doesn't have to match exactly on the ApplicationDate, but falls within a range. How do you do this?

Answer 1

You can use lead() / lag(). Here is an example that returns each application that is followed by another application from the same applicant for the same job within 3 days:

SELECT a.*
FROM (SELECT a.*,
             LEAD(ApplicationDate) OVER (PARTITION BY ApplicantID, JobDescription ORDER BY ApplicationDate) as next_ad
      FROM Applications a
     ) a
WHERE next_ad <= ApplicationDate + INTERVAL 3 DAY;

You can also phrase this using exists:

select a.*
from Applications a
where exists (select 1
              from Applications a2
              where a2.ApplicantID = a.ApplicantID and
                    a2.JobDescription = a.JobDescription and
                    a2.ApplicationDate > a.ApplicationDate and
                    a2.ApplicationDate <= a.ApplicationDate + interval 3 day 
             );
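
Neither query above returns a count, which is what the question asks for. One way to get it is to wrap the same condition in COUNT(*). This is only a sketch, assuming MySQL syntax and the Applications table from the question; DuplicateCount is just an illustrative alias:

-- Counts every application that has a later application by the same
-- applicant for the same job within the following 3 days.
SELECT COUNT(*) AS DuplicateCount
FROM Applications a
WHERE EXISTS (SELECT 1
              FROM Applications a2
              WHERE a2.ApplicantID = a.ApplicantID AND
                    a2.JobDescription = a.JobDescription AND
                    a2.ApplicationDate > a.ApplicationDate AND
                    a2.ApplicationDate <= a.ApplicationDate + INTERVAL 3 DAY
             );

Note that this counts the earlier row of each close pair, so three applications on consecutive days count as 2. Because of the strict > comparison, two rows with exactly the same ApplicationDate do not count each other; the same-date GROUP BY approach from the question covers that case.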
