
SQL Query to exclude rows with similar but not identical values

I need to find only the unique car insurance calculations in a table. Rows do not count as unique if the calculations were made less than five minutes apart on the same day, by the same company, for an identical car.

The problem is that each calculation is stored as a separate row with its own id, and the only things I can get from the database are the date and time of the calculation, the name of the company that made it, and the model, brand and production year of the car.

To be more specific, the table I have looks like this:

|   Time_Date  | company | year | model | brand  |
|--------------|---------|------|-------|--------|
|20.08.16 15:31|    A    | 2014 | Teana | Nissan |
|20.08.16 15:34|    A    | 2014 | Teana | Nissan |
|20.08.16 15:38|    A    | 2014 | Teana | Nissan |
|20.08.16 16:02|    A    | 2014 | Teana | Nissan |
|20.08.16 15:36|    B    | 2014 | Teana | Nissan |
|20.08.16 15:37|    B    | 2014 | Teana | Nissan |
|21.08.16 15:33|    A    | 2015 | Teana | Nissan |

And what I need to get:

|  Time_Date   | company | year | model | brand  |
|--------------|---------|------|-------|--------|
|20.08.16 15:31|    A    | 2014 | Teana | Nissan |
|20.08.16 16:02|    A    | 2014 | Teana | Nissan |
|20.08.16 15:36|    B    | 2014 | Teana | Nissan |
|21.08.16 15:33|    A    | 2015 | Teana | Nissan |

The database I use is Vertica. Can anyone please suggest a solution? It does not seem like a big problem, but I'm kind of stuck :(

PS

If there is a record at 15:31 and then a record with the same company, year and model at 15:34, the 15:34 record should not be in the final table. If another calculation follows less than five minutes after the last one in that run of similar calculations, it should not be in the final table either. So in this case 15:31, 15:34 and 15:38 count as the same calculation, and 16:02 is a different one.
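To make the rule above concrete, here is a minimal Python sketch of it applied to the sample rows. It is only an illustration, not a Vertica solution: the function name `unique_calculations` and the tuple layout are assumptions made for this example, and it treats a gap of exactly five minutes as the start of a new calculation, since the question says "less than five minutes". The "same day" condition is not handled separately.

```python
from datetime import datetime, timedelta
from itertools import groupby

# Sample rows from the question: (Time_Date, company, year, model, brand)
ROWS = [
    ("20.08.16 15:31", "A", 2014, "Teana", "Nissan"),
    ("20.08.16 15:34", "A", 2014, "Teana", "Nissan"),
    ("20.08.16 15:38", "A", 2014, "Teana", "Nissan"),
    ("20.08.16 16:02", "A", 2014, "Teana", "Nissan"),
    ("20.08.16 15:36", "B", 2014, "Teana", "Nissan"),
    ("20.08.16 15:37", "B", 2014, "Teana", "Nissan"),
    ("21.08.16 15:33", "A", 2015, "Teana", "Nissan"),
]


def parse(ts):
    """Parse the dd.mm.yy hh:mm format used in the question."""
    return datetime.strptime(ts, "%d.%m.%y %H:%M")


def unique_calculations(rows, window=timedelta(minutes=5)):
    """Keep the first row of every run of calculations.

    A run continues while each row follows the previous row of the same
    company/year/model/brand by less than `window`.
    """
    def group_key(row):
        return row[1:]  # company, year, model, brand

    ordered = sorted(rows, key=lambda r: (group_key(r), parse(r[0])))
    kept = []
    for _, group in groupby(ordered, key=group_key):
        previous = None
        for row in group:
            current = parse(row[0])
            # First row of the group, or a gap of at least `window`: new run.
            if previous is None or current - previous >= window:
                kept.append(row)
            previous = current
    return kept


for row in unique_calculations(ROWS):
    print(row)
```

Note that each row is compared with the row directly before it, not with the first row of the run, which is why 15:38 is still dropped even though it is seven minutes after 15:31.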

Rextester doesn't have a Vertica environment, so I can't test the query below.

Here's a working SQL Server version: http://rextester.com/FWK58234 (the edge cases need a bit more testing).

The Vertica syntax seems close to SQL Server's; the only change needed was to put quotes around mi in the datediff function (added below).

Use a common table expression (CTE) and the analytic function LAG, which looks back at the previous row's value, to compute the difference in minutes between consecutive rows within each company, year, model, brand partition. Then drop every row whose difference is 5 minutes or less, keeping the rows with a NULL difference (the first row in a partition) and the rows with a difference greater than 5 minutes, since both mark the start of a unique calculation.

Note: my example results differ because I added extra rows to test edge cases.

WITH CTE as (
   SELECT Time_date
        , company
        , year
        , Model
        , Brand
        , datediff('mi',Lag(time_Date,1,NULL) over (partition by company, year, Model, Brand ORDER BY time_date asc),Time_Date) as MinuteDiff
   FROM foo)

   SELECT Time_date, company, year, Model, Brand, MinuteDiff
   FROM CTE
   --Keep rows with a NULL minute difference: they are the 1st entry for a company, year, model, brand
   --Also keep rows with a minute difference > 5
   WHERE MinuteDiff > 5 or minutediff is null
   ORDER BY  Company, Year, Model, Brand, Time_date

*Note: if rows existed for a company, year, model and brand with an entry every 5 minutes over the course of 3 days, only 1 record would be returned. A single gap in that sequence would return 2 records (barring the gap falling on the first or last entry).
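The chaining effect described in the note can be checked with a small Python sketch. It mimics the filter `MinuteDiff > 5 OR MinuteDiff IS NULL` for a single company/car partition; the helper name `count_kept` is made up for this illustration, and whole-minute timestamps are assumed so that the interval arithmetic matches a minute-based datediff.

```python
from datetime import datetime, timedelta


def count_kept(timestamps, threshold_minutes=5):
    """Count the rows the LAG filter keeps within one partition.

    A row is kept if it is the first row (no predecessor, i.e. NULL diff)
    or if it is more than `threshold_minutes` after the previous row.
    """
    kept = 0
    previous = None
    for ts in sorted(timestamps):
        if previous is None or ts - previous > timedelta(minutes=threshold_minutes):
            kept += 1
        previous = ts
    return kept


start = datetime(2016, 8, 20, 0, 0)
# One entry every 5 minutes for 3 days.
every_five = [start + timedelta(minutes=5 * i) for i in range(3 * 24 * 12)]
# The same series with a single entry removed, leaving one 10-minute gap.
with_gap = [ts for ts in every_five if ts != start + timedelta(minutes=50)]

print(count_kept(every_five))  # 1
print(count_kept(with_gap))    # 2
```

Because every row is only compared with its immediate predecessor, an unbroken series collapses into its first row no matter how long it runs.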

Try this query

;With cte(  Time_Date  , company , year , model , brand  )
AS
(

SELECT '20.08.16 15:31',    'A'    , 2014 , 'Teana' , 'Nissan' UNION ALL 
SELECT '20.08.16 15:34',    'A'    , 2014 , 'Teana' , 'Nissan' UNION ALL 
SELECT '20.08.16 15:38',    'A'    , 2014 , 'Teana' , 'Nissan' UNION ALL 
SELECT '20.08.16 15:36',    'B'    , 2014 , 'Teana' , 'Nissan' UNION ALL 
SELECT '20.08.16 15:37',    'B'    , 2014 , 'Teana' , 'Nissan' UNION ALL 
SELECT '21.08.16 15:33',    'A'    , 2015 , 'Teana' , 'Nissan' 
)
SELECT Time_Date,   company,    [year], model,  brand FROM
  (
SELECT DISTINCT *, ROW_NUMBER()OVER(PARTITION BY company,model,[year] ORDER by Time_Date,company ) dst FROM cte 
 )Dt
Where dst=1
Order by [year]

Result

Time_Date      company  year    model   brand
------------------------------------------
20.08.16 15:31  A       2014    Teana   Nissan
20.08.16 15:36  B       2014    Teana   Nissan
21.08.16 15:33  A       2015    Teana   Nissan

Is this what you want?

SELECT MIN(Time_Date) AS Time_Date, company, year, model, brand 
FROM Vertica.dbo.yourTable 
GROUP BY company, year, model, brand

This is very easy to implement using the Vertica analytic function CONDITIONAL_TRUE_EVENT.

First I created a temp table mytable containing your data:

CREATE LOCAL TEMPORARY TABLE mytable (time_date, company, year, model, brand)
ON COMMIT PRESERVE ROWS AS
    SELECT '2016-08-20 15:31:00'::timestamp(0),'A',2014,'Teana','Nissan' UNION ALL 
    SELECT '2016-08-20 15:34:00'::timestamp(0),'A',2014,'Teana','Nissan' UNION ALL 
    SELECT '2016-08-20 15:38:00'::timestamp(0),'A',2014,'Teana','Nissan' UNION ALL 
    SELECT '2016-08-20 16:02:00'::timestamp(0),'A',2014,'Teana','Nissan' UNION ALL 
    SELECT '2016-08-20 15:36:00'::timestamp(0),'B',2014,'Teana','Nissan' UNION ALL 
    SELECT '2016-08-20 15:37:00'::timestamp(0),'B',2014,'Teana','Nissan' UNION ALL 
    SELECT '2016-08-21 15:33:00'::timestamp(0),'A',2015,'Teana','Nissan' ;

Then you just have to:

SELECT
    MIN(time_date) AS time_date, 
    company, year, model, brand
FROM (
    SELECT
        time_date, company, year, model, brand, 
        CONDITIONAL_TRUE_EVENT(time_date - LAG(time_date) > '5 minutes')
             OVER (PARTITION BY company, year, model, brand ORDER BY time_date) AS cce
    FROM mytable
     ) a 
GROUP BY cce, company, year, model, brand
;
      time_date      | company | year | model | brand  
---------------------+---------+------+-------+--------
 2016-08-20 15:31:00 | A       | 2014 | Teana | Nissan
 2016-08-20 16:02:00 | A       | 2014 | Teana | Nissan
 2016-08-20 15:36:00 | B       | 2014 | Teana | Nissan
 2016-08-21 15:33:00 | A       | 2015 | Teana | Nissan
(4 rows)
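One caveat is that the event counter should be computed per company and car, i.e. with a PARTITION BY on those columns in the OVER clause; otherwise rows from another company that fall in between can hide a gap. The Python sketch below emulates the counter both ways on a tiny hypothetical data set. The rows are reduced to (time, company) pairs and the function names are assumptions for this illustration, not Vertica APIs.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def conditional_true_event(rows, partitioned):
    """Emulate a running counter that increments when a row is more than
    5 minutes after the previous row in its window.

    With partitioned=True there is one window per company; otherwise all
    rows share a single window ordered by time.
    """
    windows = defaultdict(list)
    for row in rows:
        windows[row[1] if partitioned else None].append(row)

    counters = {}
    for window in windows.values():
        counter, previous = 0, None
        for row in sorted(window):
            # The first row has no predecessor, so the counter stays at 0.
            if previous is not None and row[0] - previous > timedelta(minutes=5):
                counter += 1
            previous = row[0]
            counters[row] = counter
    return counters


def first_of_each_run(rows, partitioned):
    """Equivalent of MIN(time) ... GROUP BY counter, company."""
    counters = conditional_true_event(rows, partitioned)
    firsts = {}
    for row in sorted(rows):
        firsts.setdefault((counters[row], row[1]), row)
    return sorted(firsts.values())


def at(hour, minute):
    return datetime(2016, 8, 20, hour, minute)


# Hypothetical rows: company B calculates in between two A calculations.
rows = [(at(15, 0), "A"), (at(15, 4), "B"), (at(15, 8), "A")]

print(len(first_of_each_run(rows, partitioned=False)))  # 2
print(len(first_of_each_run(rows, partitioned=True)))   # 3
```

In the unpartitioned case A's 15:08 row is merged into A's 15:00 row, even though the two are eight minutes apart, because B's 15:04 row sits between them. On the sample data from the question both variants happen to return the same four rows.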
