I need to find only the unique car insurance calculations in a table. Rows count as duplicates if the calculations were done less than five minutes apart, on the same day, by the same company, for an identical car.
The problem is that each calculation is stored with a different id, and the only things I can get from the database are the date and time of the calculation, the name of the company that made it, and the model, brand and production year of the car.
To be more specific, the table I have looks like this:
| Time_Date | company | year | model | brand |
|--------------|---------|------|-------|--------|
|20.08.16 15:31| A | 2014 | Teana | Nissan |
|20.08.16 15:34| A | 2014 | Teana | Nissan |
|20.08.16 15:38| A | 2014 | Teana | Nissan |
|20.08.16 16:02| A | 2014 | Teana | Nissan |
|20.08.16 15:36| B | 2014 | Teana | Nissan |
|20.08.16 15:37| B | 2014 | Teana | Nissan |
|21.08.16 15:33| A | 2015 | Teana | Nissan |
And what I need to get:
| Time_Date | company | year | model | brand |
|--------------|---------|------|-------|--------|
|20.08.16 15:31| A | 2014 | Teana | Nissan |
|20.08.16 16:02| A | 2014 | Teana | Nissan |
|20.08.16 15:36| B | 2014 | Teana | Nissan |
|21.08.16 15:33| A | 2015 | Teana | Nissan |
The database I use is Vertica. Can anyone please suggest a solution? It doesn't seem like a big problem, but I'm kind of stuck :(
PS: If there is a record at 15:31 and then a record with the same company, year and model at 15:34, the second one should not be in the final table. If another calculation follows less than five minutes after the last one in that run of similar calculations, it should not be in the final table either. So in this case 15:31, 15:34 and 15:38 count as the same calculation and 16:02 is a different one.
Rextester doesn't have a Vertica environment, so I can't test the query below.
Here's a working SQL Server version: http://rextester.com/FWK58234 (the edge cases need a bit more testing).
The Vertica syntax seems close to SQL Server's; the only change needed was to put quotes around mi in the datediff function (added below).
Use a common table expression (CTE) and the analytic function LAG (which looks back at the prior record's value) to compute the datediff within each company, year, model, brand partition. Then eliminate the records with a difference of 5 minutes or less, keeping those with a NULL datediff (the first record in its partition, which has no prior row) and those more than 5 minutes after the prior row, as they start a new run of calculations.
Note: my example results vary because I added additional data to help test edge cases.
WITH CTE as (
SELECT Time_date
, company
, year
, Model
, Brand
, datediff('mi',Lag(time_Date,1,NULL) over (partition by company, year, Model, Brand ORDER BY time_date asc),Time_Date) as MinuteDiff
FROM foo)
SELECT Time_date, company, year, Model, Brand, MinuteDiff
FROM CTE
--We need those with a NULL Minute Difference since they denote the 1st entry for a company, year model brand
--we also need those with a minute difference > 5
WHERE MinuteDiff > 5 or minutediff is null
ORDER BY Company, Year, Model, Brand, Time_date
*Note: if there were an entry every 5 minutes over the course of 3 days for the same company, year, model and brand, only 1 record would be returned. A single gap of more than 5 minutes in that series would return 2 records (barring the gap being at the first or last entry).
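For readers without a Vertica or SQL Server instance at hand, the LAG-based filter described above can be emulated in plain Python. This is only a sketch of the logic, not the query itself: the function name `first_of_each_run` and the `ROWS` list are made up for illustration, and the comparison uses an exact time difference rather than datediff's minute counting (the two agree on timestamps that carry no seconds).

```python
from datetime import datetime, timedelta
from itertools import groupby

# Sample rows from the question: (time_date, company, year, model, brand)
ROWS = [
    ("20.08.16 15:31", "A", 2014, "Teana", "Nissan"),
    ("20.08.16 15:34", "A", 2014, "Teana", "Nissan"),
    ("20.08.16 15:38", "A", 2014, "Teana", "Nissan"),
    ("20.08.16 16:02", "A", 2014, "Teana", "Nissan"),
    ("20.08.16 15:36", "B", 2014, "Teana", "Nissan"),
    ("20.08.16 15:37", "B", 2014, "Teana", "Nissan"),
    ("21.08.16 15:33", "A", 2015, "Teana", "Nissan"),
]


def parse(ts):
    """Parse the dd.mm.yy hh:mm format used in the question."""
    return datetime.strptime(ts, "%d.%m.%y %H:%M")


def first_of_each_run(rows, gap=timedelta(minutes=5)):
    """Keep a row if it has no predecessor in its partition (LAG is NULL)
    or if its predecessor is more than `gap` earlier."""
    def partition(row):
        return row[1:]  # company, year, model, brand

    # Equivalent of PARTITION BY ... ORDER BY time_date
    ordered = sorted(rows, key=lambda r: (partition(r), parse(r[0])))
    kept = []
    for _, part in groupby(ordered, key=partition):
        prev = None
        for row in part:
            current = parse(row[0])
            if prev is None or current - prev > gap:
                kept.append(row)
            prev = current  # always compare with the immediately prior row
    return kept


for row in first_of_each_run(ROWS):
    print(row)
```

Because each row is compared with the row immediately before it, a long chain of entries that are each within five minutes of the previous one collapses into a single row, which is the behaviour the note above describes.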
Try this query
;With cte( Time_Date , company , year , model , brand )
AS
(
SELECT '20.08.16 15:31', 'A' , 2014 , 'Teana' , 'Nissan' UNION ALL
SELECT '20.08.16 15:34', 'A' , 2014 , 'Teana' , 'Nissan' UNION ALL
SELECT '20.08.16 15:38', 'A' , 2014 , 'Teana' , 'Nissan' UNION ALL
SELECT '20.08.16 15:36', 'B' , 2014 , 'Teana' , 'Nissan' UNION ALL
SELECT '20.08.16 15:37', 'B' , 2014 , 'Teana' , 'Nissan' UNION ALL
SELECT '21.08.16 15:33', 'A' , 2015 , 'Teana' , 'Nissan'
)
SELECT Time_Date, company, [year], model, brand FROM
(
SELECT DISTINCT *, ROW_NUMBER()OVER(PARTITION BY company,model,[year] ORDER by Time_Date,company ) dst FROM cte
)Dt
Where dst=1
Order by [year]
Result
Time_Date company year model brand
------------------------------------------
20.08.16 15:31 A 2014 Teana Nissan
20.08.16 15:36 B 2014 Teana Nissan
21.08.16 15:33 A 2015 Teana Nissan
Is this what you want?
SELECT MIN(Time_Date) AS Time_Date, company, year, model, brand
FROM Vertica.dbo.yourTable
GROUP BY company, year, model, brand
This is very easy to implement using the Vertica analytic function CONDITIONAL_TRUE_EVENT.
First I created a temp table, mytable, containing your data:
CREATE LOCAL TEMPORARY TABLE mytable (time_date, company, year, model, brand)
ON COMMIT PRESERVE ROWS AS
SELECT '2016-08-20 15:31:00'::timestamp(0),'A',2014,'Teana','Nissan' UNION ALL
SELECT '2016-08-20 15:34:00'::timestamp(0),'A',2014,'Teana','Nissan' UNION ALL
SELECT '2016-08-20 15:38:00'::timestamp(0),'A',2014,'Teana','Nissan' UNION ALL
SELECT '2016-08-20 16:02:00'::timestamp(0),'A',2014,'Teana','Nissan' UNION ALL
SELECT '2016-08-20 15:36:00'::timestamp(0),'B',2014,'Teana','Nissan' UNION ALL
SELECT '2016-08-20 15:37:00'::timestamp(0),'B',2014,'Teana','Nissan' UNION ALL
SELECT '2016-08-21 15:33:00'::timestamp(0),'A',2015,'Teana','Nissan' ;
Then you just have to:
SELECT
MIN(time_date) AS time_date,
company, year, model, brand
FROM (
SELECT
time_date, company, year, model, brand,
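-- Start a new group whenever the gap to the previous calculation for the same
-- company and car exceeds 5 minutes; PARTITION BY keeps rows for other
-- companies or cars from bridging that gap.
CONDITIONAL_TRUE_EVENT(time_date - LAG(time_date) > '5 minutes')
OVER (PARTITION BY company, year, model, brand ORDER BY time_date) AS cce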
FROM mytable
) a
GROUP BY cce, company, year, model, brand
;
time_date | company | year | model | brand
---------------------+---------+------+-------+--------
2016-08-20 15:31:00 | A | 2014 | Teana | Nissan
2016-08-20 16:02:00 | A | 2014 | Teana | Nissan
2016-08-20 15:36:00 | B | 2014 | Teana | Nissan
2016-08-21 15:33:00 | A | 2015 | Teana | Nissan
(4 rows)
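The counter-then-MIN idea behind this query can be traced in a few lines of Python. This is a rough sketch, not Vertica code: `session_starts` is a hypothetical helper, and it assumes the counter is kept separately for each company, year, model and brand, so that a calculation for a different company or car cannot bridge a gap.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def session_starts(rows, gap=timedelta(minutes=5)):
    """Return the earliest row of every group of calculations.

    rows: iterable of (time_date, company, year, model, brand),
    where time_date is a datetime.
    """
    # Collect the timestamps of each (company, year, model, brand) partition.
    partitions = defaultdict(list)
    for time_date, *car in rows:
        partitions[tuple(car)].append(time_date)

    starts = []
    for car, times in partitions.items():
        counter = 0      # plays the role of the cce column
        first_seen = {}  # counter value -> MIN(time_date) of that group
        prev = None
        for t in sorted(times):
            # The event is "true" when the gap to the previous row exceeds `gap`.
            if prev is not None and t - prev > gap:
                counter += 1
            first_seen.setdefault(counter, t)
            prev = t
        starts.extend((t, *car) for t in first_seen.values())
    return sorted(starts)


sample = [
    (datetime(2016, 8, 20, 15, 31), "A", 2014, "Teana", "Nissan"),
    (datetime(2016, 8, 20, 15, 34), "A", 2014, "Teana", "Nissan"),
    (datetime(2016, 8, 20, 15, 38), "A", 2014, "Teana", "Nissan"),
    (datetime(2016, 8, 20, 16, 2), "A", 2014, "Teana", "Nissan"),
    (datetime(2016, 8, 20, 15, 36), "B", 2014, "Teana", "Nissan"),
    (datetime(2016, 8, 20, 15, 37), "B", 2014, "Teana", "Nissan"),
    (datetime(2016, 8, 21, 15, 33), "A", 2015, "Teana", "Nissan"),
]

for row in session_starts(sample):
    print(row)
```

On the sample data this yields the same four calculations as the result table above, sorted by time.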