How can I get the difference between the two closest dates when joining two tables in PostgreSQL?

I have two tables. table1:

     activity_timestamp    | activity 
                           |    
 2016-12-23 13:53:47.608561| details viewed
 2017-01-09 14:15:52.570397| details viewed
 2016-12-27 16:06:39.138994| details viewed
 2016-12-24 21:09:56.159436| details viewed

table2:

     activity_timestamp    | activity 
                           |    
 2016-12-23 13:54:47.608561| reading
 2017-01-09 14:17:52.570397| reading
 2016-12-27 16:10:39.138994| reading
 2016-12-24 21:12:56.159436| reading

I have to calculate the time between these two activities, i.e. between "details viewed" and "reading". The expected result table:

    timediff (minutes)

        1
        2
        4 
        3

These are the two tables. I have to join them on the condition that the difference between the two activity_timestamp values is less than 20 minutes; only those records should be added to the final table. For that I have written this query:

select DATE_PART('minutes', a1.activity_timestamp- b.activity_timestamp), 
    a1.activity_timestamp, b.activity_timestamp 
from table1 a1 LEFT JOIN table2 b 
   ON(DATE_PART('minutes', (a1.activity_timestamp - b.activity_timestamp))< 20  
      and (a1.activity_timestamp>b.activity_timestamp)) 
order by b.activity_timestamp;      

But the result I am getting seems ambiguous. What can I do to get a join that returns only the records with a difference of less than 20 minutes?

with
  table1(activity_timestamp, activity) as (
    values
      ('2016-12-23 13:53:47.608561'::timestamp, 'details viewed'),
      ('2017-01-09 14:15:52.570397', 'details viewed'),
      ('2016-12-27 16:06:39.138994', 'details viewed'),
      ('2016-12-24 21:09:56.159436', 'details viewed')),
  table2(activity_timestamp, activity) as (
    values
      ('2016-12-23 13:54:47.608561'::timestamp, 'reading'),
      ('2017-01-09 14:17:52.570397', 'reading'),
      ('2016-12-27 16:10:39.138994', 'reading'),
      ('2016-12-24 21:12:56.159436', 'reading'))
select 
  *,
  activity_timestamp - (select max(activity_timestamp) from table1 as t1 where t2.activity_timestamp > t1.activity_timestamp) as diff
from table2 as t2 order by activity_timestamp, activity;
╔════════════════════════════╤══════════╤══════════╗
║     activity_timestamp     │ activity │   diff   ║
╠════════════════════════════╪══════════╪══════════╣
║ 2016-12-23 13:54:47.608561 │ reading  │ 00:01:00 ║
║ 2016-12-24 21:12:56.159436 │ reading  │ 00:03:00 ║
║ 2016-12-27 16:10:39.138994 │ reading  │ 00:04:00 ║
║ 2017-01-09 14:17:52.570397 │ reading  │ 00:02:00 ║
╚════════════════════════════╧══════════╧══════════╝

But I am not sure about the desired row order...
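
The subquery above takes the latest earlier table1 row without any upper limit on the gap. As a rough sketch of how the 20-minute condition from the question might be added, the same lookup can be written as a LATERAL join (available since PostgreSQL 9.3). The alias prev and the column names viewed_at and diff are illustrative choices, not part of the original answer:

```sql
with
  table1(activity_timestamp, activity) as (
    values
      ('2016-12-23 13:53:47.608561'::timestamp, 'details viewed'),
      ('2017-01-09 14:15:52.570397', 'details viewed'),
      ('2016-12-27 16:06:39.138994', 'details viewed'),
      ('2016-12-24 21:09:56.159436', 'details viewed')),
  table2(activity_timestamp, activity) as (
    values
      ('2016-12-23 13:54:47.608561'::timestamp, 'reading'),
      ('2017-01-09 14:17:52.570397', 'reading'),
      ('2016-12-27 16:10:39.138994', 'reading'),
      ('2016-12-24 21:12:56.159436', 'reading'))
select
  t2.activity_timestamp,
  t2.activity,
  prev.activity_timestamp as viewed_at,                     -- illustrative column name
  t2.activity_timestamp - prev.activity_timestamp as diff
from table2 as t2
left join lateral (
  -- latest table1 row that is earlier than this reading,
  -- but no more than 20 minutes earlier
  select t1.activity_timestamp
  from table1 as t1
  where t1.activity_timestamp < t2.activity_timestamp
    and t1.activity_timestamp > t2.activity_timestamp - interval '20 minutes'
  order by t1.activity_timestamp desc
  limit 1
) as prev on true
order by t2.activity_timestamp;
```

Because this is a left join, readings with no "details viewed" row in the preceding 20 minutes are still returned, with NULL in viewed_at and diff; switching to an inner join (join lateral ... on true) would drop them instead.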

I propose using window functions:

with
  table1(activity_timestamp, activity) as (
    values
      ('2016-12-23 13:53:47.608561'::timestamp, 'details viewed'),
      ('2017-01-09 14:15:52.570397', 'details viewed'),
      ('2016-12-27 16:06:39.138994', 'details viewed'),
      ('2016-12-24 21:09:56.159436', 'details viewed')),
  table2(activity_timestamp, activity) as (
    values
      ('2016-12-23 13:54:47.608561'::timestamp, 'reading'),
      ('2017-01-09 14:17:52.570397', 'reading'),
      ('2016-12-27 16:10:39.138994', 'reading'),
      ('2016-12-24 21:12:56.159436', 'reading'))
   , lag AS (
select 
  *, lag(activity_timestamp) OVER (ORDER BY activity_timestamp)
from (
    SELECT * FROM table1
    UNION SELECT * FROM table2
) AS a

) SELECT *, lag - activity_timestamp
FROM lag
WHERE activity = 'reading'
ORDER BY 1
;

The result is:

    activity_timestamp     | activity |            lag             | ?column?  
----------------------------+----------+----------------------------+-----------
 2016-12-23 13:54:47.608561 | reading  | 2016-12-23 13:53:47.608561 | -00:01:00
 2016-12-24 21:12:56.159436 | reading  | 2016-12-24 21:09:56.159436 | -00:03:00
 2016-12-27 16:10:39.138994 | reading  | 2016-12-27 16:06:39.138994 | -00:04:00
 2017-01-09 14:17:52.570397 | reading  | 2017-01-09 14:15:52.570397 | -00:02:00
(4 rows)
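
Two details of the query above are worth noting: lag() returns the preceding row whatever its activity is, and lag - activity_timestamp yields negative intervals. The following is a hedged variant, not the original author's query. It assumes that only a "details viewed" row immediately before a "reading" row should count, and that the 20-minute limit from the question applies. The names merged, prev_timestamp, prev_activity and diff_minutes are made up for the example:

```sql
with
  table1(activity_timestamp, activity) as (
    values
      ('2016-12-23 13:53:47.608561'::timestamp, 'details viewed'),
      ('2017-01-09 14:15:52.570397', 'details viewed'),
      ('2016-12-27 16:06:39.138994', 'details viewed'),
      ('2016-12-24 21:09:56.159436', 'details viewed')),
  table2(activity_timestamp, activity) as (
    values
      ('2016-12-23 13:54:47.608561'::timestamp, 'reading'),
      ('2017-01-09 14:17:52.570397', 'reading'),
      ('2016-12-27 16:10:39.138994', 'reading'),
      ('2016-12-24 21:12:56.159436', 'reading')),
  merged as (
    select
      activity_timestamp,
      activity,
      lag(activity_timestamp) over w as prev_timestamp,  -- timestamp of the preceding row
      lag(activity)           over w as prev_activity    -- activity of the preceding row
    from (
      select * from table1
      union all                -- no duplicate removal needed, so skip the extra sort
      select * from table2
    ) as a
    window w as (order by activity_timestamp)
  )
select
  activity_timestamp,
  prev_timestamp,
  -- total elapsed minutes, not just the minutes field of the interval
  extract(epoch from activity_timestamp - prev_timestamp) / 60 as diff_minutes
from merged
where activity = 'reading'
  and prev_activity = 'details viewed'
  and activity_timestamp - prev_timestamp < interval '20 minutes'
order by activity_timestamp;
```

With the sample data this should give the same four pairs as above, with positive differences of 1, 3, 4 and 2 minutes.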

To compare with the other proposed versions, I created the following script:

CREATE TABLE table1 AS
SELECT '2016-01-01'::timestamp + '1 min'::interval * (random() * 10 + 1) AS activity_timestamp,
       'dv'::text AS activity
 FROM generate_series(1, 100000);

CREATE TABLE table2 AS
SELECT activity_timestamp + '1 min'::interval * (random()) AS activity_timestamp,
       'r'::text AS activity
  FROM table1;

CREATE INDEX i1 ON table1 (activity_timestamp DESC);
CREATE INDEX i2 ON table2 (activity_timestamp DESC);

-- Proposed by Abelisto
explain analyze
select 
  *,
  activity_timestamp - (select max(activity_timestamp)
                          from table1 as t1
                         where t2.activity_timestamp > t1.activity_timestamp
  ) as diff
from table2 as t2 order by activity_timestamp, activity;


-- Gordon Linoff - repaired    
explain analyze
select date_part('minutes', a.activity_timestamp - b.activity_timestamp), 
       a.activity_timestamp, b.activity_timestamp
from table1 a left join
     table2 b 
     on a.activity_timestamp < b.activity_timestamp + interval '20 minute' and
        a.activity_timestamp > b.activity_timestamp
order by b.activity_timestamp; 

-- My own version
explain analyze
WITH lag AS (
select 
  *, lag(activity_timestamp) OVER (ORDER BY activity_timestamp)
from (
    SELECT * FROM table1
    UNION SELECT * FROM table2
) AS a

) SELECT *, lag - activity_timestamp
FROM lag
WHERE activity = 'reading'
ORDER BY 1;

For Gordon's query the run time is too long (I did not want to wait).

Abelisto:

                                                                        QUERY PLAN                                                                         
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=53399.41..53649.41 rows=100000 width=56) (actual time=944.918..957.470 rows=100000 loops=1)
   Sort Key: t2.activity_timestamp, t2.activity
   Sort Method: external merge  Disk: 4104kB
   ->  Seq Scan on table2 t2  (cost=0.00..41675.09 rows=100000 width=56) (actual time=0.068..874.282 rows=100000 loops=1)
         SubPlan 2
           ->  Result  (cost=0.39..0.40 rows=1 width=8) (actual time=0.008..0.008 rows=1 loops=100000)
                 InitPlan 1 (returns $1)
                   ->  Limit  (cost=0.29..0.39 rows=1 width=8) (actual time=0.008..0.008 rows=1 loops=100000)
                         ->  Index Only Scan using i1 on table1 t1  (cost=0.29..3195.63 rows=33167 width=8) (actual time=0.008..0.008 rows=1 loops=100000)
                               Index Cond: ((activity_timestamp IS NOT NULL) AND (activity_timestamp < t2.activity_timestamp))
                               Heap Fetches: 100000
 Planning time: 0.392 ms
 Execution time: 961.594 ms
(13 rows)

My own:

                                                                  QUERY PLAN                                                                  
----------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=39214.47..39216.97 rows=1000 width=64) (actual time=325.461..325.461 rows=0 loops=1)
   Sort Key: lag.activity_timestamp
   Sort Method: quicksort  Memory: 25kB
   CTE lag
     ->  WindowAgg  (cost=28162.14..34662.14 rows=200000 width=48) (actual time=131.906..265.747 rows=199982 loops=1)
           ->  Unique  (cost=28162.14..29662.14 rows=200000 width=40) (actual time=131.900..200.937 rows=199982 loops=1)
                 ->  Sort  (cost=28162.14..28662.14 rows=200000 width=40) (actual time=131.899..167.072 rows=200000 loops=1)
                       Sort Key: table1.activity_timestamp, table1.activity
                       Sort Method: external merge  Disk: 4000kB
                       ->  Append  (cost=0.00..5082.00 rows=200000 width=40) (actual time=0.007..27.569 rows=200000 loops=1)
                             ->  Seq Scan on table1  (cost=0.00..1541.00 rows=100000 width=40) (actual time=0.007..8.584 rows=100000 loops=1)
                             ->  Seq Scan on table2  (cost=0.00..1541.00 rows=100000 width=40) (actual time=0.007..7.248 rows=100000 loops=1)
   ->  CTE Scan on lag  (cost=0.00..4502.50 rows=1000 width=64) (actual time=325.458..325.458 rows=0 loops=1)
         Filter: (activity = 'reading'::text)
         Rows Removed by Filter: 199982
 Planning time: 0.103 ms
 Execution time: 327.737 ms
(17 rows)

For comparison, I also ran all queries with 1000 rows.

Abelisto:

                                                                     QUERY PLAN                                                                      
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=469.71..472.21 rows=1000 width=56) (actual time=8.817..8.882 rows=1000 loops=1)
   Sort Key: t2.activity_timestamp, t2.activity
   Sort Method: quicksort  Memory: 103kB
   ->  Seq Scan on table2 t2  (cost=0.00..419.89 rows=1000 width=56) (actual time=0.058..8.441 rows=1000 loops=1)
         SubPlan 2
           ->  Result  (cost=0.39..0.40 rows=1 width=8) (actual time=0.008..0.008 rows=1 loops=1000)
                 InitPlan 1 (returns $1)
                   ->  Limit  (cost=0.28..0.39 rows=1 width=8) (actual time=0.008..0.008 rows=1 loops=1000)
                         ->  Index Only Scan using i1 on table1 t1  (cost=0.28..38.91 rows=332 width=8) (actual time=0.007..0.007 rows=1 loops=1000)
                               Index Cond: ((activity_timestamp IS NOT NULL) AND (activity_timestamp < t2.activity_timestamp))
                               Heap Fetches: 1000
 Planning time: 0.311 ms
 Execution time: 8.948 ms
(13 rows)

Gordon:

                                                             QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=21087.07..21364.85 rows=111111 width=24) (actual time=439.142..528.240 rows=452961 loops=1)
   Sort Key: b.activity_timestamp
   Sort Method: external merge  Disk: 15016kB
   ->  Nested Loop Left Join  (cost=0.28..9493.05 rows=111111 width=24) (actual time=0.056..280.036 rows=452961 loops=1)
         ->  Seq Scan on table1 a  (cost=0.00..16.00 rows=1000 width=8) (actual time=0.007..0.114 rows=1000 loops=1)
         ->  Index Only Scan using i2 on table2 b  (cost=0.28..7.81 rows=111 width=8) (actual time=0.006..0.171 rows=453 loops=1000)
               Index Cond: (activity_timestamp < a.activity_timestamp)
               Filter: (a.activity_timestamp < (activity_timestamp + '00:20:00'::interval))
               Heap Fetches: 452952
 Planning time: 0.102 ms
 Execution time: 545.139 ms
(11 rows)

My own:

                                                               QUERY PLAN                                                               
----------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=291.85..291.87 rows=10 width=64) (actual time=2.942..2.942 rows=0 loops=1)
   Sort Key: lag.activity_timestamp
   Sort Method: quicksort  Memory: 25kB
   CTE lag
     ->  WindowAgg  (cost=211.66..246.66 rows=2000 width=48) (actual time=1.504..2.374 rows=2000 loops=1)
           ->  Sort  (cost=211.66..216.66 rows=2000 width=40) (actual time=1.500..1.676 rows=2000 loops=1)
                 Sort Key: table1.activity_timestamp
                 Sort Method: quicksort  Memory: 142kB
                 ->  HashAggregate  (cost=62.00..82.00 rows=2000 width=40) (actual time=0.669..0.931 rows=2000 loops=1)
                       Group Key: table1.activity_timestamp, table1.activity
                       ->  Append  (cost=0.00..52.00 rows=2000 width=40) (actual time=0.007..0.255 rows=2000 loops=1)
                             ->  Seq Scan on table1  (cost=0.00..16.00 rows=1000 width=40) (actual time=0.007..0.073 rows=1000 loops=1)
                             ->  Seq Scan on table2  (cost=0.00..16.00 rows=1000 width=40) (actual time=0.005..0.074 rows=1000 loops=1)
   ->  CTE Scan on lag  (cost=0.00..45.02 rows=10 width=64) (actual time=2.939..2.939 rows=0 loops=1)
         Filter: (activity = 'reading'::text)
         Rows Removed by Filter: 2000
 Planning time: 0.092 ms
 Execution time: 3.001 ms
(18 rows)
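
Both "My own" plans report rows=0 for the CTE scan, with every row removed by the filter. The generated tables store 'dv' and 'r' as activity values, while the query filters on activity = 'reading'. The 100000-row plan also shows a Sort and a Unique node that come from UNION removing duplicates. The sketch below is one way the comparison could be rerun so that the filter matches the generated data; it is an assumption about what was intended, not the author's script. The temporary tables bench1 and bench2 and the names merged and prev_timestamp are hypothetical, chosen so as not to clash with the tables above, and no timings are claimed for it:

```sql
-- Hypothetical scratch tables mirroring the generator script above (1000 rows)
create temp table bench1 as
select timestamp '2016-01-01' + interval '1 min' * (random() * 10 + 1) as activity_timestamp,
       'dv'::text as activity
  from generate_series(1, 1000);

create temp table bench2 as
select activity_timestamp + interval '1 min' * random() as activity_timestamp,
       'r'::text as activity
  from bench1;

explain analyze
with merged as (
  select *,
         lag(activity_timestamp) over (order by activity_timestamp) as prev_timestamp
  from (
      select * from bench1
      union all              -- keeps duplicates, so no de-duplication step is planned
      select * from bench2
  ) as a
)
select *, activity_timestamp - prev_timestamp as diff
from merged
where activity = 'r'         -- matches the value actually stored in bench2
order by activity_timestamp;
```

Since the generated timestamps all fall within a few minutes of each other, the preceding row is often another 'r' row here, so this variant is only useful for comparing plans, not for checking results.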

Just use direct date comparisons in the on clause, rather than the minutes of the difference:

select date_part('minutes', a1.activity_timestamp - b.activity_timestamp), 
       a1.activity_timestamp, b.activity_timestamp
from table1 a1 left join
     table2 b 
     on a1.activity_timestamp < b.activity_timestamp + interval '20 minute' and
        a1.activity_timestamp > b.activity_timestamp
order by b.activity_timestamp; 
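
To illustrate why "the minutes of the difference" is unreliable as a join condition: date_part('minutes', ...) on an interval returns only its minutes field, so a gap of one day and five minutes looks like five minutes. A small example with made-up timestamps, comparing it with the total number of minutes derived from the epoch:

```sql
-- The two timestamps are 1 day and 5 minutes apart.
select
  -- only the minutes field of the interval '1 day 00:05:00': 5
  date_part('minutes',
            timestamp '2016-12-24 13:58:47' - timestamp '2016-12-23 13:53:47') as minutes_field,
  -- total elapsed time: 86700 seconds / 60 = 1445 minutes
  extract(epoch from
          timestamp '2016-12-24 13:58:47' - timestamp '2016-12-23 13:53:47') / 60 as total_minutes;
```

Comparing the timestamps directly, as in the query above, avoids the issue altogether and leaves the planner free to use an index on activity_timestamp.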

I should note: if (in the case of multiple matches) you want to limit this to only one record from a or b, then you can use distinct on. I'm not sure which table you want only one record for, however.
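
A minimal sketch of the distinct on idea, under two assumptions that go beyond the answer: one row is kept per table1 record, and, following the sample data in the question, the "reading" row comes after the "details viewed" row (the opposite direction to the join above). The column aliases viewed_at, reading_at and diff are illustrative:

```sql
with
  table1(activity_timestamp, activity) as (
    values
      ('2016-12-23 13:53:47.608561'::timestamp, 'details viewed'),
      ('2017-01-09 14:15:52.570397', 'details viewed'),
      ('2016-12-27 16:06:39.138994', 'details viewed'),
      ('2016-12-24 21:09:56.159436', 'details viewed')),
  table2(activity_timestamp, activity) as (
    values
      ('2016-12-23 13:54:47.608561'::timestamp, 'reading'),
      ('2017-01-09 14:17:52.570397', 'reading'),
      ('2016-12-27 16:10:39.138994', 'reading'),
      ('2016-12-24 21:12:56.159436', 'reading'))
select distinct on (a.activity_timestamp)       -- keep one row per table1 record
       a.activity_timestamp as viewed_at,
       b.activity_timestamp as reading_at,
       b.activity_timestamp - a.activity_timestamp as diff
from table1 a
join table2 b
  on b.activity_timestamp > a.activity_timestamp
 and b.activity_timestamp < a.activity_timestamp + interval '20 minutes'
-- distinct on keeps the first row of each group in this order,
-- i.e. the earliest reading after each "details viewed"
order by a.activity_timestamp, b.activity_timestamp;
```

For the sample data each "details viewed" row has exactly one reading within 20 minutes, so the differences come out as 1, 3, 4 and 2 minutes in viewed_at order. To keep one row per table2 record instead, put b.activity_timestamp in both the distinct on list and the front of the order by.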
