Duplicating records to fill gap between dates

I need to do something unusual: create fake records in a view to fill the gaps between the posted dates of product prices.

My actual scenario is a little more complicated than that, but I've simplified it to products/dates/prices.

Let's say we have this table:

create table PRICES_TEST
(
   PRICE_DATE    date          not null,
   PRODUCT       varchar2(13) not null,
   PRICE         number
);

alter table PRICES_TEST 
  add constraint PRICES_TEST_PK
    primary key (PRICE_DATE, PRODUCT);

With these records:

insert into PRICES_TEST values (date'2012-04-15', 'Screw Driver', 13);
insert into PRICES_TEST values (date'2012-04-18', 'Screw Driver', 15);

insert into PRICES_TEST values (date'2012-04-13', 'Hammer', 10);
insert into PRICES_TEST values (date'2012-04-16', 'Hammer', 15);
insert into PRICES_TEST values (date'2012-04-19', 'Hammer', 17);

Selecting the records returns this:

PRICE_DATE                PRODUCT       PRICE                  
------------------------- ------------- ---------------------- 
13-Apr-2012 00:00:00      Hammer        10                     
16-Apr-2012 00:00:00      Hammer        15                     
19-Apr-2012 00:00:00      Hammer        17                     
15-Apr-2012 00:00:00      Screw Driver  13                     
18-Apr-2012 00:00:00      Screw Driver  15                     

Assuming today is Apr 21 2012, I need a view that repeats each price every day until a new price is posted, like this:

PRICE_DATE                PRODUCT       PRICE                  
------------------------- ------------- ---------------------- 
13-Apr-2012 00:00:00      Hammer        10                     
14-Apr-2012 00:00:00      Hammer        10                     
15-Apr-2012 00:00:00      Hammer        10                     
16-Apr-2012 00:00:00      Hammer        15                     
17-Apr-2012 00:00:00      Hammer        15                     
18-Apr-2012 00:00:00      Hammer        15                     
19-Apr-2012 00:00:00      Hammer        17                     
20-Apr-2012 00:00:00      Hammer        17                     
21-Apr-2012 00:00:00      Hammer        17                     
15-Apr-2012 00:00:00      Screw Driver  13                     
16-Apr-2012 00:00:00      Screw Driver  13                     
17-Apr-2012 00:00:00      Screw Driver  13                     
18-Apr-2012 00:00:00      Screw Driver  15                     
19-Apr-2012 00:00:00      Screw Driver  15                     
20-Apr-2012 00:00:00      Screw Driver  15                     
21-Apr-2012 00:00:00      Screw Driver  15                     

Any ideas how to do that? I can't use auxiliary tables, triggers, or PL/SQL programming; I really need to do this using a view.

I think this can be done using Oracle analytic functions, but I'm not familiar with them. I tried to read http://www.club-oracle.com/articles/analytic-functions-i-introduction-164/ but I didn't get it at all.

You can create a row generator statement using the CONNECT BY LEVEL syntax, cross join it with the distinct products in your table, and then outer join that to your prices table. The final touch is to use the LAST_VALUE function with IGNORE NULLS to repeat the price until a new value is encountered. Since you wanted a view, the query is wrapped in a CREATE VIEW statement:

create view dense_prices_test as
select
    dp.price_date
  , dp.product
  , last_value(pt.price ignore nulls) over (order by dp.product, dp.price_date) price
from (
      -- Cross join with the distinct product set in prices_test
      select d.price_date, p.product
      from (
            -- Row generator to list all dates from first date in prices_test to today
            with dates as (select min(price_date) beg_date, sysdate end_date from prices_test)
            select dates.beg_date + level - 1 price_date 
            from dual
            cross join dates
            connect by level <= dates.end_date - dates.beg_date + 1
            ) d
      cross join (select distinct product from prices_test) p
     ) dp
left outer join prices_test pt on pt.price_date = dp.price_date and pt.product = dp.product;
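One thing to watch in the view above: the window is ordered by product and date but not partitioned, so on dates before a product's first posted price the previous product's last price can carry over. With the sample data, Screw Driver on 13 and 14 April would pick up Hammer's price. A possible variant, offered as a sketch rather than a tested replacement, partitions the window by product and drops the rows that still have no price. The view name dense_prices_by_product is made up for illustration, and the sketch assumes PRICE_DATE values carry no time-of-day component.

```sql
-- Hypothetical variant of the view above (the view name is illustrative).
-- Assumes PRICE_DATE holds midnight dates only.
create or replace view dense_prices_by_product as
select price_date, product, price
from (
      select
          dp.price_date
        , dp.product
          -- Partitioning by product stops one product's price
          -- from leaking into the next product's rows
        , last_value(pt.price ignore nulls)
            over (partition by dp.product order by dp.price_date) price
      from (
            select d.price_date, p.product
            from (
                  -- Row generator: first date in prices_test up to today
                  select b.beg_date + level - 1 price_date
                  from (select min(price_date) beg_date from prices_test) b
                  connect by level <= trunc(sysdate) - b.beg_date + 1
                 ) d
            cross join (select distinct product from prices_test) p
           ) dp
      left outer join prices_test pt
        on pt.price_date = dp.price_date and pt.product = dp.product
     )
-- Dates before a product's first posted price have nothing to repeat
where price is not null;
```

The outer filter is what makes each product start at its own first price date instead of the earliest date in the whole table.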

I think I have a solution that builds up to the final result incrementally with CTEs:

with mindate as
(
  select min(price_date) as mindate from PRICES_TEST
)
,dates as
(
  select mindate.mindate + row_number() over (order by 1) - 1 as thedate from mindate,
    dual d connect by level <= floor(SYSDATE - mindate.mindate) + 1
)
,productdates as
(
  select p.product, d.thedate
  from (select distinct product from PRICES_TEST) p, dates d
)
,ranges as
(
  select
    pd.product,
    pd.thedate,
    (select max(PRICE_DATE) from PRICES_TEST p2
     where p2.product = pd.product and p2.PRICE_DATE <= pd.thedate) as mindate
    from productdates pd
)
select 
    r.thedate,
    r.product,
    p.price
from ranges r
inner join PRICES_TEST p on r.mindate = p.price_date and r.product = p.product
order by r.product, r.thedate

  • mindate retrieves the earliest date in the data set.
  • dates generates a calendar of dates from that earliest date to today.
  • productdates cross joins all products with all dates.
  • ranges determines which price date applies on each date.
  • the final query links each applicable price date to the actual price, and the inner join condition filters out dates for which no price date applies.
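To make the ranges step concrete, here is the same lookup run standalone for a single product and a single calendar date. The literal values are picked from the sample data purely for illustration.

```sql
-- Latest posted price date on or before 17-Apr-2012 for 'Hammer'.
-- With the sample rows (13, 16 and 19 April) this yields 16-Apr-2012.
select max(price_date) as applicable_date
from prices_test
where product = 'Hammer'
  and price_date <= date '2012-04-17';
```

For a date earlier than the product's first price the subquery returns NULL, which is why the final inner join drops those rows.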

Demo: http://www.sqlfiddle.com/#!4/e528f/126

I made a few changes to Wolf's excellent answer.

I replaced the subquery factoring (WITH) with a regular subquery in the CONNECT BY. This makes the code a little simpler. (Although this type of code looks weird at first either way, so there may not be a huge gain here.)

Most significantly, I used a partition outer join instead of a cross join and outer join. Partition outer joins are also kind of strange, but they are meant for exactly this type of situation. This makes the code simpler, and should improve performance.

select
    price_dates.price_date
    ,product
    ,last_value(price ignore nulls) over (order by product, price_dates.price_date) price
from
(
    select trunc(sysdate) - level + 1 price_date
    from dual
    connect by level <= trunc(sysdate) -
        (select min(trunc(price_date)) from prices_test) + 1
) price_dates
left outer join prices_test
    partition by (prices_test.product)
    on price_dates.price_date = prices_test.price_date;
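To see what the partition outer join contributes on its own, it may help to run the same join without the LAST_VALUE call. The sketch below is the query above with the analytic function removed and an ORDER BY added for readability; nothing else is new.

```sql
-- Same row generator and partitioned outer join as above, minus LAST_VALUE.
-- Each product gets one row per generated date; PRICE is NULL on days
-- with no posted price for that product.
select
    price_dates.price_date
    ,product
    ,price
from
(
    select trunc(sysdate) - level + 1 price_date
    from dual
    connect by level <= trunc(sysdate) -
        (select min(trunc(price_date)) from prices_test) + 1
) price_dates
left outer join prices_test
    partition by (prices_test.product)
    on price_dates.price_date = prices_test.price_date
order by product, price_dates.price_date;
```

The NULL prices in this intermediate result are the gaps that LAST_VALUE with IGNORE NULLS then fills from the most recent earlier row.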

I just realized that @Wolf's and @jonearles' solutions do not return exactly the results I needed, because the row generator that lists all dates does not generate the ranges per product. If the first price of product A is later than the first price of product B, the first listed date for product A is still the same overall start date rather than its own first price date. But they really helped me work further and get the expected results:

I started by changing @Wolf's date range selector from this:

select min(price_date) beg_date, sysdate end_date from prices_test

to this:

select min(PRICE_DATE) START_DATE, sysdate as END_DATE, PRODUCT 
from PRICES_TEST group by sysdate, PRODUCT

But, somehow, the number of rows per product grows exponentially with each level. I just added a distinct in the outer query. The final select was this:

select
  DP.PRICE_DATE,
  DP.PRODUCT,
  LAST_VALUE(PT.PRICE ignore nulls) over (order by DP.PRODUCT, DP.PRICE_DATE) PRICE
from (
  select distinct START_DATE + DAYS as PRICE_DATE, PRODUCT 
  from 
  (
    -- Row generator to list all dates from first date of each product to today
    with DATES as (select min(PRICE_DATE) START_DATE, sysdate as END_DATE, PRODUCT from PRICES_TEST group by sysdate, PRODUCT)
    select START_DATE, level - 1 as DAYS, PRODUCT
    from DATES
    connect by level < END_DATE - START_DATE + 1
    order by 3, 2
  ) d order by 2, 1
) DP
left outer join prices_test pt on pt.price_date = dp.price_date and pt.product = dp.product;
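The row multiplication mentioned above is a known effect of CONNECT BY over a multi-row source: with no condition tying a child row to its parent, every row at one level becomes a child of every row at the previous level, so the row count multiplies by the number of products at each level. A commonly used idiom ties each generated row to its own product instead, which makes the DISTINCT unnecessary. The following is a sketch of just the row generator under that idiom, assuming PRICE_DATE has no time-of-day component.

```sql
-- Per-product row generator without DISTINCT.
select product, start_date + level - 1 as price_date
from (
      select product, min(price_date) start_date
      from prices_test
      group by product
     )
connect by level <= trunc(sysdate) - start_date + 1
       -- Only continue the hierarchy within the same product
       and prior product = product
       -- Non-deterministic call that keeps Oracle from reporting a
       -- CONNECT BY loop when a row connects to itself
       and prior sys_guid() is not null
order by product, price_date;
```

This generator could stand in for the DP inline view in the query above, with the same outer join to prices_test applied afterwards.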

@Mellamokb's solution is actually what I really need and is certainly better than my newbie solution.

Thanks, everyone, not only for helping me with this but also for introducing me to features such as "with" and "connect by".
