
Select most recent record from denormalized table

I have a table set up with the following columns:

  • fname
  • lname
  • address
  • city
  • state
  • zip
  • Customer ID
  • date_modified

The data is basically denormalized, so it looks like this:

Ben -- Smith--***123first*** st -- NY -- NY -- 12101 -- 123 --  1-1-2011
Ben -- Smith--***123 1st st*** -- NY -- NY -- 12101 -- 123 -- 1-1-2011
Sara -- Smith -- BLAH BLAH BLAh

I am trying to copy these records to a new table, but I only want one record per Customer ID.

I tried doing something like this:

 insert into new_table(fname,lname,address,city,state,zip,Customer_ID,
                      date_modified)

 select fname,lname,address,city,state,zip,Customer_ID,date_modified
 from old_table
 group by (fname,lname,address,city,state,zip,Customer_ID,date_modified)

The issue is that too many addresses and other columns have the same meaning but different text (first vs. 1st), so the GROUP BY leaves both of these records in the new table. How do I essentially get one record for each customer ID by choosing the row with the max(date_modified)? Basically, I would want to group by just the customer_id and not the rest of the columns, but that isn't allowed in Oracle.

The following is one of several ways to get what you seem to want. Keep in mind, though, that this does not normalize your database: you still have a customer_id stored together with a first and last name and an address. I'd probably turn this into several inserts: one to get all of the unique customer IDs along with the latest name information for the Customers table, then another insert for the addresses. If you want historical information about changes, you would need to adjust accordingly.

Also, the code below will not work properly if you have two rows with exactly the same customer ID and exactly the same date_modified: when that date is the customer's latest, the join returns both rows. If you run into that case, you'll need to come up with appropriate business logic to handle it.

INSERT INTO New_Table (
    fname,
    lname,
    address,
    city,
    state,
    zip,
    Customer_ID,
    date_modified )
SELECT
    OT1.fname,
    OT1.lname,
    OT1.address,
    OT1.city,
    OT1.state,
    OT1.zip,
    OT1.customer_id,
    OT1.date_modified
FROM
    (
    SELECT
        customer_id,
        MAX(date_modified) AS latest_date_modified
    FROM
        Old_Table
    GROUP BY customer_id) SQ
INNER JOIN Old_Table OT1 ON
    OT1.customer_id = SQ.customer_id AND
    OT1.date_modified = SQ.latest_date_modified
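The join-on-MAX pattern and its tie caveat can be tried out on a small scale. The following is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for Oracle, trims the select list to three columns, stores dates as ISO text so that MAX() sorts them correctly, and uses made-up sample rows (the Sara rows in particular are hypothetical).

```python
import sqlite3

# In-memory SQLite database as a stand-in for the Oracle schema (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE old_table (
        fname TEXT, lname TEXT, address TEXT, city TEXT, state TEXT,
        zip TEXT, customer_id INTEGER, date_modified TEXT)""")

# Hypothetical sample rows; dates are ISO strings so text comparison is chronological.
conn.executemany(
    "INSERT INTO old_table VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("Ben", "Smith", "123first st", "NY", "NY", "12101", 123, "2011-01-01"),
        ("Ben", "Smith", "123 1st st", "NY", "NY", "12101", 123, "2011-02-01"),
        ("Sara", "Smith", "9 Elm st", "NY", "NY", "12101", 456, "2011-01-15"),
        ("Sara", "Smith", "9 Elm street", "NY", "NY", "12101", 456, "2011-01-15"),
    ],
)

# Latest row(s) per customer: join each row to its customer's MAX(date_modified).
latest_rows = conn.execute("""
    SELECT ot1.customer_id, ot1.address, ot1.date_modified
    FROM (SELECT customer_id, MAX(date_modified) AS latest_date_modified
          FROM old_table
          GROUP BY customer_id) sq
    INNER JOIN old_table ot1
        ON ot1.customer_id = sq.customer_id
       AND ot1.date_modified = sq.latest_date_modified
    ORDER BY ot1.customer_id, ot1.address
""").fetchall()

# Conservative tie check: any (customer_id, date_modified) pair that occurs
# more than once, whether or not it is that customer's latest date.
tied_pairs = conn.execute("""
    SELECT customer_id, date_modified, COUNT(*)
    FROM old_table
    GROUP BY customer_id, date_modified
    HAVING COUNT(*) > 1
""").fetchall()

print(latest_rows)  # customer 456 appears twice because of the tie
print(tied_pairs)
```

Running a tie check like this before the insert shows whether any customers need extra business logic; if it returns no rows, the join yields exactly one row per customer.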

This is quite easy using analytic (a.k.a. window) functions to select the first row for each customer. If two rows have the same date_modified, it is not defined which one is taken.

INSERT INTO new_table (fname,lname,address,city,state,zip,Customer_ID,date_modified)
SELECT fname,
       lname,
       address,
       city,
       state,
       zip,
       Customer_ID,
       date_modified
FROM (
   SELECT fname,
          lname,
          address,
          city,
          state,
          zip,
          Customer_ID,
          date_modified,
          row_number() over (partition by customer_id order by date_modified desc) as rn
   FROM old_table
)
WHERE rn = 1
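If the undefined choice between tied rows is a concern, one option is to add a secondary sort key to the window's ORDER BY. The sketch below is an illustration, not part of the original answer: it runs on Python's built-in sqlite3 module (window functions need SQLite 3.25 or later) instead of Oracle, uses a reduced column list and made-up rows, and picks `address ASC` as an arbitrary, hypothetical tie-breaker.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Oracle (assumption);
# the column list is reduced to keep the sketch short.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE old_table "
             "(fname TEXT, address TEXT, customer_id INTEGER, date_modified TEXT)")
conn.execute("CREATE TABLE new_table "
             "(fname TEXT, address TEXT, customer_id INTEGER, date_modified TEXT)")

# Hypothetical sample rows; customer 456 has two rows with the same date.
conn.executemany(
    "INSERT INTO old_table VALUES (?, ?, ?, ?)",
    [
        ("Ben", "123first st", 123, "2011-01-01"),
        ("Ben", "123 1st st", 123, "2011-02-01"),
        ("Sara", "9 Elm street", 456, "2011-01-15"),
        ("Sara", "9 Elm st", 456, "2011-01-15"),
    ],
)

# Number the rows of each customer from newest to oldest; the extra
# "address ASC" key (an arbitrary choice) makes the pick deterministic on ties.
conn.execute("""
    INSERT INTO new_table (fname, address, customer_id, date_modified)
    SELECT fname, address, customer_id, date_modified
    FROM (
        SELECT fname, address, customer_id, date_modified,
               ROW_NUMBER() OVER (
                   PARTITION BY customer_id
                   ORDER BY date_modified DESC, address ASC
               ) AS rn
        FROM old_table
    )
    WHERE rn = 1
""")

copied = conn.execute(
    "SELECT customer_id, address, date_modified FROM new_table ORDER BY customer_id"
).fetchall()
print(copied)
```

Unlike the join on MAX(date_modified), this always copies exactly one row per customer; the tie-breaker only decides which of the tied rows that is, so it should reflect whatever rule the business actually wants.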
