SQL query to show non-duplicated rows and, for duplicated rows, only the one with the most recent date

I have two tables (let's say x and y). Most of the data in the two tables is duplicated, but some rows differ. I inserted all the data from both tables into a new table (let's say table_mixed). One column indicates the source table's date, e.g. 20190307 for x and 20190308 for y. So, for any duplicated rows, only this date column differs.

num   Code   col1 col2 col3 ...   import_date   file_date
----  -----  ------------------   -----------   ---------
01    AA     ......               20190308      20190307
01    AA     ......               20190308      20190308
02    AA     ......               20190308      20190307
03    BB     ......               20190308      20190308

What I want is a query that shows every non-duplicated row from both tables and, for any duplicated rows, only the row with the most recent date.

I did some searching and tried this:

select *,max(file_date) over (partition by stx_import_date) max_date 
from table_mixed;

Here file_date is the date that distinguishes the two source tables, and every row from both tables has the same import_date.

num   Code   col1 col2 col3 ...   import_date   file_date   max_date
----  -----  ------------------   -----------   ---------   --------
01    AA     ......               20190308      20190307    20190308
01    AA     ......               20190308      20190308    20190308
02    AA     ......               20190308      20190307    20190307
03    BB     ......               20190308      20190308    20190308

The result of this query shows every row (including all the duplicated rows) and adds another column (max_date) that shows the most recent file_date for each row. But I want the result to contain only the rows I described above, without the additional column (max_date).

This is the result that I am looking for:

num   Code   col1 col2 col3 ...   import_date   file_date
----  -----  ------------------   -----------   ---------
01    AA     ......               20190308      20190308
02    AA     ......               20190308      20190307
03    BB     ......               20190308      20190308

Thank you

PS: It is not only the columns num, code and import_date that need to match, but also the other columns that I ..... So, by a duplicated row I mean a row where every column except file_date is the same (I have 10+ columns).

PS2: I edited the example so that you guys won't get me wrong. There are other columns (like col1, col2, col3 and so on) that are also used. How should I use partition by in this case?

Use the row_number() window function:

  • Partition by num, code, import_date and the other columns that define a duplicate.
  • Order by file_date desc.

Sample query:

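select * from (
  -- rn = 1 marks the row with the most recent file_date in each group;
  -- the alias is rn rather than row_number, which is a reserved word in some databases
  select *, row_number() over (partition by num, code, stx_import_date order by file_date desc) as rn
  from table_mixed
) t
where t.rn = 1;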
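
To try this against the sample data, and to address the two follow-up points in the question (more than three columns define a duplicate, and the helper column should not appear in the result), here is a minimal sketch using Python's built-in sqlite3 module. The single column col1 is a hypothetical stand-in for the 10+ real columns, the column is called import_date as in the sample table rather than stx_import_date, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical miniature of table_mixed; col1 stands in for the other columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_mixed "
    "(num TEXT, code TEXT, col1 TEXT, import_date TEXT, file_date TEXT)"
)
conn.executemany(
    "INSERT INTO table_mixed VALUES (?, ?, ?, ?, ?)",
    [
        ("01", "AA", "a", "20190308", "20190307"),
        ("01", "AA", "a", "20190308", "20190308"),
        ("02", "AA", "b", "20190308", "20190307"),
        ("03", "BB", "c", "20190308", "20190308"),
    ],
)

# Partition by every column except file_date, keep the newest row per group,
# and name the output columns explicitly so the helper column rn is not returned.
dedup_rows = conn.execute(
    """
    SELECT num, code, col1, import_date, file_date
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (
                   PARTITION BY num, code, col1, import_date
                   ORDER BY file_date DESC
               ) AS rn
        FROM table_mixed AS t
    ) AS ranked
    WHERE rn = 1
    ORDER BY num
    """
).fetchall()

for row in dedup_rows:
    print(row)
```

With the real table, every column except file_date would go into the PARTITION BY list and into the outer SELECT list.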

You seem to care about the num column and only want the most recent data. You can do this with your mixed table as:

select tm.*
from (select tm.*,
             row_number() over (partition by num, code, . . . order by file_date desc) as seqnum
      from table_mixed tm
     ) tm
where seqnum = 1;

Note: If the file dates are the same, then an arbitrary row will be chosen.

This may be more efficient to do when you create the mixed table. You can just do:

select y.*
from y
union all
select x.*
from x left join
     y
     on x.num = y.num and
        x.code = y.code and
        . . .
where y.num is null;

This returns all rows from y (the more recent) along with any non-matching rows from x. It assumes none of the compared column values are NULL, because a NULL never compares equal in the join condition.
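
As a rough illustration of this union all plus left join approach, the sketch below builds two small tables with Python's built-in sqlite3 module. The table contents and the single extra column col1 are assumptions made for the example; in the real tables the join condition would list every column except file_date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for name in ("x", "y"):
    # Hypothetical schema; col1 stands in for the remaining columns.
    conn.execute(
        f"CREATE TABLE {name} "
        "(num TEXT, code TEXT, col1 TEXT, import_date TEXT, file_date TEXT)"
    )
conn.executemany(
    "INSERT INTO x VALUES (?, ?, ?, ?, ?)",
    [
        ("01", "AA", "a", "20190308", "20190307"),
        ("02", "AA", "b", "20190308", "20190307"),
    ],
)
conn.executemany(
    "INSERT INTO y VALUES (?, ?, ?, ?, ?)",
    [
        ("01", "AA", "a", "20190308", "20190308"),
        ("03", "BB", "c", "20190308", "20190308"),
    ],
)

# All rows of y, plus the rows of x that have no counterpart in y.
merged_rows = conn.execute(
    """
    SELECT *
    FROM (
        SELECT y.* FROM y
        UNION ALL
        SELECT x.*
        FROM x
        LEFT JOIN y
               ON x.num = y.num
              AND x.code = y.code
              AND x.col1 = y.col1
              AND x.import_date = y.import_date
        WHERE y.num IS NULL
    ) AS merged
    ORDER BY num
    """
).fetchall()

for row in merged_rows:
    print(row)
```

Row 01 comes from y with the later file_date, row 02 survives from x because y has no match for it, and row 03 exists only in y.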

Maybe group by the identifying columns and take the MAX() of whichever date field differentiates the rows?

;WITH get_max_dt AS (
    SELECT TM.[num]
    ,   TM.[Code]
    ,   TM.[import_date]
    ,   MAX(TM.[file_date]) AS [file_date]
    FROM table_mixed AS TM
    GROUP BY TM.[num],TM.[Code],TM.[import_date]
)
SELECT *
FROM get_max_dt

Output:

num  Code  import_date  file_date
01   AA    20190308     20190308
02   AA    20190308     20190307
03   BB    20190308     20190308
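
The query above returns only the grouped columns. Since the question treats a row as duplicated when every column except file_date matches, one possible variation is to put all of those columns in the GROUP BY. The sketch below runs that idea with Python's built-in sqlite3 module; col1 is a hypothetical stand-in for the other columns, and MAX() orders the dates correctly here because they are stored as fixed-width YYYYMMDD values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature of table_mixed; col1 stands in for the other columns.
conn.execute(
    "CREATE TABLE table_mixed "
    "(num TEXT, code TEXT, col1 TEXT, import_date TEXT, file_date TEXT)"
)
conn.executemany(
    "INSERT INTO table_mixed VALUES (?, ?, ?, ?, ?)",
    [
        ("01", "AA", "a", "20190308", "20190307"),
        ("01", "AA", "a", "20190308", "20190308"),
        ("02", "AA", "b", "20190308", "20190307"),
        ("03", "BB", "c", "20190308", "20190308"),
    ],
)

# Group on every column except file_date and keep the latest file_date per group.
grouped_rows = conn.execute(
    """
    SELECT num, code, col1, import_date, MAX(file_date) AS file_date
    FROM table_mixed
    GROUP BY num, code, col1, import_date
    ORDER BY num
    """
).fetchall()

for row in grouped_rows:
    print(row)
```

Because file_date is the only column outside the grouping key, aggregating it is enough and no window function is needed.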
