Is there any method to select only new/changed rows without a date column or row dependencies in Oracle?

How would you do this?

I'm running an ETL process against databases that currently have no Date_added/Date_updated column. The DBA does not want to add a date column to the tables, so I have to find some other way of selecting only new/changed records for the nightly extractions. The databases are huge, so the solution has to be space- and time-efficient if possible.

Addressing some follow-up questions:

Note: the tables DO NOT have ROWDEPENDENCIES enabled.

1) Does the table have a PK? Yes, each table has a PK that is unique only within that table.

2) How huge is the DB: billions, millions of rows? 10 million records in the largest table right now.

3) Size of the subset of new/changed records for nightly extractions? I would guess about 2,000 rows per client, times 100 clients, so about 200,000.

4) Any unique values in the nightly extractions? There are unique values within each schema.

If your database is configured for Oracle Flashback Query (in particular, with undo retention long enough to cover the period you need), you can query a table as of a past point in time, as below:

SELECT * 
FROM mytable
AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '1' DAY);

Thus, from day to day, you can find the rows that are new or changed by using MINUS (rows deleted since yesterday will not appear in the result):

SELECT *
FROM mytable
MINUS
SELECT * 
FROM mytable
AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '1' DAY);
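
The MINUS query above returns the current version of every new or changed row, but it cannot show rows that were deleted during the last day. A minimal sketch of the reverse comparison follows, assuming a primary-key column named `id` (a hypothetical name that does not come from the question):

```sql
-- Primary keys that existed one day ago but are gone now, i.e. deleted rows.
-- "id" is a hypothetical primary-key column; substitute the table's real PK.
SELECT id
FROM mytable
AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '1' DAY)
MINUS
SELECT id
FROM mytable;
```

Comparing only the key column keeps this check cheaper than a full `SELECT *` comparison. Like any flashback query, it fails if the undo data for the requested point in time is no longer available.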

Reference:

"Using Oracle Flashback Technology", in the Oracle® Database Advanced Application Developer's Guide

You could look into ORA_ROWSCN, a pseudocolumn that returns the system change number (SCN) of the most recent change to a row:

CREATE TABLE bla (foo NUMBER NOT NULL) ROWDEPENDENCIES;

INSERT INTO bla VALUES (1);
COMMIT;

SELECT ORA_ROWSCN, foo FROM bla;
--10832905928770    1

INSERT INTO bla VALUES (2);
COMMIT;

SELECT ORA_ROWSCN, foo FROM bla;
--10832905928770    1
--10832905928816    2

SELECT ORA_ROWSCN, foo FROM bla WHERE ORA_ROWSCN > 10832905928770;
--10832905928816    2
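
For a nightly extraction, the same filter could be driven by a stored high-water mark instead of a literal SCN. This is only a sketch: `:last_scn` is a hypothetical bind variable holding the value saved by the previous run, and where that value is kept is left open.

```sql
-- Rows changed since the previous extraction, oldest change first.
-- :last_scn is a hypothetical bind variable: the highest ORA_ROWSCN
-- returned by the previous run.
SELECT ORA_ROWSCN, foo
FROM bla
WHERE ORA_ROWSCN > :last_scn
ORDER BY ORA_ROWSCN;
-- After loading, save the largest ORA_ROWSCN returned as the next :last_scn.
```

Taking the new high-water mark from the rows actually extracted, rather than from a separate query, avoids skipping rows that are committed between the two statements.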

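Depending on whether the table was created with ROWDEPENDENCIES or NOROWDEPENDENCIES (the default), ORA_ROWSCN is tracked at row level or at block level. With block-level tracking, every row in a block reports the SCN of the latest change to any row in that block, so unchanged rows can show up as changed. Row level would probably be best for your purpose, but the setting can't be changed after table creation, so you would have to back up the data, drop the table, recreate it with ROWDEPENDENCIES, and restore the data.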
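
One possible way to carry out such a rebuild is CREATE TABLE ... AS SELECT. The following is a sketch only: `bla_new` and `bla_old` are hypothetical names, and indexes, primary-key and foreign-key constraints, grants and triggers are not copied by this statement and would have to be recreated separately.

```sql
-- Check the current setting: DEPENDENCIES is ENABLED or DISABLED.
SELECT table_name, dependencies
FROM user_tables
WHERE table_name = 'BLA';

-- Copy the data into a new table that tracks the SCN per row.
-- bla_new is a hypothetical name.
CREATE TABLE bla_new ROWDEPENDENCIES AS
SELECT * FROM bla;

-- Swap the tables once the copy has been verified.
-- bla_old is a hypothetical name for the retired original.
ALTER TABLE bla RENAME TO bla_old;
ALTER TABLE bla_new RENAME TO bla;
```

Every copied row would be expected to carry the SCN of the copy itself, so the first extraction after the rebuild should treat the whole table as changed.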

More here: http://docs.oracle.com/cd/E11882_01/server.112/e26088/pseudocolumns007.htm#SQLRF50953

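If your PK is numeric AND generated from a sequence in increasing order, you can record the range of primary keys processed by each batch, together with its start and end time, in a control table. Each nightly run then extracts only the rows whose PK is above the last recorded value. The control table can also hold the table name (if you want to use the same design for multiple jobs), a status, etc. Note that this detects newly inserted rows only; updates to existing rows do not change the PK and would not be picked up.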
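
A minimal sketch of such a control table and the extraction query built on it. All names here (`etl_control`, its columns, the `id` primary-key column, the `'DONE'` status and the bind variables) are hypothetical and only illustrate the idea; the query also assumes the key values are positive.

```sql
-- Hypothetical control table: one row per extraction batch.
CREATE TABLE etl_control (
  table_name  VARCHAR2(128) NOT NULL,
  start_id    NUMBER        NOT NULL,  -- first PK value in the batch
  end_id      NUMBER        NOT NULL,  -- last PK value in the batch
  start_time  TIMESTAMP     NOT NULL,
  end_time    TIMESTAMP,
  status      VARCHAR2(20)  NOT NULL
);

-- Nightly extraction: rows with a PK above the last completed batch.
-- "id" is a hypothetical primary-key column.
SELECT *
FROM mytable
WHERE id > (SELECT NVL(MAX(end_id), 0)
            FROM etl_control
            WHERE table_name = 'MYTABLE'
              AND status = 'DONE');

-- Record the batch once it has been loaded successfully.
-- The bind variables are hypothetical values supplied by the ETL job.
INSERT INTO etl_control
  (table_name, start_id, end_id, start_time, end_time, status)
VALUES
  ('MYTABLE', :start_id, :end_id, :start_time, SYSTIMESTAMP, 'DONE');
COMMIT;
```

One caveat with this design: sequence values are not necessarily committed in order, so a row with a lower key can become visible after a batch with higher keys has already been extracted. Re-reading a small overlap below the previous `end_id` is one way to reduce that risk.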
