简体   繁体   中英

sql query optimization (oracle)

It has been a while since my sql days thus I am wondering whether it is possible to further optimize the following query? The goal is to collect all accounts for each accountant including all the bookings and movements associated with it. Performance/query-time is very important since there are '3 digit'-million datasets...

select Accountant.person_id, 
    Account.account_id, 
    Account.number, 
    Account.balance,
    Account_Type.type_number, 
    Booking.booking_id, 
    Booking.amount, 
    Movement.movement_date, 
    Movement.movement_desc
from Accountant
join Account on Accountant.person_id = Account.person_id
join Account_Type on Account.account_type_id = Account_Type.account_type_id
left outer join Booking on Account.account_id = Booking.account_id
left outer join Movement on Booking.movement_id = Movement.movement_id;

The entity model looks something like that: 在此处输入图片说明

UPDATE: Since some of you are wondering: Yes I am simply selecting hundreds of million of rows since the query is used to migrate data. The data queried is used to construct a new data structure which is put in another database...

  1. To allow returning accounts with no Bookings I added the outer joins. Is that the right syntax?
  2. Here is the explain plan - seeing any optimization possibilities? 在此处输入图片说明
  3. After adding some missing indices the query takes (in an external tool) about 1/2 hour. In java a memory error is thrown at some point. Any hints (except increasing memory) how to optimize that?

As others have already mentioned: It is strange, you want to select hundreds of millions of records in one go.

Aside from that:

  1. The left outer join on table Booking will only work, if you also outer join table Movement.
  2. As you want all records, only full table scans (or fast full index scans for that matter) make sense. Check the explain plan if this is the case. (It should.) Otherwise use /*+full(tablename)*/.
  3. As you will use full table scans you may want to have them run in parallel. Check the explain plan if this is already the case. Otherwise use /*+parallel(tablename,factor)*/.
  4. In case the tables have many columns, it might be good to have indexes containing the desired columns, so fast full index scans instead of full table scans can be used and less disc blocks need to be read thus.
  5. You can reduce disc reads by compressing tables (Oracle 11g and up).

The join syntax is clearer and recommended. I don't think you can optimise the query itself any further, but you could check the DB schema, or investigate alternatives.

DB Schema:

  • are all IDs marked primary (ieindexed and clustered)?
  • can you check the query plans to ensure the query is being executed optimally?

You could also consider a reporting table that is built at the end of every day (or more frequently if required) - and query from that. Oracle supports materialised views that might help.

Finally, a query that returns n,000,000 records really isn't much use to anyone, so you might reconsider your logic. If you're presenting these records to a user, you need to page them (so limit the query to a page size) - or if you're presenting summary info, rewrite the query to use aggregate functions (sum/max/avg etc.) Don't use code to do what a DB does best.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM