
Enhance performance of a large, slow data-loading query

I'm trying to load data from Oracle to SQL Server (sorry for not mentioning this earlier).

I have a table (actually a view that combines data from several tables) with at least 1 million records. I designed my package so that the business logic lives in functions, which I call directly in the select query.

Ex:

x1(id varchar2)
x2(id varchar2, d1 date)
x3(id varchar2, d2 date)

SELECT id, x, y, z, DECODE(.....), x1(id), x2(id), x3(id)
  FROM Table1

Note: My table has 20 columns and I call 5 different functions on at least 6-7 columns. Some of the functions compare the parameters passed against an audit table and perform logic.

How can I improve the performance of my query, or is there a better way to do this?

I tried doing it in C# code, but the initial select returns too many records for a DataSet and I get an OutOfMemoryException.

My function does selects and then performs logic. For example:

Function(c_x2, p_eid)

  SELECT col1
    INTO p_x1
    FROM tableP
   WHERE eid = p_eid;  -- parameter renamed from "eid": "WHERE eid = eid" would match every row
                       -- (note: SELECT INTO raises NO_DATA_FOUND when no row exists)

  IF (p_x1 IS NULL) THEN  -- "= NULL" is never true in SQL; use IS NULL
    ret_var := 'INITIAL';
  ELSIF (p_x1 = 'L') AND (c_x2 = 'A') THEN
    ret_var := 'RL';

    INSERT INTO Audit
      (old_val, new_val, audit_event, id, pname)
    VALUES
      (p_x1, c_x2, 'RL', p_eid, 'PackageProcName');

  ELSIF (p_x1 = 'A') AND (c_x2 = 'L') THEN
    ret_var := 'GL';

    INSERT INTO Audit
      (old_val, new_val, audit_event, id, pname)
    VALUES
      (p_x1, c_x2, 'GL', p_eid, 'PackageProcName');

  END IF;

  RETURN ret_var;

I'm getting each row, performing the logic in C#, and then inserting.

If possible, INSERT from the SELECT:

INSERT INTO YourNewTable
        (col1, col2, col3)
    SELECT
        col1, col2, col3
        FROM YourOldTable
        WHERE ....

This will run significantly faster than a single query where you loop over the result set and issue an INSERT for each row.

EDIT, regarding the question edit:

you should be able to replace the function calls with plain SQL in your query. Mimic the "INITIAL" case using a LEFT JOIN to tableP, and calculate the "RL" or "GL" values using CASE.
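A sketch of that rewrite, assuming tableP.eid matches Table1.id and that the c_x2 input comes from a column on Table1 (the column names x2_col and col1 are assumptions, not from the actual schema):

```sql
-- Set-based replacement for the x2() function call; one join instead of
-- one SELECT INTO per row.
SELECT t.id,
       CASE
         WHEN p.col1 IS NULL                  THEN 'INITIAL'
         WHEN p.col1 = 'L' AND t.x2_col = 'A' THEN 'RL'
         WHEN p.col1 = 'A' AND t.x2_col = 'L' THEN 'GL'
       END AS x2_result
  FROM Table1 t
  LEFT JOIN tableP p ON p.eid = t.id;
```

The audit rows the function writes would need to be handled separately, for example with a single INSERT ... SELECT over the same join, rather than one INSERT per row inside the function.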

EDIT based on the OP's recent comments:

Since you are loading data from Oracle into SQL Server, this is what I would do: most people who could help have moved on and will not read this question again, so open a new question where you state: 1) you need to load data from Oracle (version) to SQL Server (version); 2) you currently load it with one query, processing each row in C# and inserting it into SQL Server, and it is slow; plus all the other details. There are much better ways of bulk loading data into SQL Server. As for this question, you could accept an answer, answer it yourself explaining that you needed to ask a new question, or just leave it unaccepted.

My recommendation is that you do not use functions and then call them within other SELECT statements. This:

SELECT t.id, ...
       x1(t.id) ...
  FROM TABLE t

...is equivalent to:

SELECT t.id, ...
       (SELECT x.column FROM x1 x WHERE x.id = t.id)
  FROM TABLE t

Encapsulation doesn't work in SQL the way it does in C# and similar languages. While the approach makes maintenance easier, performance suffers because the sub-selects execute once for every row returned.

A better approach would be to update the supporting function to include the join criteria (i.e. "where x.id = t.id", for lack of a real one) in the SELECT:

SELECT x.id,
       x.column
  FROM x1 x

...so you can use it as a JOIN:

SELECT t.id, ...
       x1.column
  FROM TABLE t
  JOIN (SELECT x.id,
               x.column
          FROM x1 x) x1 ON x1.id = t.id

I prefer that to having to incorporate the function logic into the queries for the sake of maintenance, but sometimes it can't be helped.

Personally I'd create an SSIS package to do this task. Using a bulk insert you can improve speed dramatically, and SSIS can handle the functions part after the bulk insert.

Create a sorted index on your table.

Introduction to SQL Server Indexes; other RDBMSs are similar.

Edit since you edited your question:

Using a view is even more sub-optimal, especially when querying single rows from it. I think your "business functions" are actually something like stored procedures?

As others suggested, in SQL always go set-based. I assumed you had already done that, hence my tip to start using indexing.

A couple of tips:

  • Don't load all records into RAM but process them one by one.
  • Try to run as many functions on the client as possible. Databases are really slow to execute user defined functions.
  • If you need to join two tables, it's sometimes possible to create two connections on the client. Fetch the main data with connection 1 and the audit data with connection 2. Order the data the same way on both connections so you can read single records from both and perform whatever you need on them.
  • If your functions always return the same result for the same input, use a computed column or a materialized view. The database will run the function once and save it in a table somewhere. That will make INSERT slow but SELECT quick.
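As a sketch of the last tip: in Oracle, a materialized view can precompute the function result once per row, so later SELECTs read a stored value instead of re-running the function (Table1 and x1 are from the question; the materialized view name and refresh options are assumptions):

```sql
-- Precompute x1(id) once per row; reads then hit the stored result.
CREATE MATERIALIZED VIEW mv_x1_results
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
AS
SELECT id, x1(id) AS x1_result
  FROM Table1;
```

The trade-off is exactly as stated above: writes (and refreshes) get slower, reads get faster.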

Firstly you need to find where the performance problem actually is. Then you can look at trying to solve it.

  1. What is the performance of the view like? How long does it take the view to execute without any of the function calls? Try running the command:

         create table the_view_table
         as
         select *
         from the_view;

     How well does it perform? Does it take 1 minute or 1 hour?
  2. How well do the functions perform? According to the description you are making approximately 5 million function calls, so they had better be pretty efficient! Also, are the functions declared DETERMINISTIC? If they are, Oracle has a chance of optimizing away some of the calls.

  3. Is there a way of reducing the number of function calls? The functions are called once the view has been evaluated and the million rows of data are available. But do all the input values have to come from the highest level of the query? Could the function calls be embedded into the view at a lower level? Consider the following two queries. Which would be quicker?

         select
           f.dim_id,
           d.dim_col_1,
           long_slow_function(d.dim_col_2) as dim_col_2
         from large_fact_table f
         join small_dim_table d on (f.dim_id = d.dim_id)

         select
           f.dim_id,
           d.dim_col_1,
           d.dim_col_2
         from large_fact_table f
         join (
           select
             dim_id,
             dim_col_1,
             long_slow_function(dim_col_2) as dim_col_2
           from small_dim_table) d on (f.dim_id = d.dim_id)

     Ideally the second query should run quicker, as it calls the function once per dimension row rather than once per fact row.
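On the DETERMINISTIC point in step 2, the declaration is just a keyword on the function header. A minimal sketch with a trivial placeholder body (the real x1 logic is in the question; note that a function that inserts into an audit table has side effects and is *not* deterministic, so this only applies to pure lookup functions):

```sql
-- DETERMINISTIC promises the same output for the same input, letting
-- Oracle cache results and skip repeated calls in a query.
CREATE OR REPLACE FUNCTION x1 (p_id VARCHAR2)
  RETURN VARCHAR2
  DETERMINISTIC
IS
BEGIN
  RETURN UPPER(p_id);  -- placeholder body for illustration only
END x1;
```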

The performance issue could be in any of these places and until you investigate the issue, it would be difficult to know where to direct your tuning efforts.
