Enhance performance of large slow data-loading query
I'm trying to load data from Oracle to SQL Server (sorry for not writing this before).
I have a table (actually a view which has data from different tables) with at least 1 million records. I designed my package in such a way that I have functions for the business logic and call them directly in the select query.
Ex:
X1(id varchar2)
x2(id varchar2, d1 date)
x3(id varchar2, d2 date)
Select id, x, y, z, decode (.....), x1(id), x2(id), x3(id)
FROM Table1
Note: My table has 20 columns and I call 5 different functions on at least 6-7 columns. Some functions compare the parameters passed with an audit table and perform logic.
How can I improve the performance of my query, or is there a better way to do this?
I tried doing it in C# code, but the initial select of records is too large for a dataset and I get an OutOfMemory exception.
My function does selects and then performs logic, for example:
Function(c_x2, eid)
Select col1
into p_x1
from tableP
where eid = eid;
IF (p_x1 IS NULL) THEN
ret_var := 'INITIAL';
ELSIF (p_x1 = 'L') AND (c_x2 = 'A') THEN
ret_var:= 'RL';
INSERT INTO Audit
(old_val, new_val, audit_event, id, pname)
VALUES
(p_x1, c_x2, 'RL', eid, 'PackageProcName');
ELSIF (p_x1 = 'A') AND (c_x2 = 'L') THEN
ret_var := 'GL';
INSERT INTO Audit
(old_val, new_val, audit_event, id, pname)
VALUES
(p_x1, c_x2, 'GL', eid, 'PackageProcName');
END IF;
RETURN ret_var;
I'm getting each row, performing the logic in C#, and then inserting.
If possible, INSERT from the SELECT:
INSERT INTO YourNewTable
(col1, col2, col3)
SELECT
col1, col2, col3
FROM YourOldTable
WHERE ....
This will run significantly faster than a single query where you then loop over the result set and issue an INSERT for each row.
EDIT, regarding the OP's question edit:
You should be able to replace the function call with plain SQL in your query. Mimic the "INITIAL" case using a LEFT JOIN to tableP, and the "RL" or "GL" values can be calculated using CASE.
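A minimal sketch of that rewrite, assuming Table1 carries the two values the function receives (called eid and c_x2 here, names borrowed from the function's parameters) and that tableP has at most one row per eid. The audit table is written as audit_log, a hypothetical stand-in, since AUDIT is a reserved word in Oracle. One behavioural difference to be aware of: the LEFT JOIN yields 'INITIAL' both when tableP has no matching row and when col1 is NULL, whereas the SELECT ... INTO in the posted function would raise NO_DATA_FOUND for a missing row.

```sql
-- Set-based replacement for the per-row function call.
-- Column names t.eid and t.c_x2 are assumptions about Table1.
SELECT t.id,
       CASE
         WHEN p.col1 IS NULL                THEN 'INITIAL'
         WHEN p.col1 = 'L' AND t.c_x2 = 'A' THEN 'RL'
         WHEN p.col1 = 'A' AND t.c_x2 = 'L' THEN 'GL'
       END AS ret_var   -- NULL when no branch matches, like the function
FROM   Table1 t
LEFT JOIN tableP p ON p.eid = t.eid;

-- The audit rows the function wrote one at a time, as a single statement.
-- audit_log is a hypothetical name for the audit table.
INSERT INTO audit_log (old_val, new_val, audit_event, id, pname)
SELECT p.col1,
       t.c_x2,
       CASE WHEN p.col1 = 'L' THEN 'RL' ELSE 'GL' END,
       t.eid,
       'PackageProcName'
FROM   Table1 t
JOIN   tableP p ON p.eid = t.eid
WHERE  (p.col1 = 'L' AND t.c_x2 = 'A')
   OR  (p.col1 = 'A' AND t.c_x2 = 'L');
```

Splitting the work this way keeps the SELECT free of side effects, and the database processes the audit rows as one set instead of one INSERT per function call.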
EDIT based on the OP's recent comments:
Since you are loading data from Oracle into SQL Server, this is what I would do: most people who could help have moved on and will not read this question again, so open a new question where you say: 1) you need to load data from Oracle (version) to SQL Server (version); 2) currently you are loading it with one query, processing each row in C# and inserting it into SQL Server, and it is slow; and all the other details. There are much better ways of bulk loading data into SQL Server. As for this question, you could accept an answer, answer it yourself explaining that you need to ask a new question, or just leave it unaccepted.
My recommendation is that you do not use functions and then call them within other SELECT statements. This:
SELECT t.id, ...
x1(t.id) ...
FROM TABLE t
...is equivalent to:
SELECT t.id, ...
(SELECT x.column FROM x1 x WHERE x.id = t.id)
FROM TABLE t
Encapsulation doesn't work in SQL the way it does in C#/etc. While the approach makes maintenance easier, performance suffers because the subselects execute for every row returned.
A better approach would be to update the supporting function to include the join criteria (i.e. "where x.id = t.id", for lack of a real one) in the SELECT:
SELECT x.id,
x.column
FROM x1 x
...so you can use it as a JOIN:
SELECT t.id, ...
x1.column
FROM TABLE t
JOIN (SELECT x.id,
x.column
FROM MY_PACKAGE.x) x1 ON x1.id = t.id
For the sake of maintenance, I prefer that to having to incorporate the function logic into the queries, but sometimes it can't be helped.
Personally I'd create an SSIS import to do this task. Using a bulk insert you can improve speed dramatically, and SSIS can handle the functions part after the bulk insert.
Create a sorted index on your table.
Introduction to SQL Server Indexes; other RDBMSs are similar.
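For the lookup in the question's function, that could look like the sketch below. The index name is made up, and it assumes tableP really is looked up by eid as in the posted code; compare the execution plan before and after to see whether the index is actually used.

```sql
-- Hypothetical index supporting the "where eid = ..." lookup against tableP.
CREATE INDEX tablep_eid_ix ON tableP (eid);
```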
Edit, since you edited your question:
Using a view is even more sub-optimal, especially when querying single rows from it. I think your "business functions" are actually something like stored procedures?
As others suggested, in SQL always go set-based. I assumed you already did that, hence my tip to start using indexing.
A couple of tips:
INSERT is slow, but SELECT is quick.
Firstly you need to find where the performance problem actually is. Then you can look at trying to solve it.
What is the performance of the view like? How long does it take the view to execute without any of the function calls? Try running the command
create table the_view_table
as
select *
from the_view;
How well does it perform? Does it take 1 minute or 1 hour?
How well do the functions perform? According to the description you are making approximately 5 million function calls. They had better be pretty efficient! Also, are the functions defined as deterministic? If the functions are defined using the deterministic keyword, Oracle has a chance of optimizing away some of the calls.
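As a sketch of what that declaration looks like, here is a made-up function (normalize_id is not from the question) that depends only on its argument. DETERMINISTIC is a promise to Oracle, not something it verifies, so it is only appropriate for functions whose result depends solely on their inputs; a function that reads tableP or writes audit rows, like the one in the question, does not qualify.

```sql
-- Hypothetical example: a pure function that is safe to mark DETERMINISTIC.
CREATE OR REPLACE FUNCTION normalize_id (p_id IN VARCHAR2)
  RETURN VARCHAR2
  DETERMINISTIC   -- same input always yields the same output
IS
BEGIN
  RETURN UPPER(TRIM(p_id));
END normalize_id;
/
```

It would then be called like any other function, for example select id, normalize_id(id) from Table1, and Oracle may reuse results for repeated input values instead of re-executing the body.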
Is there a way of reducing the number of function calls? The functions are called once the view has been evaluated and the million rows of data are available. BUT are all the input values from the highest level of the query? Can the function calls be embedded into the view at a lower level? Consider the following two queries. Which would be quicker?
select
    f.dim_id,
    d.dim_col_1,
    long_slow_function(d.dim_col_2) as dim_col_2
from large_fact_table f
join small_dim_table d on (f.dim_id = d.dim_id)

select
    f.dim_id,
    d.dim_col_1,
    d.dim_col_2
from large_fact_table f
join (
    select
        dim_id,
        dim_col_1,
        long_slow_function(dim_col_2) as dim_col_2
    from small_dim_table) d on (f.dim_id = d.dim_id)
Ideally the second query should run quicker, as it calls the function fewer times.
The performance issue could be in any of these places, and until you investigate the issue it would be difficult to know where to direct your tuning efforts.