
Enhance performance of a large, slow data-loading query

I'm trying to load data from Oracle into SQL Server. (Sorry for not mentioning this earlier.)

I have a table (actually a view that pulls data from several tables) with at least 1 million records. I designed my package so that the business logic lives in functions, and I call those functions directly in the SELECT query.

For example:

X1(id varchar2)
x2(id varchar2, d1 date)
x3(id varchar2, d2 date)

Select id, x, y, z, decode (.....), x1(id), x2(id), x3(id) 
FROM Table1

Note: My table has 20 columns, and I call 5 different functions on at least 6-7 of them. Some of the functions compare the parameters passed in against an audit table and then perform logic.

How can I improve the performance of my query, or is there a better way to do this?

I tried doing it in C# code, but the initial select returns too many records for a DataSet, and I get an out-of-memory exception.

My function does a select and then performs logic, for example:

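Function(c_x2, c_eid)

  -- Look up the current value for this id.
  -- The parameter is named c_eid so it cannot be confused with the eid
  -- column ("where eid = eid" would compare the column with itself).
  Select col1 
    into p_x1 
    from tableP 
   where eid = c_eid; 

  -- NULL must be tested with IS NULL; "= NULL" is never true.
  IF (p_x1 IS NULL) THEN 
    ret_var := 'INITIAL'; 
  ELSIF (p_x1 = 'L') AND (c_x2 = 'A') THEN 
    ret_var := 'RL'; 

    INSERT INTO Audit
      (old_val, new_val, audit_event, id, pname) 
    VALUES 
      (p_x1, c_x2, 'RL', c_eid, 'PackageProcName'); 

  ELSIF (p_x1 = 'A') AND (c_x2 = 'L') THEN 
    ret_var := 'GL'; 

    INSERT INTO Audit
      (old_val, new_val, audit_event, id, pname) 
    VALUES 
      (p_x1, c_x2, 'GL', c_eid, 'PackageProcName'); 

  END IF; 

RETURN ret_var;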

I'm fetching each row, performing the logic in C#, and then inserting it.

If possible, INSERT directly from the SELECT:

INSERT INTO YourNewTable
        (col1, col2, col3)
    SELECT
        col1, col2, col3
        FROM YourOldTable
        WHERE ....

This will run significantly faster than running a single query, looping over its result set, and issuing an INSERT for each row.
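To make the contrast concrete, here is a minimal sketch in Python. It uses an in-memory SQLite database purely as a stand-in (an assumption; the question involves Oracle and SQL Server), and the sample rows and the RowByRowTable name are made up for illustration. Both approaches produce the same rows; the set-based statement simply never ships them to the client. Note that this only applies when the source and target tables live in the same database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in database, not Oracle or SQL Server
conn.executescript("""
    CREATE TABLE YourOldTable  (col1 INTEGER, col2 TEXT, col3 TEXT);
    CREATE TABLE YourNewTable  (col1 INTEGER, col2 TEXT, col3 TEXT);
    CREATE TABLE RowByRowTable (col1 INTEGER, col2 TEXT, col3 TEXT);
""")
conn.executemany(
    "INSERT INTO YourOldTable VALUES (?, ?, ?)",
    [(i, f"name{i}", "x") for i in range(1000)],
)

# Set-based: a single statement; the rows never leave the database engine.
conn.execute("""
    INSERT INTO YourNewTable (col1, col2, col3)
    SELECT col1, col2, col3
      FROM YourOldTable
     WHERE col1 % 2 = 0
""")

# Row-by-row: pull every row to the client, then issue one INSERT per row.
rows = conn.execute(
    "SELECT col1, col2, col3 FROM YourOldTable WHERE col1 % 2 = 0"
).fetchall()
for row in rows:
    conn.execute("INSERT INTO RowByRowTable VALUES (?, ?, ?)", row)

set_based = conn.execute("SELECT * FROM YourNewTable ORDER BY col1").fetchall()
row_by_row = conn.execute("SELECT * FROM RowByRowTable ORDER BY col1").fetchall()
print(len(set_based), set_based == row_by_row)
```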

EDIT, following the OP's edit to the question:

You should be able to replace the function call with plain SQL in your query. Mimic the "INITIAL" case using a LEFT JOIN to tableP, and calculate "RL" or "GL" using CASE.
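A minimal sketch of that rewrite follows, using Python's sqlite3 module with an in-memory database purely as a stand-in for Oracle. The c_x2 column on Table1 and the sample rows are assumptions made for illustration; the CASE branches mirror the function in the question, and the audit rows are written with one set-based INSERT rather than one INSERT per function call.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for Oracle
conn.executescript("""
    CREATE TABLE Table1 (id TEXT, c_x2 TEXT);   -- c_x2 column is hypothetical
    CREATE TABLE tableP (eid TEXT, col1 TEXT);
    CREATE TABLE Audit  (old_val TEXT, new_val TEXT, audit_event TEXT,
                         id TEXT, pname TEXT);
    INSERT INTO Table1 VALUES ('e1', 'A'), ('e2', 'L'), ('e3', 'A'), ('e4', 'A');
    INSERT INTO tableP VALUES ('e1', 'L'), ('e2', 'A'), ('e3', 'A');
""")

# LEFT JOIN keeps rows with no match in tableP, which become 'INITIAL'.
result = conn.execute("""
    SELECT t.id,
           CASE
               WHEN p.col1 IS NULL                 THEN 'INITIAL'
               WHEN p.col1 = 'L' AND t.c_x2 = 'A'  THEN 'RL'
               WHEN p.col1 = 'A' AND t.c_x2 = 'L'  THEN 'GL'
           END AS ret_var
      FROM Table1 t
      LEFT JOIN tableP p ON p.eid = t.id
     ORDER BY t.id
""").fetchall()

# The audit rows are inserted in one statement for all qualifying rows.
conn.execute("""
    INSERT INTO Audit (old_val, new_val, audit_event, id, pname)
    SELECT p.col1, t.c_x2,
           CASE WHEN p.col1 = 'L' THEN 'RL' ELSE 'GL' END,
           t.id, 'PackageProcName'
      FROM Table1 t
      JOIN tableP p ON p.eid = t.id
     WHERE (p.col1 = 'L' AND t.c_x2 = 'A')
        OR (p.col1 = 'A' AND t.c_x2 = 'L')
""")
audit = conn.execute("SELECT * FROM Audit ORDER BY id").fetchall()
print(result)
print(audit)
```

Rows that match neither branch (e3 here) come back as NULL, just as the original function returns an unset ret_var in that case.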

EDIT based on the OP's recent comments:

Since you are loading data from Oracle into SQL Server, this is what I would do. Most people who could help have moved on and will not read this question again, so open a new question where you say: 1) you need to load data from Oracle (version) into SQL Server (version); 2) currently you load it with one query, process each row in C#, and insert it into SQL Server, and it is slow; plus all the other details. There are much better ways of bulk loading data into SQL Server. As for this question, you could accept an answer, post your own answer explaining that you need to ask a new question, or just leave it unaccepted.

My recommendation is that you do not write functions and then call them within other SELECT statements. This:

SELECT t.id, ...
       x1(t.id) ...
  FROM TABLE t

...is equivalent to:

SELECT t.id, ...
       (SELECT x.column FROM x1 x WHERE x.id = t.id)
  FROM TABLE t

Encapsulation doesn't work in SQL the way it does in C# and similar languages. While the approach makes maintenance easier, performance suffers because the sub-selects execute for every row returned.

A better approach would be to change the supporting function so that its SELECT also returns the join column (i.e. the one used in "where x.id = t.id", for lack of a real one):

SELECT x.id,
       x.column 
  FROM x1 x

...so you can use it as a JOIN:

SELECT t.id, ...
       x1.column
  FROM TABLE t
  JOIN (SELECT x.id,
               x.column 
          FROM MY_PACKAGE.x) x1 ON x1.id = t.id

For the sake of maintenance, I prefer that to having to incorporate the function logic into the queries, but sometimes it can't be helped.

Personally, I'd create an SSIS import to do this task. Using a bulk insert you can improve speed dramatically, and SSIS can handle the functions part after the bulk insert.

Create a sorted index on your table.

See "Introduction to SQL Server Indexes"; other RDBMSs are similar.
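As a small illustration of what an index changes, the sketch below uses Python's sqlite3 module with an in-memory database (an assumption; SQLite stands in for Oracle or SQL Server, and the index name idx_tablep_eid is made up). It asks the engine for its query plan before and after creating an index on the lookup column used by the function in the question. The exact plan wording differs between engines and versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in database
conn.execute("CREATE TABLE tableP (eid TEXT, col1 TEXT)")
conn.executemany(
    "INSERT INTO tableP VALUES (?, ?)",
    [(f"e{i}", "L") for i in range(1000)],
)

lookup = "SELECT col1 FROM tableP WHERE eid = ?"

def plan(sql):
    """Return the textual query plan; the last column of each row is the description."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("e42",)).fetchall()
    return " ".join(row[-1] for row in rows)

plan_before = plan(lookup)  # expected to be a full table scan
conn.execute("CREATE INDEX idx_tablep_eid ON tableP (eid)")
plan_after = plan(lookup)   # expected to search via the new index

print(plan_before)
print(plan_after)
```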

Edit, since you edited your question:

Using a view is even more sub-optimal, especially when querying single rows from it. I think your "business functions" are actually something like stored procedures?

As others suggested, in SQL always go set-based. I assumed you already did that, hence my tip to start using indexing.

A couple of tips:

  • Don't load all records into RAM; process them one by one.
  • Try to run as many functions on the client as possible. Databases are really slow at executing user-defined functions.
  • If you need to join two tables, it's sometimes possible to create two connections on the client. Fetch the main data with connection 1 and the audit data with connection 2. Order the data for both connections in the same way so you can read single records from both connections and perform whatever you need on them.
  • If your functions always return the same result for the same input, use a computed column or a materialized view. The database will run the function once and save the result in a table somewhere. That will make INSERT slow but SELECT quick.
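The first tip can be sketched as follows. This is a minimal Python example under stated assumptions: two in-memory SQLite connections stand in for the Oracle source and the SQL Server target, and BATCH_SIZE, the Loaded table, and the transform function are hypothetical placeholders. Rows are read in bounded batches with fetchmany, so memory use depends on the batch size rather than on the size of the result set.

```python
import sqlite3

BATCH_SIZE = 100  # hypothetical value; tune to available memory

source = sqlite3.connect(":memory:")  # stands in for Oracle
target = sqlite3.connect(":memory:")  # stands in for SQL Server
source.execute("CREATE TABLE Table1 (id INTEGER, x TEXT)")
source.executemany(
    "INSERT INTO Table1 VALUES (?, ?)",
    [(i, f"v{i}") for i in range(1050)],
)
target.execute("CREATE TABLE Loaded (id INTEGER, x TEXT)")

def transform(row):
    """Placeholder for the per-row business logic (hypothetical)."""
    row_id, x = row
    return (row_id, x.upper())

cursor = source.execute("SELECT id, x FROM Table1 ORDER BY id")
batches = 0
while True:
    batch = cursor.fetchmany(BATCH_SIZE)  # at most BATCH_SIZE rows held in memory
    if not batch:
        break
    # One round trip per batch instead of one INSERT call per row.
    target.executemany(
        "INSERT INTO Loaded VALUES (?, ?)",
        [transform(row) for row in batch],
    )
    batches += 1
target.commit()

loaded = target.execute("SELECT COUNT(*) FROM Loaded").fetchone()[0]
print(loaded, batches)
```

The same loop shape applies to a data reader in C#: read forward, transform, and write in batches, rather than filling a DataSet first.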

First, you need to find where the performance problem actually is. Then you can look at trying to solve it.

  1. What is the performance of the view like? How long does the view take to execute without any of the function calls? Try running the command

    create table the_view_table
    as
    select *
    from the_view;

    How well does it perform? Does it take 1 minute or 1 hour?

  2. How well do the functions perform? According to the description, you are making approximately 5 million function calls. They had better be pretty efficient! Also, are the functions defined as deterministic? If the functions are defined using the deterministic keyword, then Oracle has a chance of optimizing away some of the calls.

  3. Is there a way of reducing the number of function calls? The functions are called once the view has been evaluated and the million rows of data are available. But are all the input values from the highest level of the query? Can the function calls be embedded into the view at a lower level? Consider the following two queries. Which would be quicker?

     select
       f.dim_id,
       d.dim_col_1,
       long_slow_function(d.dim_col_2) as dim_col_2
     from large_fact_table f
     join small_dim_table d on (f.dim_id = d.dim_id)

     select
       f.dim_id,
       d.dim_col_1,
       d.dim_col_2
     from large_fact_table f
     join (
       select
         dim_id,
         dim_col_1,
         long_slow_function(dim_col_2) as dim_col_2
       from small_dim_table) d on (f.dim_id = d.dim_id)

    Ideally, the second query should run quicker, as it calls the function fewer times.
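The difference in call counts can be observed with a small experiment. The sketch below is in Python with an in-memory SQLite database standing in for Oracle (an assumption), and long_slow_function is a trivial placeholder that only counts how often it is invoked. Because an optimizer is free to merge an inline view back into the outer query, the second variant here materializes the dimension rows into a temporary table (dim_computed, a made-up name) before joining, which forces the function to run against the small table only.

```python
import sqlite3

call_counter = {"n": 0}

def long_slow_function(value):
    """Placeholder for an expensive function; counts its invocations."""
    call_counter["n"] += 1
    return value.upper()

conn = sqlite3.connect(":memory:")  # stand-in for Oracle
conn.create_function("long_slow_function", 1, long_slow_function)
conn.executescript("""
    CREATE TABLE small_dim_table (dim_id INTEGER, dim_col_1 TEXT, dim_col_2 TEXT);
    CREATE TABLE large_fact_table (dim_id INTEGER);
    INSERT INTO small_dim_table VALUES (1, 'a', 'red'), (2, 'b', 'blue');
""")
conn.executemany(
    "INSERT INTO large_fact_table VALUES (?)",
    [(1 + i % 2,) for i in range(1000)],
)

# Variant 1: function in the outer query, evaluated for each joined fact row.
per_fact_rows = conn.execute("""
    SELECT f.dim_id, d.dim_col_1, long_slow_function(d.dim_col_2) AS dim_col_2
      FROM large_fact_table f
      JOIN small_dim_table d ON f.dim_id = d.dim_id
""").fetchall()
calls_per_fact = call_counter["n"]

# Variant 2: function applied to the small table first, then joined.
call_counter["n"] = 0
conn.execute("""
    CREATE TEMP TABLE dim_computed AS
    SELECT dim_id, dim_col_1, long_slow_function(dim_col_2) AS dim_col_2
      FROM small_dim_table
""")
per_dim_rows = conn.execute("""
    SELECT f.dim_id, d.dim_col_1, d.dim_col_2
      FROM large_fact_table f
      JOIN dim_computed d ON f.dim_id = d.dim_id
""").fetchall()
calls_per_dim = call_counter["n"]

print(calls_per_fact, calls_per_dim)
```

Both variants return the same rows; only the number of function invocations differs.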

The performance issue could be in any of these places, and until you investigate, it is difficult to know where to direct your tuning efforts.
