How can I improve performance by processing database results in parallel?

Question

I have a .NET application that runs roughly 20 to 30 SQL queries and processes the results one at a time. I have been trying to increase performance by doing some of the work in parallel.

Two of the queries take 75% of the time, purely because of the amount of data they return. My initial experiments split each of these queries into 4 buckets using NTILE and processed each data reader in parallel. If anything, this takes a lot longer, I think because of the extra work of computing NTILE and querying the DB 4 times instead of once.

Can anyone suggest other techniques to try, or am I just wasting my time here? The code below is part of a utility class that lets me queue up the functions which process each reader. For the NTILE experiment I queue up 4 tasks, each processing 1/4 of the data (where ntile = 1, 2, 3, 4), and call Execute to run them in parallel.

    foreach (var keyValuePair in m_Tasks)
    {
        var sql = keyValuePair.Key;
        var task = keyValuePair.Value;

        // One connection per queued query, opened asynchronously.
        var conn = new OracleConnection(ConnectionString);
        conn.BeginOpen(o =>
        {
            conn.EndOpen(o);
            var cmd = conn.CreateCommand();
            cmd.CommandText = sql;

            cmd.BeginExecuteReader(a =>
            {
                try
                {
                    var reader = cmd.EndExecuteReader(a);
                    DateTime endIO = DateTime.Now;
                    Console.WriteLine(TaskName + " " + Thread.CurrentThread.ManagedThreadId + "  IO took: " + (endIO - startTime) + " ended at " + endIO);

                    DateTime taskStart = DateTime.Now;
                    task(reader);
                    DateTime endTask = DateTime.Now;
                    Console.WriteLine(TaskName + " " + Thread.CurrentThread.ManagedThreadId + " Task took: " + (endTask - taskStart) + " ended at " + endTask);
                    reader.Close();
                }
                finally
                {
                    // Always close the connection and count the task as done,
                    // even if it throws; otherwise WaitOne() below never returns.
                    conn.Close();
                    if (Interlocked.Decrement(ref numTasks) == 0)
                    {
                        finishedEvent.Set();
                    }
                }
            }, null);
        }, null);
    }

    // Block until the last callback has signalled completion.
    finishedEvent.WaitOne();
    DateTime endExecute = DateTime.Now;
    Console.WriteLine(TaskName + " " + Thread.CurrentThread.ManagedThreadId + " EXECUTE took: " + (endExecute - startTime) + " ended at " + endExecute);

    } // closes the enclosing method, whose signature is not shown in the post

Thanks for any help.

Answer 1

I think you're right that the cost of doing the NTILE outweighs the savings from the parallelism.

You need something that splits the result set into clearly separated subsets.

If your queries return less than roughly 15% of the total data, then breaking down the tables on an index (either an indexed column or a function-based index) is probably your best starting point.

Example: presuming your data has a numeric pseudo-key on each row, create a function-based index on MOD(Id, 4). This would give you an index-based version of your NTILE approach. (I don't think you can have a function-based index on an NTILE.)

This specific approach is probably counter-productive, though: different threads would be fetching data from the same blocks, potentially increasing I/O (depending on memory).

The way Oracle parallel query tends to do it, provided you want to process over 15% of the data in the table, is simply to break the table into N physical chunks (using the rowid) and then run N 'full scans' on those chunks.

I'm not sure whether you can replicate this approach from the front end. Splitting on a key id adds the cost of going through the index for each row.

What you probably want is something that splits the table by something other than the key or, if you do split on the key, splits it by ranges rather than the NTILE approach.
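To make the range idea concrete, here is a minimal C# sketch. It assumes a numeric key whose minimum and maximum have already been fetched (for example with SELECT MIN(id), MAX(id)). KeyRangeSplitter and the id column are hypothetical names, not part of the original code, and equal-width ranges only balance the work if the keys are spread fairly evenly.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not from the original post): splits the inclusive key
// range [minId, maxId] into at most `buckets` contiguous, non-overlapping ranges.
public static class KeyRangeSplitter
{
    public static List<(long Low, long High)> Split(long minId, long maxId, int buckets)
    {
        if (buckets < 1) throw new ArgumentOutOfRangeException(nameof(buckets));

        var ranges = new List<(long Low, long High)>();
        if (maxId < minId) return ranges; // empty key space: nothing to split

        long total = maxId - minId + 1;
        long baseSize = total / buckets;
        long remainder = total % buckets;
        long low = minId;

        for (int i = 0; i < buckets && low <= maxId; i++)
        {
            // The first `remainder` ranges take one extra key,
            // so range sizes differ by at most one.
            long size = baseSize + (i < remainder ? 1 : 0);
            if (size == 0) break;

            long high = low + size - 1;
            ranges.Add((low, high));
            low = high + 1;
        }
        return ranges;
    }
}
```

Each (Low, High) pair could then be bound to a query of the form `... WHERE id BETWEEN :lo AND :hi`, one per queued task, so every task reads a contiguous slice of the key space instead of rows interleaved across the same blocks.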

Answer 2

I use OracleCommand.FetchSize to improve performance on large queries.

cmd.FetchSize = &H100000  '1 MB
Dim Rdr = cmd.ExecuteReader

Some time ago I used asynchronous reads to fetch BLOB data. To use them, you need to keep an array holding each IAsyncResult and loop until the last read has finished.

   Public Shared Function FromBlob(ByVal Id As String, ByVal Rv As String, ByVal cn As OracleConnection) As Proyecto
     Dim n As Integer, Prj As Proyecto = Nothing
     Dim Bf(2)() As Byte, arrAr(2) As IAsyncResult 'For asynchronous processing

     Dim Cmd As New OracleCommand( _
         "Select rv,fecha,Datos From Proyectos Where Id=:Id and Rv in (:Rv,'Av','Est')", cn)
     Cmd.BindByName = True
     Cmd.Parameters.Add("Id", OracleDbType.Varchar2, Id, ParameterDirection.Input)
     Cmd.Parameters.Add("Rv", OracleDbType.Varchar2, Rv, ParameterDirection.Input)
     If Rv Is Nothing Then Prj = Proyecto.Actprj
     Try
        Using Rdr As OracleDataReader = Cmd.ExecuteReader
            Do Until Rdr.Read = False
                Dim rv1 As String = Rdr.GetString(0)
                Select Case rv1
                    Case "Av" : n = 1   'TND progress
                    Case "Est" : n = 2  'Safety study follow-up data
                    Case Else : n = 0
                End Select
                If Rdr.IsDBNull(2) = False Then
                   Dim Blob As OracleBlob = Rdr.GetOracleBlob(2)
                   Dim Buffer(CInt(Blob.Length) - 1) As Byte 'VB sizes arrays by upper bound, so Length - 1 gives exactly Blob.Length bytes
                   Bf(n) = Buffer
                   arrAr(n) = Blob.BeginRead(Buffer, 0, Buffer.Length, Nothing, Blob)
                End If
            Loop
            If Bf(0) Is Nothing AndAlso Prj Is Nothing Then _
               MessageBox.Show("Fallo al cargar proyecto") : Return Nothing
            For n = 0 To Bf.Length - 1
                Dim ar As IAsyncResult = arrAr(n)
                If ar IsNot Nothing AndAlso ar.AsyncWaitHandle.WaitOne() Then
                   Dim blob As OracleBlob = DirectCast(ar.AsyncState, OracleBlob)
                   blob.EndRead(ar)
                   blob.Dispose()
                   If ar.IsCompleted Then
                      Using rd As New BinReader(New MemoryStream(Bf(n)))
                          If n = 0 Then
                             Prj = New Proyecto(rd, False)
                          Else
                             Dim entry = Proyecto.Entry.FromLob(rd), Index = Prj.IndexOf(entry)
                             If Index < 0 Then Prj.Add(entry) Else Prj(Index) = entry
                          End If
                      End Using
                   End If
                End If
            Next
        End Using
     Catch ex As Exception
        MessageBox.Show(ex.Message)
     End Try
     Return Prj
  End Function

Answer 3

You can use ref cursors with Oracle to execute several SQL statements with one OracleCommand:

  Dim cmd As New OracleCommand("Begin " _
  & "Open :1 for Select T.CODTRA,SIM,JLA CAL,SUP,RESP,SERV,SubStr(Aparato,1,3) SIS,PERS,(nvl(DUR,0) * 60) as Dur,t.DESTRA,g.DesTra Destrae,OBS from " & TraRec & " T, Trarec_Gee g where T.codtra <> 'RV' and T.Codtra=G.Codtra(+);" _
  & "Open :2 for Select Red,descr from Redes;" _
  & "Open :3 for Select * from Tr_Redes;" _
  & "Open :4 for Select CODTRA,T_COND,COND,DEMORA * 60 as DEMORA from " & TrCondic _
  & ";end;", cn)

  For n = 0 To 3 : cmd.Parameters.Add(Nothing, OracleDbType.RefCursor, ParameterDirection.Output) : Next
  Dim da As New OracleDataAdapter(cmd)
  da.Fill(0, 0, ds.Tnd, ds.Redes, ds.TrRedes, ds.TrCondic)

Note: da.Fill(0, 0, T1, T2, ...) retrieves several tables from a single statement, here one table per ref cursor.

Answer 4

Ultimately it turned out to be an I/O-bound problem. I've been able to achieve performance improvements by doing the I/O asynchronously. NTILE on ROWID does what I wanted, but so far it hasn't helped because the problem is I/O bound.
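For readers on a newer .NET version, the same idea of overlapping the I/O waits can be sketched with tasks instead of Begin/End callbacks. This is only an illustration, not the code I ended up with: ConcurrentQueries, fetchAsync and process are hypothetical names, and fetchAsync stands in for "open a connection, run the SQL, read the rows".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ConcurrentQueries
{
    // `fetchAsync` is a hypothetical stand-in for the I/O part of a query;
    // `process` stands in for the per-result work done once the rows arrive.
    public static async Task<TResult[]> RunAllAsync<TRows, TResult>(
        IEnumerable<string> queries,
        Func<string, Task<TRows>> fetchAsync,
        Func<TRows, TResult> process)
    {
        // ToList() forces every fetch to start before any is awaited,
        // so the I/O waits overlap instead of running back to back.
        var pending = queries.Select(async sql =>
        {
            TRows rows = await fetchAsync(sql).ConfigureAwait(false);
            return process(rows);
        }).ToList();

        // Results are returned in the same order as `queries`.
        return await Task.WhenAll(pending).ConfigureAwait(false);
    }
}
```

This only helps while the time is spent waiting on the database or network; once those are saturated, starting more queries at the same time will not make them return any faster.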

4 answers

Answer 1
0 accepted 2010-07-06 09:25:39

Answer 2
0 2010-07-06 10:39:29

Answer 3
0 2010-07-06 11:26:34

Answer 4
0 2010-07-14 09:47:12
