
Use of multithreading to execute SQL statements

Original post: 2014-05-19 13:22:34 · Tags: c#, sql, .net, multithreading, oracle

I have code that carries out data retrieval: basically, it executes anywhere from 3 to 12 SQL (Oracle) read statements to retrieve data about an object.

Unfortunately, it's running slowly. No particular SQL statement is to blame; it's just that I have so many of them, and they take around 0.2 seconds per statement, which can mean over 2 seconds for the code to complete.

I am looking into ways of improving the performance. One way is to merge some of the tables into a single query (which can cut about 0.5 seconds off the combined time). However, it doesn't make sense to merge the rest, since there will only be data there under certain circumstances, and trying to determine when there is data there to marshal could get tricky.
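To illustrate the merge idea: a LEFT JOIN lets one round trip fetch both the base row and optional detail data, with the detail columns coming back NULL when nothing exists. This is a sketch in Python using an in-memory SQLite database as a stand-in for Oracle; the `objects` / `object_details` tables and `load_object` are hypothetical.

```python
import sqlite3

# Hypothetical schema standing in for the Oracle tables: every object has a
# row in "objects", but a row in "object_details" only exists sometimes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE objects (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE object_details (object_id INTEGER PRIMARY KEY, colour TEXT);
    INSERT INTO objects VALUES (1, 'alpha'), (2, 'beta');
    INSERT INTO object_details VALUES (1, 'red');
""")

# One round trip instead of two: the LEFT JOIN keeps the object row even
# when there is no matching detail row (the detail columns come back NULL).
MERGED_QUERY = """
    SELECT o.id, o.name, d.colour
    FROM objects o
    LEFT JOIN object_details d ON d.object_id = o.id
    WHERE o.id = :id
"""

def load_object(object_id):
    row = conn.execute(MERGED_QUERY, {"id": object_id}).fetchone()
    if row is None:
        return None  # no such object at all
    obj_id, name, colour = row
    # colour is None when this object has no detail data
    return {"id": obj_id, "name": name, "colour": colour}

print(load_object(1))
print(load_object(2))
```

This works neatly for one-to-one optional data; joining several one-to-many tables multiplies rows, which is one reason merging everything into one query is not always sensible.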

I am considering introducing threading into my program: after the initial query, I would spawn a thread for each of the other queries, so they are executed at the same time. However, I have never used threading and am wary of introducing deadlocks or other pitfalls.
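A minimal sketch of the proposed fan-out, in Python with `concurrent.futures` (the .NET counterpart would be starting each query as a `Task` and awaiting `Task.WhenAll`). The queries are simulated with `time.sleep(0.2)`; `run_query`, `load_object` and the query names are hypothetical. Each worker returns its result and only the calling thread assembles the object, which sidesteps the shared-object question.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a real Oracle query: it just sleeps for 0.2 s to
# mimic a round trip and returns one fragment of the final object.
def run_query(name, object_id):
    time.sleep(0.2)
    return {name: f"{name} data for {object_id}"}

def load_object(object_id, follow_up_queries, max_workers=4):
    # The initial query still runs first, because the others depend on it.
    base = run_query("base", object_id)

    # Fan the independent follow-up queries out to a bounded thread pool.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_query, q, object_id) for q in follow_up_queries]
        # result() blocks until each query is done and re-raises its exception
        parts = [f.result() for f in futures]

    # Only the calling thread builds the final object, so no locking is needed.
    obj = dict(base)
    for part in parts:
        obj.update(part)
    return obj

start = time.perf_counter()
obj = load_object(42, ["orders", "addresses", "notes", "history"])
elapsed = time.perf_counter() - start
print(sorted(obj), f"{elapsed:.2f}s")
```

With four workers the four follow-up queries overlap, so the total is roughly two round trips instead of five. With real database calls, each worker would also need its own connection, since one connection generally should not be shared by concurrently running commands.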

Currently the other queries marshal the results into different sections of the SAME object. Would this cause any issues (i.e. since we are accessing/updating the same object from different threads, though through different sections/fields within the object)? Would it be better to return the results and marshal them into the object after all the threads have finished?

I know these types of questions are hard to answer since they call for more general advice, but I would appreciate hearing whether anyone thinks it is a good idea, or has other suggestions.

2 answers

If you are only reading (SELECT ... FROM), don't worry about deadlocks: in Oracle, reads (mostly) do not block. The biggest problem with running queries against Oracle from multiple threads is how to deal with connections. Opening a connection, running one query, and closing the connection again is very, very, very bad. Connections are expensive. They are also limited, so you don't want to create a million connections to execute your logic.

As a result, you would use some sort of connection pool and put your queries in a queue.
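A sketch of the "pool plus queue" idea: a fixed number of connections is opened once and handed out to workers, and any work beyond that waits in line instead of opening new connections. This is Python with SQLite standing in for Oracle; `ConnectionPool` is a hypothetical helper, not a library class. In ODP.NET, pooling is normally done by the driver itself and configured through connection-string attributes such as `Max Pool Size`, so this mainly shows the mechanics.

```python
import queue
import sqlite3
import threading
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager

class ConnectionPool:
    """Minimal fixed-size pool: connections are opened once and reused."""

    def __init__(self, factory, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def connection(self):
        conn = self._idle.get()      # blocks while every connection is in use
        try:
            yield conn
        finally:
            self._idle.put(conn)     # hand it back instead of closing it

# In-memory SQLite connections stand in for Oracle connections;
# check_same_thread=False lets whichever worker borrows a connection use it.
pool = ConnectionPool(lambda: sqlite3.connect(":memory:", check_same_thread=False), size=3)

lock = threading.Lock()
in_use = 0
peak = 0

def run_query(n):
    global in_use, peak
    with pool.connection() as conn:
        with lock:
            in_use += 1
            peak = max(peak, in_use)
        result = conn.execute("SELECT :n * 2", {"n": n}).fetchone()[0]
        with lock:
            in_use -= 1
        return result

# Ten queued queries and ten worker threads, but never more than three connections.
with ThreadPoolExecutor(max_workers=10) as executor:
    results = list(executor.map(run_query, range(10)))

print(results, "peak connections in use:", peak)
```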

Also, I hope you are using bind variables and not string concatenation to pass queries to Oracle.
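A small illustration of bind variables versus string concatenation, using Python's `sqlite3` (which accepts the same `:name` placeholder style as Oracle). With Oracle, keeping the statement text identical also lets the server reuse the already-parsed statement instead of parsing a new one for every value. The `customers` table and both functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "O'Brien"), (2, "Smith")])

def find_by_name_concat(name):
    # BAD: the value becomes part of the SQL text. Every distinct value is a
    # new statement for the server, and a quote in the data breaks the SQL.
    return conn.execute(f"SELECT id FROM customers WHERE name = '{name}'").fetchall()

def find_by_name_bound(name):
    # GOOD: one fixed statement text; the value travels separately as a
    # bind variable.
    return conn.execute(
        "SELECT id FROM customers WHERE name = :name", {"name": name}
    ).fetchall()

print(find_by_name_bound("O'Brien"))
try:
    find_by_name_concat("O'Brien")
except sqlite3.OperationalError as exc:
    print("concatenation failed:", exc)
```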

In general, I would collect all the data first (preferably in one query) and only then update the object. You could also consider breaking your object into its sections.

Threading works perfectly. Two years ago I did a project that used a multi-stage / multithreading approach to push data into an Oracle database (and pull some data out of it for updates).

I basically used a staged approach (a request would go through multiple stages; at each stage it gets consumed and new data is pushed to the next stage), and every stage used a configurable thread pool, which would take a message, process it and post the new messages.
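A minimal sketch of such a staged design: each stage has its own queue and its own, separately configurable number of worker threads, and posts its output to the next stage's queue. Python's `queue.Queue` and `threading` stand in for whatever the original project used; `run_stage`, `stop_stage` and the trivial stage functions are hypothetical placeholders for real work such as database calls.

```python
import queue
import threading

_STOP = object()  # sentinel telling a worker to shut down

def run_stage(func, inbox, outbox, workers):
    """Start `workers` threads that take items from inbox, process them with
    func, and post the results to outbox (if there is a next stage)."""
    def worker():
        while True:
            item = inbox.get()
            if item is _STOP:
                break
            result = func(item)
            if outbox is not None:
                outbox.put(result)
    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    return threads

def stop_stage(threads, inbox):
    # One sentinel per worker (queued behind any remaining work), then wait.
    for _ in threads:
        inbox.put(_STOP)
    for t in threads:
        t.join()

# "parse" is cheap; "enrich" is where database work would happen, so it gets
# its own, independently tuned thread count.
raw, parsed, enriched = queue.Queue(), queue.Queue(), queue.Queue()
parse_threads = run_stage(lambda s: int(s), raw, parsed, workers=2)
enrich_threads = run_stage(lambda n: (n, n * n), parsed, enriched, workers=4)

for s in ["1", "2", "3", "4", "5"]:
    raw.put(s)

# Shut down front to back so nothing is left in flight.
stop_stage(parse_threads, raw)
stop_stage(enrich_threads, parsed)

results = sorted(enriched.get() for _ in range(enriched.qsize()))
print(results)
```

Error handling is omitted: an exception inside `func` would end that worker thread, so a real pipeline would catch and report failures per item.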

At that time, I think, we used close to 200 threads to process about a million SQL statements per minute (hitting an Oracle Exadata, which was really getting a workout).

So multithreading "just works", provided you know how to do it and you get your architecture and SQL statements nice and non-blocking. Databases in general are perfectly capable of handling multiple threads.

Now, for details: THAT DEPENDS.

Example:

Currently the other queries marshal the results into different sections of the SAME object. Would this cause any issues (i.e. since we are accessing/updating the same object from different threads, though through different sections/fields within the object)?

Absolutely no problem, as long as:

You make sure all updates are finished before moving the object to the next phase, and
The updates do not overlap and have no ordering dependency (e.g. update 1 must finish for update 2 to have the data it needs).
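A sketch of those two conditions, with hypothetical names: two loaders fill different fields of the same object in parallel, neither reads what the other writes, and the caller joins both threads before using the object. It is written in Python; in C# the same reasoning applies to separate fields, but not to a shared `List` or `Dictionary` that several threads add to, since those collections are not safe for concurrent writes.

```python
import threading
from dataclasses import dataclass, field

@dataclass
class Customer:
    # Hypothetical object with independent sections, each filled by one query.
    id: int
    orders: list = field(default_factory=list)
    addresses: list = field(default_factory=list)

def load_orders(c):
    c.orders = [f"order-{c.id}-1", f"order-{c.id}-2"]  # writes only .orders

def load_addresses(c):
    c.addresses = [f"address-{c.id}"]                 # writes only .addresses

customer = Customer(id=7)
threads = [threading.Thread(target=f, args=(customer,))
           for f in (load_orders, load_addresses)]
for t in threads:
    t.start()

# Condition 1: every update is finished before the object moves on.
# join() blocks until each loader thread has completed.
for t in threads:
    t.join()

print(customer)
```

Condition 2 holds because neither loader needs the other's data; if addresses depended on orders, those two loads would have to run one after the other.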

These are implementation details, and it is really hard (in fact impossible) to give a generic answer for them, especially as this is multithreading 101 and has nothing to do with database access as such.

In general, you will also have to tune the number of threads. .NET cannot do that by itself: it will see that the CPU is not busy and spin up more threads, even if the database server is the bottleneck. This is why we went with multiple stages, so we could tune the number of threads depending on what they do. (The last stage used bulk inserts to load the aggregated data into temporary staging tables with a small number of threads, moving a lot of data in every statement; this needs some tuning knobs so that the database side is not totally overloaded.)
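A sketch of that last-stage idea: a small, explicitly configured number of threads, each sending large batches, rather than letting the runtime decide. `bulk_insert`, `INSERT_THREADS` and `BATCH_SIZE` are hypothetical and the database call is only simulated; with ODP.NET the real insert might use array binding or `OracleBulkCopy`, but that is an assumption about the setup, not something the answer specifies.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

# Hand-tuned limits (hypothetical values). The database, not the CPU, decides
# how much concurrency helps, so these are explicit settings rather than
# something left to the runtime's thread-pool heuristics.
INSERT_THREADS = 2
BATCH_SIZE = 500

lock = threading.Lock()
active = peak_active = rows_written = statements = 0

def bulk_insert(batch):
    """Hypothetical stand-in for one bulk INSERT into a staging table.
    It only simulates the round trip and records some counters."""
    global active, peak_active, rows_written, statements
    with lock:
        active += 1
        peak_active = max(peak_active, active)
    time.sleep(0.01)  # simulated database work
    with lock:
        active -= 1
        rows_written += len(batch)
        statements += 1

def batches(rows, size):
    """Group an iterable of rows into lists of at most `size` rows."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch

rows = ({"id": i} for i in range(10_000))
with ThreadPoolExecutor(max_workers=INSERT_THREADS) as pool:
    # list() waits for every batch and re-raises the first failure, if any
    list(pool.map(bulk_insert, batches(rows, BATCH_SIZE)))

print(f"{rows_written} rows in {statements} statements, peak concurrency {peak_active}")
```

`Executor.map` submits every batch up front, so for an unbounded input you would put a bounded queue in front of the workers instead.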