
Use of multithreading to execute SQL statements

I have code that carries out data retrieval - basically it executes anywhere from 3 to 12 SQL (Oracle) read statements to retrieve data about an object.

Unfortunately it's running slowly (no SQL statement in particular; it's just the fact that I have so many of them, and at around 0.2 seconds per statement that can mean over 2 seconds for the code to complete).

I am looking into ways of improving the performance. One option is to merge some of the tables into a single query (which can cut the combined time by about 0.5 seconds). However it doesn't make sense to merge the rest, since those will only contain data under certain circumstances, and trying to determine when there is data there to marshal could get tricky.

I am considering introducing threading into my program, so that after the initial query I would spawn a thread for each of the other queries and they would execute at the same time. However, I have never used threading and am wary of introducing deadlocks or other pitfalls.

Currently the other queries marshal their results into different sections of the SAME object. Would this cause any issues (i.e. since we are accessing/updating the same object from different threads, though through different sections/fields within the object)? Would it be better to return the results and marshal them into the object after all the threads have finished?
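For what it's worth, the second option I'm picturing looks roughly like the sketch below: the queries run in parallel, each returns its own result, and the object is only populated once everything has finished (`Task.Run`/`Task.WhenAll` in C#; `CustomerData` and the `Run...Query` stubs are just placeholders, not my real code):

```csharp
using System.Threading.Tasks;

public class CustomerData
{
    public int OrderCount { get; set; }
    public string? LatestNote { get; set; }
}

public static class Loader
{
    public static async Task<CustomerData> LoadAsync(int id)
    {
        // Each query runs on its own task and works only on local data;
        // nothing touches the shared object while the tasks are running.
        Task<int>     orderCountTask = Task.Run(() => RunOrderCountQuery(id));
        Task<string?> latestNoteTask = Task.Run(() => RunLatestNoteQuery(id));

        await Task.WhenAll(orderCountTask, latestNoteTask);

        // Marshal into the object only after every query has finished.
        return new CustomerData
        {
            OrderCount = orderCountTask.Result,
            LatestNote = latestNoteTask.Result,
        };
    }

    // Stand-ins for the real SQL read statements.
    private static int RunOrderCountQuery(int id) => 0;
    private static string? RunLatestNoteQuery(int id) => null;
}
```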

I know these types of questions are hard to answer since its more general advice, but I would appreciate if anyone thought it was a good idea, or had other suggestions?

If you are only reading (SELECTs), don't worry about deadlocks: Oracle reads are (mostly) non-blocking. The biggest problem with threading queries to Oracle is how to deal with connections. Creating a connection, running a query and closing the connection every time is very, very bad - connections are expensive. They are also limited, so you don't want to create a million of them just to execute your logic.

As a result, you would use some sort of connection pool and put your queries in a queue.

Also, I hope you are using bind variables rather than string concatenation to pass values into your queries.
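For example, a minimal sketch with the ODP.NET managed driver (`Oracle.ManagedDataAccess.Client`); the table/column names and `GetOrderCount` are made up. With `Pooling=true` in the connection string, `Open`/`Dispose` only borrows and returns a pooled physical connection, and `:id` goes in as a bind variable instead of being concatenated into the SQL text:

```csharp
using System;
using Oracle.ManagedDataAccess.Client;   // ODP.NET managed driver

public static class OrderQueries
{
    // Pooling is on by default in ODP.NET; the pool settings here just make it explicit.
    private const string ConnString =
        "User Id=scott;Password=tiger;Data Source=mydb;" +
        "Pooling=true;Min Pool Size=1;Max Pool Size=10";

    public static int GetOrderCount(int customerId)
    {
        using var conn = new OracleConnection(ConnString);
        conn.Open();                                  // borrows a connection from the pool

        using var cmd = new OracleCommand(
            "SELECT COUNT(*) FROM orders WHERE customer_id = :id", conn);
        cmd.BindByName = true;                        // bind by name rather than by position
        cmd.Parameters.Add(new OracleParameter("id", customerId));   // bind variable

        return Convert.ToInt32(cmd.ExecuteScalar());
    }                                                 // Dispose hands the connection back to the pool
}
```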

In general, I would collect all the data first (ideally in one query) and only then update the object. You could also consider breaking your object into its sections.

Threading works perfectly. Two years ago I did a project that used a multi-stage / multi-threading approach to push data into an Oracle database (and pull some data out of it for updates).

I basically used a staged approach (a request would go through multiple stages, get consumed there, and new data would be pushed to the next stage), and every stage used a configurable thread pool which would take a message, process it and post the new messages.
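Just to illustrate the shape of it - this is not the original project's code, only a minimal sketch using `System.Threading.Channels`, with made-up `Request`/`Processed` types; each stage has a bounded channel in front of it and a configurable number of workers:

```csharp
using System.Threading.Channels;
using System.Threading.Tasks;

public record Request(int Id);
public record Processed(int Id, string Payload);

public static class Pipeline
{
    // One stage: a configurable number of workers read from the input channel,
    // do their work, and push the result to the next stage's channel.
    public static Task[] StartStage(
        ChannelReader<Request> input,
        ChannelWriter<Processed> output,
        int workerCount)
    {
        var workers = new Task[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            workers[i] = Task.Run(async () =>
            {
                await foreach (var request in input.ReadAllAsync())
                {
                    // Placeholder for the real per-message work (e.g. a SQL call).
                    var result = new Processed(request.Id, $"done {request.Id}");
                    await output.WriteAsync(result);
                }
            });
        }
        return workers;
    }

    public static async Task RunDemoAsync()
    {
        var stage1 = Channel.CreateBounded<Request>(1000);    // bounded = back-pressure
        var stage2 = Channel.CreateBounded<Processed>(1000);

        var workers = StartStage(stage1.Reader, stage2.Writer, workerCount: 8);

        for (int i = 0; i < 100; i++)
            await stage1.Writer.WriteAsync(new Request(i));
        stage1.Writer.Complete();

        await Task.WhenAll(workers);     // all workers have drained the input
        stage2.Writer.Complete();        // tell the next stage this one is done
    }
}
```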

At that time we used, I think, close to 200 threads to process about a million SQL statements per minute (hitting an Oracle Exadata that was really being put to work).

So, multithreading "just works" - obviously you have to know how to do it, and you have to get your architecture and your SQL statements nice and non-blocking. Databases in general are perfectly capable of handling multiple threads.

Now, for details: THAT DEPENDS.

Example:

Currently the other queries marshal their results into different sections of the SAME object. Would this cause any issues (i.e. since we are accessing/updating the same object from different threads, though through different sections/fields within the object)?

Absolutely no problem as long as:

  • You make sure all updates are finished before moving the object to the next phase, and
  • The updates do not overlap or depend on each other (i.e. 1 must finish for 2 to have the required data) - see the sketch below this list.
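To make that concrete, a tiny sketch (made-up `Report` fields and stand-in values instead of real SQL): independent updates run side by side, a dependent update is chained behind the one it needs, and nothing moves the object on until `Task.WhenAll` says every update is done:

```csharp
using System.Threading.Tasks;

public class Report
{
    public string? Header { get; set; }
    public string? Details { get; set; }
    public string? Summary { get; set; }
}

public static class ReportLoader
{
    public static async Task FillAsync(Report target)
    {
        // Independent updates write to different fields and can run side by side.
        Task headerTask = Task.Run(() => target.Header  = "header row");   // stand-in query
        Task detailTask = Task.Run(() => target.Details = "detail rows");  // stand-in query

        // A dependent update (the "1 must finish before 2" case) is chained,
        // so it only starts once the data it needs is in place.
        Task summaryTask = detailTask.ContinueWith(
            _ => target.Summary = "summary of " + target.Details);

        // Do not hand the object to the next phase before everything is done.
        await Task.WhenAll(headerTask, detailTask, summaryTask);
    }
}
```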

These are implementation details, and it is really hard (in fact impossible) to give a generic answer for them. Especially as this is multithreading 101 - it has nothing to do with database access in particular.

In general, you will also have to tune the number of threads. .NET cannot do that by itself - it will see that the CPU is not busy and spawn more threads, even if the database server is the bottleneck. This is why we went with multiple stages: so we could tune the number of threads depending on what each stage does (the last stage used bulk inserts into temporary staging tables with a small number of threads, moving a lot of data in every statement - you need that kind of tuning knob to avoid totally overloading the database side).
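One simple way to get that kind of knob (a sketch only; `maxParallel` and the `Func<Task>` list stand in for whatever your real statements are) is to gate the in-flight statements with a `SemaphoreSlim`, so the cap is something you tune against the database rather than against the CPU:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledRunner
{
    // Cap how many statements hit the database at the same time,
    // instead of letting the thread pool grow just because the CPU looks idle.
    public static async Task RunAsync(IEnumerable<Func<Task>> sqlCalls, int maxParallel)
    {
        using var gate = new SemaphoreSlim(maxParallel);

        var tasks = sqlCalls.Select(async call =>
        {
            await gate.WaitAsync();
            try { await call(); }           // the actual SQL statement
            finally { gate.Release(); }
        });

        await Task.WhenAll(tasks);
    }
}
```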
