
C# + SQL Server - Execute a stored procedure a large number of times. Best way?

I have one stored procedure which inserts data into 3 tables (it does UPSERTs) and has some rudimentary logic (IF-THEN-ELSE).

I need to execute this sproc millions of times (from a C# app) using different parameters, and I need it to be FAST.

What is the best way to do so?

Does anybody know an off-the-shelf document indexer (open-source or not) besides Lucene or SQL Server FTS?

I am trying to build a document word index. For each word in the document I insert the word, docID, and word position into the DB.

This happens 100000 times for 100 documents, for example.

The sproc: there are 3 tables to insert into, and for each one I do an UPSERT.

The C# app:

using (SqlConnection con = new SqlConnection(_connectionString))
{
    con.Open();
    SqlTransaction trans = con.BeginTransaction();
    SqlCommand command = new SqlCommand("add_word", con, trans);
    command.CommandType = System.Data.CommandType.StoredProcedure;
    string[] TextArray;
    for (int i = 0; i < Document.NumberOfFields; i++)
    {
        ...
        // Addword updates the parameters with new values and calls ExecuteNonQuery.
        Addword(..., command);
    }
}

I forgot to mention: this code produces deadlocks in SQL Server. I have no idea why this happens.

  1. Drop all the indexes on the table(s) you are loading, then add them back once the load is complete. This prevents a lot of thrashing / reindexing on each change.

  2. Make sure the database has allocated enough physical file space before the load, so that it does not have to keep grabbing space from the file system while you load. Databases are usually set to grow by something like 10% when full, at which point SQL Server blocks queries until more space is allocated. With the amount of data you are talking about, SQL Server would have to do a lot of blocking.

  3. Look into bulk load / bulk copy if possible.

  4. Do all of your IF THEN ELSE logic in code. Just send the actual values you want stored to the sproc once they are ready. You might even run two threads: one to evaluate the data and queue it up, the other to write the queue to the DB server.

  5. Look into off-the-shelf programs that do exactly what you are describing with document indexing. Most likely they have already solved these problems.

  6. Get rid of the transaction requirement if possible. Try to keep the sproc calls as simple as possible.

  7. See if you can limit the words you are storing. For example, if you don't care about the words "it", "as", "I", etc., filter them out BEFORE calling the sproc.
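Point 7 can be done entirely on the client. The following is only a sketch: the `WordFilter` class, its stop-word list, and the whitespace-only tokenization are assumptions made for illustration, not part of the question's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordFilter
{
    // Hypothetical stop-word list; extend it to suit the documents being indexed.
    private static readonly HashSet<string> StopWords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "it", "as", "I", "a", "the" };

    // Returns (word, position) pairs for every word that is not a stop word.
    // Position is the word's index in the original text, so it stays valid
    // even though some words are dropped.
    public static List<(string Word, int Position)> Filter(string text)
    {
        var words = text.Split(new[] { ' ', '\t', '\r', '\n' },
                               StringSplitOptions.RemoveEmptyEntries);
        return words
            .Select((w, i) => (Word: w, Position: i))
            .Where(p => !StopWords.Contains(p.Word))
            .ToList();
    }
}
```

Only the pairs that survive the filter then need to be sent to the database, which reduces the number of rows before any server-side tuning.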

If you want to bulk INSERT data quickly from C#, take a look at the SqlBulkCopy class (available since .NET 2.0).
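A minimal sketch of what a SqlBulkCopy load could look like. The staging table name `dbo.WordStaging`, its three columns, and the batch size are assumptions; the question does not show its schema. The sketch uses the `System.Data.SqlClient` namespace (newer projects may use `Microsoft.Data.SqlClient` instead).

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public static class WordBulkLoader
{
    // Builds an in-memory table whose columns mirror the (assumed) staging table.
    public static DataTable BuildTable(IEnumerable<(string Word, int DocId, int Position)> rows)
    {
        var table = new DataTable();
        table.Columns.Add("Word", typeof(string));
        table.Columns.Add("DocId", typeof(int));
        table.Columns.Add("Position", typeof(int));
        foreach (var r in rows)
        {
            table.Rows.Add(r.Word, r.DocId, r.Position);
        }
        return table;
    }

    // Streams the whole table to the server in batches instead of one call per row.
    public static void Load(string connectionString, DataTable table)
    {
        using (var bulk = new SqlBulkCopy(connectionString))
        {
            bulk.DestinationTableName = "dbo.WordStaging"; // hypothetical table name
            bulk.BatchSize = 10000;                        // arbitrary; tune by measuring
            foreach (DataColumn c in table.Columns)
            {
                bulk.ColumnMappings.Add(c.ColumnName, c.ColumnName);
            }
            bulk.WriteToServer(table);
        }
    }
}
```

SqlBulkCopy only inserts, so the UPSERT logic would have to run afterwards as a set-based statement from the staging table into the three real tables.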

This is probably too generic as a requirement. To make the procedure itself fast, we would need to see it and know something about your DB schema.

On the other hand, if you want to know the best way to execute the same procedure (optimized or not) as fast as possible, the usual approach is to do some sort of caching on the client and call the procedure as few times as possible, batching your operations.

If this is in a loop, what people usually do, instead of calling the procedure on each iteration, is populate some caching data structure and call the stored procedure once when the loop exits (or every N iterations if you need it to happen more often), passing down the batch of cached operations. For example, you can pass an XML string to your sproc, which parses it, puts the contents in temp tables, and goes from there. You can save a whole lot of overhead this way.

Another common solution is to use SQL Server bulk operations.

To go back to the stored procedure: keep in mind that optimizing your T-SQL and DB schema (indexes etc.) can have a huge impact on performance.

This might seem like a rudimentary approach, but it should work and it should be fast. You can generate a huge text file containing a list of SQL statements and then run it from the command line. If I'm not mistaken, it should be possible to batch commands using the GO statement. Alternatively, you can do it directly from your application by concatenating several SQL commands as strings and executing them in batches. It seems that what you are trying to do is a one-time task and that the data does not come directly from user input, so you should be able to handle escaping yourself.

I'm sure there are more sophisticated ways to do this (SqlBulkCopy looks like a good start), so please consider this just a suggestion. I would first spend some time investigating whether there are more elegant ways.

Also, I would make sure that the logic in the stored procedure is as simple as possible and that the table does not have any indexes during the load. They should be added afterwards.

Try using XML for that.

You will only need to execute the procedure once.

Example:

CREATE PROCEDURE add_words
(
    @XMLDoc XML
)
AS
BEGIN
    DECLARE @handle INT

    EXEC sp_xml_preparedocument @handle OUTPUT, @XMLDoc

    -- One row per <word> element; '.' reads the element's own text.
    INSERT INTO TestTable
    SELECT * FROM OPENXML (@handle, '/words/word', 2) WITH
      (
        word varchar(100) '.'
      )

    EXEC sp_xml_removedocument @handle
END
GO

-- Call the procedure once with all the words.
DECLARE @XMLDoc XML
SET @XMLDoc = '<words><word>test</word><word>test2</word></words>'
EXEC add_words @XMLDoc
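On the C# side, a caller for a procedure like `add_words` could build the XML document and send it in a single round trip. This is a sketch only: the `WordXml` class and its method names are made up for illustration, and it assumes the procedure takes one XML parameter named `@XMLDoc` as in the example above.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Linq;
using System.Xml.Linq;

public static class WordXml
{
    // Builds <words><word>...</word>...</words>; XElement escapes special
    // characters such as < and & in the word text.
    public static string Build(IEnumerable<string> words)
    {
        return new XElement("words", words.Select(w => new XElement("word", w)))
            .ToString(SaveOptions.DisableFormatting);
    }

    // Sends every word in one stored procedure call instead of one call per word.
    public static void Send(string connectionString, IEnumerable<string> words)
    {
        using (var con = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("add_words", con))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.Add("@XMLDoc", SqlDbType.Xml).Value = Build(words);
            con.Open();
            cmd.ExecuteNonQuery();
        }
    }
}
```

For very large documents it may be worth sending the words in chunks of a few thousand rather than as one enormous XML string.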

If you're trying to optimize for speed, consider simply upgrading your SQL Server hardware. Putting more RAM and a blazing fast RAID in your server may be the most cost-effective long-term way to speed up your queries. Hardware is relatively cheap compared to developer time.

Heed the words of Jeff Atwood:

Coding Horror: Hardware is Cheap, Programmers are Expensive

The communication with the database will likely be a bottleneck in this case, especially if the DB is on another machine. I suggest sending the entire document to the database and writing a sproc that splits it into words, or using SQL Server hosted managed code.

Assuming this is an app where there is no contention between multiple users, try this approach instead:

  • Insert your parameters into a table set up for that purpose
  • Change your SP to loop through that table and perform its work on each row
  • Call the SP once
  • Have the SP truncate the table of inputs when it is complete

This eliminates the overhead of calling the SP millions of times, and the inserts of the parameters into the table can be concatenated ("INSERT INTO foo(v) VALUES('bar'); INSERT INTO foo(v) VALUES('bar2'); INSERT INTO foo(v) VALUES('bar3');").

Disadvantage: the SP is going to take a long time to execute, and there won't be any feedback on progress, which isn't terribly user-friendly.

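To move a large amount of data to the server, use SqlBulkCopy or table-valued parameters (if you are on SQL Server 2008). If you need speed, do not execute a stored procedure once per row; write a set-based one that processes all of the rows (or a large batch of them) at once.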
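A rough sketch of the table-valued parameter route. Everything named here is hypothetical: it assumes a table type created with something like `CREATE TYPE dbo.WordList AS TABLE (...)` and a procedure `add_words_tvp` that declares `@Words dbo.WordList READONLY` and performs its UPSERTs as set-based statements.

```csharp
using System.Data;
using System.Data.SqlClient;

public static class WordTvp
{
    // Builds a command that passes every row of 'words' in a single call.
    // The DataTable's columns must match the columns of the table type.
    public static SqlCommand CreateCommand(SqlConnection con, DataTable words)
    {
        var cmd = new SqlCommand("add_words_tvp", con) // hypothetical procedure name
        {
            CommandType = CommandType.StoredProcedure
        };
        var p = cmd.Parameters.Add("@Words", SqlDbType.Structured);
        p.TypeName = "dbo.WordList";                   // hypothetical table type
        p.Value = words;
        return cmd;
    }
}
```

The caller would open the connection and run `ExecuteNonQuery()` on the returned command once per batch, rather than once per word.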

--Edited since the question was edited.

The biggest issue is to make sure the stored proc is correctly tuned. Your C# code is about as fast as you are going to get it.
