
Can't select more than 700000 rows from SQL Server using C#

I can't fetch more than 700,000 rows from SQL Server using C#; I get an "out of memory" exception. Please help me out.

This is my code:

using (SqlConnection sourceConnection = new SqlConnection(constr))
{
    sourceConnection.Open();

    SqlCommand commandSourceData = new SqlCommand("select * from XXXX ", sourceConnection);

    reader = commandSourceData.ExecuteReader();
}

using (SqlBulkCopy bulkCopy = new SqlBulkCopy(constr2))
{
    bulkCopy.DestinationTableName = "destinationTable";

    try
    {
        // Write from the source to the destination.
        bulkCopy.WriteToServer(reader);
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
    }
    finally
    {
        reader.Close();
    }
}

I have made a small console app based on the given solution, but it ends with the same exception. I have also posted my memory usage before and after processing.

[Screenshot: memory usage before processing]

[Screenshot: memory usage after processing]

After adding the command timeout on the read side, RAM peaks up:

[Screenshot: RAM peaking after adding the command timeout]

That code should not cause an OOM exception. When you pass a DataReader to SqlBulkCopy.WriteToServer, you are streaming the rows from the source to the destination. Somewhere else you are retaining stuff in memory.

SqlBulkCopy.BatchSize controls how often SQL Server commits the rows loaded at the destination, limiting the lock duration and the log file growth (if not minimally logged and in simple recovery mode). Whether you use one batch or not should have no impact on the amount of memory used either in SQL Server or in the client.
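If you do want per-batch commits (to limit lock duration and log growth), a minimal sketch, reusing the placeholder names from the question, might look like this:

// Sketch only: per-batch commits, using the question's placeholder names.
// With SqlBulkCopyOptions.UseInternalTransaction, each batch commits in its
// own transaction, so a failure rolls back only the current batch.
using (var bulkCopy = new SqlBulkCopy(constr2, SqlBulkCopyOptions.UseInternalTransaction))
{
    bulkCopy.DestinationTableName = "destinationTable";
    bulkCopy.BatchSize = 5000; // commit after every 5000 rows
    bulkCopy.WriteToServer(reader);
}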

Here's a sample that copies 10M rows without growing memory:

using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace SqlBulkCopyTest
{
    class Program
    {
        static void Main(string[] args)
        {
            var src = "server=localhost;database=tempdb;integrated security=true";
            var dest = src;

            var sql = "select top (1000*1000*10) m.* from sys.messages m, sys.messages m2";

            var destTable = "dest";

            using (var con = new SqlConnection(dest))
            {
                con.Open();
                var cmd = con.CreateCommand();
                cmd.CommandText = $"drop table if exists {destTable}; with q as ({sql}) select * into {destTable} from q where 1=2";
                cmd.ExecuteNonQuery();
            }

            Copy(src, dest, sql, destTable);
            Console.WriteLine("Complete.  Hit any key to exit.");
            Console.ReadKey();
        }

        static void Copy(string sourceConnectionString, string destinationConnectionString, string query, string destinationTable)
        {
            using (SqlConnection sourceConnection = new SqlConnection(sourceConnectionString))
            {
                sourceConnection.Open();

                SqlCommand commandSourceData = new SqlCommand(query, sourceConnection);

                var reader = commandSourceData.ExecuteReader();

                using (SqlBulkCopy bulkCopy = new SqlBulkCopy(destinationConnectionString))
                {
                    bulkCopy.BulkCopyTimeout = 60 * 10;
                    bulkCopy.DestinationTableName = destinationTable;
                    bulkCopy.NotifyAfter = 10000;
                    bulkCopy.SqlRowsCopied += (s, a) =>
                    {
                        var mem = GC.GetTotalMemory(false);
                        Console.WriteLine($"{a.RowsCopied:N0} rows copied.  Memory {mem:N0}");
                    };
                    // Write from the source to the destination.
                    bulkCopy.WriteToServer(reader);
                }
            }
        }


    }
}

Which outputs:

. . .
9,830,000 rows copied.  Memory 1,756,828
9,840,000 rows copied.  Memory 798,364
9,850,000 rows copied.  Memory 4,042,396
9,860,000 rows copied.  Memory 3,092,124
9,870,000 rows copied.  Memory 2,133,660
9,880,000 rows copied.  Memory 1,183,388
9,890,000 rows copied.  Memory 3,673,756
9,900,000 rows copied.  Memory 1,601,044
9,910,000 rows copied.  Memory 3,722,772
9,920,000 rows copied.  Memory 1,642,052
9,930,000 rows copied.  Memory 3,763,780
9,940,000 rows copied.  Memory 1,691,204
9,950,000 rows copied.  Memory 3,812,932
9,960,000 rows copied.  Memory 1,740,356
9,970,000 rows copied.  Memory 3,862,084
9,980,000 rows copied.  Memory 1,789,508
9,990,000 rows copied.  Memory 3,903,044
10,000,000 rows copied.  Memory 1,830,468
Complete.  Hit any key to exit.

Something went horribly wrong in your design if you even try to process 700k rows in C#. That you fail at this is to be expected.

If this is data retrieval for display: there is no way the user will be able to process that amount of data, and filtering down from 700k rows in the GUI is just a waste of time and bandwidth. 25-100 fields at once is about the limit. Do filtering or pagination on the query side so you do not end up retrieving orders of magnitude more than you can actually process, as in the sketch below.
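For example, a sketch of server-side paging with OFFSET/FETCH, so the client only ever materializes one page at a time (the table name, ordering column, and page size here are hypothetical):

// Sketch: fetch a single page server-side instead of 700k rows at once.
// Requires: using System.Data; using System.Data.SqlClient;
static DataTable FetchPage(string connectionString, int pageIndex, int pageSize)
{
    using (var con = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(
        @"select * from XXXX
          order by Id
          offset @Offset rows fetch next @PageSize rows only", con))
    {
        cmd.Parameters.AddWithValue("@Offset", pageIndex * pageSize);
        cmd.Parameters.AddWithValue("@PageSize", pageSize);
        var page = new DataTable();
        con.Open();
        using (var reader = cmd.ExecuteReader())
            page.Load(reader); // only one page of rows is ever held in memory
        return page;
    }
}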

If this is some form of bulk insert or bulk modification: do that kind of operation in SQL Server, not in your code. Retrieving the data, processing it in C# and then posting it back just adds layers of overhead. If you add the two-way network transfer, you will easily triple the time this takes.
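For instance, assuming the source and destination tables are reachable from a single connection (table names are the question's placeholders), the whole copy can be one set-based statement and no rows ever cross the network:

// Sketch: keep the data on the server with a single set-based statement.
using (var con = new SqlConnection(constr))
using (var cmd = new SqlCommand(
    "insert into destinationTable select * from XXXX", con))
{
    con.Open();
    cmd.CommandTimeout = 600; // large copies can exceed the 30-second default
    cmd.ExecuteNonQuery();    // rows move entirely server-side
}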

NB: Per DavidBrowne's answer, it seems I'd misunderstood how the batching of the SqlBulkCopy class works. The refactored code may still be useful to you, so I've not deleted this answer (as the code is still valid), but the answer is not to set the BatchSize as I'd believed. Please see David's answer for an explanation.


Try something like this; the key is setting the BatchSize property to limit how many rows you deal with at once:

using (SqlConnection sourceConnection = new SqlConnection(constr))
{
    sourceConnection.Open();
    SqlCommand commandSourceData = new SqlCommand("select * from XXXX ", sourceConnection);
    using (var reader = commandSourceData.ExecuteReader()) { //add a using statement for your reader so you don't need to worry about close/dispose

        //keep the connection open or we'll be trying to read from a closed connection

        using (SqlBulkCopy bulkCopy = new SqlBulkCopy(constr2))
        {
            bulkCopy.BatchSize = 1000; //Write a few pages at a time rather than all at once; thus lowering memory impact.  See https://docs.microsoft.com/en-us/dotnet/api/system.data.sqlclient.sqlbulkcopy.batchsize?view=netframework-4.7.2
            bulkCopy.DestinationTableName = "destinationTable";

            try
            {
                // Write from the source to the destination.
                bulkCopy.WriteToServer(reader);
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
                throw; //we've caught the top level Exception rather than something specific; so once we've logged it, rethrow it for a proper handler to deal with up the call stack
            }
        }
    }

}

Note that because the SqlBulkCopy class takes an IDataReader as an argument, we don't need to download the full data set. Instead, the reader gives us a way to pull back records as required (hence we leave the connection open after creating the reader). When we call SqlBulkCopy's WriteToServer method, internally it has logic to loop multiple times, selecting BatchSize new records from the reader and pushing those to the destination table, repeating until the reader has sent all pending records. This works differently to, say, a DataTable, where we'd have to populate the data table with the full set of records, rather than being able to read more back as required.
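To make the contrast concrete, here's a sketch of the DataTable approach just described; this is the pattern that pulls every row into client memory at once, and is likely the kind of thing that caused the OOM (names reuse the question's placeholders):

// Anti-pattern sketch: materializing everything client-side before the copy.
var table = new DataTable();
using (var con = new SqlConnection(constr))
using (var cmd = new SqlCommand("select * from XXXX", con))
{
    con.Open();
    using (var reader = cmd.ExecuteReader())
        table.Load(reader); // every row is now held in RAM
}

using (var bulkCopy = new SqlBulkCopy(constr2))
{
    bulkCopy.DestinationTableName = "destinationTable";
    bulkCopy.WriteToServer(table); // WriteToServer also accepts a DataTable
}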

One potential risk of this approach is that, because we have to keep the connection open, any locks on our source are kept in place until we close our reader. Depending on the isolation level and whether other queries are trying to access the same records, this may cause blocking; whilst the DataTable approach would have taken a one-off copy of the data into memory and then closed the connection, avoiding any blocks. If this blocking is a concern, you should look at changing the isolation level of your query, or applying hints... Exactly how you approach that would depend on the requirements though.
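For example, one option (only appropriate if dirty reads are acceptable for your data) is to perform the streaming read under READ UNCOMMITTED; a sketch:

// Sketch: relax read isolation so the streaming read doesn't hold shared locks.
using (var con = new SqlConnection(constr))
{
    con.Open();
    using (var tx = con.BeginTransaction(IsolationLevel.ReadUncommitted))
    using (var cmd = new SqlCommand("select * from XXXX", con, tx))
    using (var reader = cmd.ExecuteReader())
    {
        // ... stream to SqlBulkCopy as before ...
    }
    // Alternatively, a table hint achieves the same for a single query:
    //   select * from XXXX with (nolock)
}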

NB: In reality, instead of running the above code as is, you'd want to refactor things a bit so the scope of each method is contained. That way you can reuse this logic to copy other queries to other tables. You'd also want to make the batch size configurable rather than hard-coded, so you can adjust it to a value that gives a good balance of resource usage vs performance (which will vary based on the host's resources).
You may also want to use async methods, to allow other parts of your program to progress whilst you're waiting on data to flow from/to your databases.

Here's a slightly amended version:

// Requires: using System.Data; using System.Data.SqlClient;
// using System.Runtime.ExceptionServices; using System.Threading.Tasks;
public async Task<SqlDataReader> ExecuteReaderAsync(string connectionString, string query)
{
    SqlConnection connection = null; //initialize to null so the catch block can safely inspect them
    SqlCommand command = null;
    try 
    {
        connection = new SqlConnection(connectionString); //not in a using as we want to keep the connection open until our reader's finished with it.
        connection.Open();
        command = new SqlCommand(query, connection);
        return await command.ExecuteReaderAsync(CommandBehavior.CloseConnection);  //tell our reader to close the connection when done.
    } 
    catch 
    {
        //if we have an issue before we've returned our reader, dispose of our objects here
        command?.Dispose();
        connection?.Dispose();
        //then rethrow the exception
        throw;
    }
}
public async Task CopySqlDataAsync(string sourceConnectionString, string sourceQuery, string destinationConnectionString, string destinationTableName, int batchSize)
{
    using (var reader = await ExecuteReaderAsync(sourceConnectionString, sourceQuery))
        await CopySqlDataAsync(reader, destinationConnectionString, destinationTableName, batchSize);
}
public async Task CopySqlDataAsync(IDataReader sourceReader, string destinationConnectionString, string destinationTableName, int batchSize)
{
    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(destinationConnectionString))
    {
        bulkCopy.BatchSize = batchSize; 
        bulkCopy.DestinationTableName = destinationTableName;
        await bulkCopy.WriteToServerAsync(sourceReader);
    }
}
public void CopySqlDataExample()
{
    try 
    {
        var constr = ""; //todo: define connection string; ideally pulling from config 
        var constr2 = ""; //todo: define connection string #2; ideally pulling from config 
        var batchSize = 1000; //todo: replace hardcoded batch size with value from config
        var task = CopySqlDataAsync(constr, "select * from XXXX", constr2, "destinationTable", batchSize); 
        task.Wait(); //wait for the task to complete; if it faulted, this throws an AggregateException
    } 
    catch (AggregateException es)
    {
        var e = es.InnerExceptions[0]; //get the wrapped exception 
        Console.WriteLine(e.Message);
        //throw; //to rethrow AggregateException 
        ExceptionDispatchInfo.Capture(e).Throw(); //to rethrow the wrapped exception
    }
}
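If your entry point can itself be async (C# 7.1+ supports an async Main), you can await the copy directly and catch the original exception without the AggregateException unwrapping; a sketch, assuming the methods above are made static or called on an instance:

static async Task Main()
{
    var constr = "";  //todo: connection strings from config, as above
    var constr2 = "";
    try
    {
        await CopySqlDataAsync(constr, "select * from XXXX", constr2, "destinationTable", 1000);
    }
    catch (Exception e)
    {
        Console.WriteLine(e.Message); //the original exception, not an AggregateException
        throw;
    }
}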
