ASP.NET MVC4 - ADO.NET - Saving a large number of files from ZipArchives to SQL Server

I have a set of incoming zip files that can be up to 2 GB in total and will contain thousands of files (jpg, pdf, txt, doc, etc.).

Every file will be saved as a separate row in a SQL Server 2014 database table, using a stored procedure that takes a table-valued parameter and is called via ADO.NET. The table uses varchar for the file name and varbinary(max) for the file itself.
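For reference, the server-side objects look roughly like this (dbo.SaveFiles and the parameter names come from the ADO.NET call below; the FileTableType name, column sizes, and target Files table are simplified placeholders, not my real schema):

    -- hypothetical sketch of the table type and stored procedure
    CREATE TYPE dbo.FileTableType AS TABLE
    (
        FileName varchar(260)   NOT NULL,
        [File]   varbinary(max) NOT NULL
    );
    GO

    CREATE PROCEDURE dbo.SaveFiles
        @UserId int,
        @Files  dbo.FileTableType READONLY
    AS
    BEGIN
        -- one row per file
        INSERT INTO dbo.Files (UserId, FileName, [File])
        SELECT @UserId, FileName, [File]
        FROM @Files;
    END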

Previously, the incoming zip file was extracted in memory, the contents were stored in a Dictionary<T>, and the whole set was saved with just one call to the DB. But this caused memory issues, since the extracted collection can go over 2 GB, so the dictionary object was getting larger than the maximum size of a CLR object (2 GB). I'm aware that this limit can be overridden in .NET 4.5.1, but I don't want to take that option at the moment.
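(The override in question is the gcAllowVeryLargeObjects runtime setting, which lifts the 2 GB limit for arrays in 64-bit processes; shown here only as a sketch of the option I'm choosing not to use:)

    <configuration>
      <runtime>
        <!-- allows arrays larger than 2 GB on 64-bit platforms; off by default -->
        <gcAllowVeryLargeObjects enabled="true" />
      </runtime>
    </configuration>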

To fix this out-of-memory issue, I'm now passing the files directly into my data access class and doing something like the code below. Basically, I create smaller batches of up to 500 MB and commit each one to SQL Server, so the managed object (the DataTable) on the large object heap cannot exceed 500 MB. The files that don't belong to the current batch are still kept in unmanaged memory.

But I think the data is getting disposed even before the transaction completes, so it fails silently without throwing any exception. However, it works fine when I significantly reduce the size of the batch (to 2 MB or so).

How can I work around this issue? I would ideally want the batch size to be 500 MB, as an individual file can be up to 250 MB.

using System;
using System.Data;
using System.Data.SqlClient;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Web;

public void SaveFiles(int userId, HttpFileCollectionBase files)
{
    try
    {
        const long maxBatchSize = 524288000; // 500 MB
        var myCollection = namesOfValidFilesBasedOnBusinessLogic;

        var dataTable = new DataTable("@Files");
        dataTable.Columns.Add("FileName", typeof(string));
        dataTable.Columns.Add("File", typeof(byte[]));

        for (var i = 0; i < files.Count; i++)
        {
            using (var zipFile = new ZipArchive(files[i].InputStream))
            {
                var validEntries = zipFile.Entries.Where(e => myCollection.Contains(e.Name));
                long currentBatchSize = 0;

                foreach (var entry in validEntries)
                {
                    if (currentBatchSize < maxBatchSize)
                    {
                        // under the cap: buffer this entry into the current batch
                        currentBatchSize = currentBatchSize + entry.Length;
                        using (var stream = entry.Open())
                        {
                            using (var ms = new MemoryStream())
                            {
                                stream.CopyTo(ms);
                                dataTable.Rows.Add(entry.Name, ms.ToArray());
                            }
                        }
                    }
                    else
                    {
                        // cap reached: commit the current batch through the TVP stored procedure
                        using (var conn = new SqlConnection(connectionString))
                        {
                            conn.Open();
                            using (var cmd = new SqlCommand("dbo.SaveFiles", conn))
                            {
                                cmd.CommandType = CommandType.StoredProcedure;
                                cmd.Parameters.AddWithValue("@UserId", userId);
                                cmd.Parameters.AddWithValue("@Files", dataTable);
                                cmd.CommandTimeout = 0;
                                cmd.ExecuteNonQuery(); // control just disappears after this line
                            }
                            // start a fresh batch
                            dataTable = new DataTable("@Files");
                            dataTable.Columns.Add("FileName", typeof(string));
                            dataTable.Columns.Add("File", typeof(byte[]));
                        }
                    }
                }
            }
        }
    }
    catch (Exception)
    {
        throw; // not getting any exception
    }
}

//control just disappears after this line

I am going to assume that you mean that the next line never executes.

When sending a large amount of data to SQL Server to be saved, this is most likely what you are observing: it appears that nothing is happening, when in fact the data has to be sent to the server and then processed, and 500 MB can take a while.

If you change the timeout on the command to something like 200 seconds, I am willing to bet you will receive a SqlException after 200 seconds due to a timeout. Because you have it set to 0, it will wait indefinitely.

cmd.CommandTimeout = 200;
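With a finite timeout the failure becomes visible instead of silent; a minimal sketch (assuming the same cmd as in the question; -2 is the error number System.Data.SqlClient reports for a timeout):

    try
    {
        cmd.CommandTimeout = 200; // fail after 200 seconds instead of waiting forever
        cmd.ExecuteNonQuery();
    }
    catch (SqlException ex)
    {
        if (ex.Number == -2) // -2 = "Timeout expired"
        {
            // the batch took longer than CommandTimeout:
            // shrink the batch or raise the timeout
        }
        throw;
    }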

If this is not desirable, then you need to find a good balance between time and batch size based on how long each XX MB takes. The only way to measure that is by testing with various batch sizes, as it depends on your environment (network capacity, SQL Server load, client load, among other things).
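A minimal way to collect those measurements (a sketch, reusing the ExecuteNonQuery call and the dataTable / currentBatchSize variables from the question) is to time each batch commit:

    var sw = System.Diagnostics.Stopwatch.StartNew();
    cmd.ExecuteNonQuery(); // commit one batch
    sw.Stop();
    // log throughput so you can pick the largest batch that reliably beats the timeout
    System.Diagnostics.Debug.WriteLine(string.Format(
        "Saved {0} rows ({1:F0} MB) in {2:F1}s",
        dataTable.Rows.Count, currentBatchSize / 1048576.0, sw.Elapsed.TotalSeconds));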
