
ASP.NET MVC4 - ADO.NET - Saving a large number of files from ZipArchives to SQL Server

I have a set of incoming zip files that can total up to 2 GB and will contain thousands of files (jpg, pdf, txt, doc, etc.).

Every file will be saved as a separate row in a SQL Server 2014 database table by a stored procedure that takes a table-valued parameter and is called via ADO.NET. The table uses varchar for the file name and varbinary(max) for the file itself.
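For context, a table-valued parameter call of this shape typically looks like the sketch below. The table type name `dbo.FileTableType`, the connection string, and the sample row are assumptions for illustration; declaring the parameter explicitly as `SqlDbType.Structured` with a `TypeName` avoids relying on type inference:

```csharp
using System.Data;
using System.Data.SqlClient;

// Minimal sketch of passing a DataTable as a TVP to the stored procedure.
// dbo.FileTableType and the sample values are hypothetical.
var dataTable = new DataTable();
dataTable.Columns.Add("FileName", typeof(string));
dataTable.Columns.Add("File", typeof(byte[]));
dataTable.Rows.Add("report.pdf", new byte[] { 0x25, 0x50, 0x44, 0x46 });

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("dbo.SaveFiles", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.AddWithValue("@UserId", userId);

    // Explicitly typed TVP: the TypeName must match the user-defined
    // table type the stored procedure declares for @Files.
    var tvp = cmd.Parameters.Add("@Files", SqlDbType.Structured);
    tvp.TypeName = "dbo.FileTableType";
    tvp.Value = dataTable;

    conn.Open();
    cmd.ExecuteNonQuery();
}
```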

Previously, the incoming zip file was extracted in memory, the contents were stored in a Dictionary&lt;T&gt;, and the whole set was saved with a single call to the database. This caused memory issues, since the extracted collection can exceed 2 GB, so the dictionary was growing past the maximum size of a CLR object (2 GB). I'm aware that this limit can be overridden in .NET 4.5.1, but I don't want to take that option at the moment.

To fix this out-of-memory issue, I'm passing the files directly into my data access class and doing something like the code below: building smaller batches of up to 500 MB and committing each one to SQL Server, so the managed object (the DataTable) on the large object heap never exceeds 500 MB. The files that don't belong to the current batch are still kept in unmanaged memory.

But I think the data is getting disposed even before the transaction completes, so it fails silently without throwing any exception. However, it works fine when I significantly reduce the batch size (to 2 MB or so).

How can I work around this issue? I would ideally like the batch size to stay at 500 MB, since an individual file can be up to 250 MB.

using System.Data;
using System.Data.SqlClient;
using System.IO;
using System.IO.Compression;
using System.Linq;

public void SaveFiles(int userId, HttpFileCollectionBase files)
{
    try
    {
        const long maxBatchSize = 524288000; //500MB
        var myCollection = namesOfValidFilesBasedOnBusinessLogic;

        var dataTable = new DataTable("@Files");
        dataTable.Columns.Add("FileName", typeof(string));
        dataTable.Columns.Add("File", typeof(byte[]));

        for (var i = 0; i < files.Count; i++)
        {
            using (var zipFile = new ZipArchive(files[i].InputStream))
            {
                var validEntries = zipFile.Entries.Where(e => myCollection.Contains(e.Name));
                long currentBatchSize = 0;

                foreach (var entry in validEntries)
                {
                    if (currentBatchSize < maxBatchSize)
                    {
                        currentBatchSize = currentBatchSize + entry.Length;
                        using (var stream = entry.Open())
                        {
                            using (var ms = new MemoryStream())
                            {
                                stream.CopyTo(ms);
                                dataTable.Rows.Add(entry.Name, ms.ToArray());
                            }
                        }
                    }
                    else
                    {
                        using (var conn = new SqlConnection(connectionString))
                        {
                            conn.Open();
                            using (var cmd = new SqlCommand("dbo.SaveFiles", conn))
                            {
                                cmd.CommandType = CommandType.StoredProcedure;
                                cmd.Parameters.AddWithValue("@UserId", userId);
                                cmd.Parameters.AddWithValue("@Files", dataTable);
                                cmd.CommandTimeout = 0;
                                cmd.ExecuteNonQuery(); //control just disappears after this line
                            }
                            dataTable = new DataTable("@Files");
                            dataTable.Columns.Add("FileName", typeof(string));
                            dataTable.Columns.Add("File", typeof(byte[]));
                        }
                    }
                }
            }
        }
    }
    catch (Exception ex)
    {
        throw ex; //Not getting any exception
    }
}

//control just disappears after this line

I am going to assume that you mean that the next line never executes.

When sending a large amount of data to SQL Server, this is most likely what you are observing: it appears that nothing is happening, when in fact the data has to be transferred to the server and then processed, and 500 MB can take a while.

If you change the timeout on the command to something like 200 seconds, I am willing to bet you will receive a SqlException after 200 seconds due to a timeout. Because you have it set to 0, it waits indefinitely.

cmd.CommandTimeout = 200;
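To see the failure instead of an indefinite wait, the execute call can be wrapped so the timeout surfaces as an exception. This is a sketch reusing the command from the question; the error number -2 is the client-side timeout code reported by SqlException:

```csharp
cmd.CommandTimeout = 200; // fail after 200 seconds instead of waiting forever
try
{
    cmd.ExecuteNonQuery();
}
catch (SqlException ex) when (ex.Number == -2) // -2: command timeout
{
    // The batch took too long: log it, retry with a smaller batch, or rethrow.
    throw;
}
```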

If this is not desirable, then you need to figure out a good balance between time and batch size, based on how long it takes per XX MB. The only way to measure that is by testing with various batch sizes, since it depends on your environment (network capacity, SQL Server load, and client load, among other things).
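One way to run that measurement is to time each batch commit and compute the throughput, for example with a Stopwatch (a rough sketch, not part of the original answer; `cmd` and `currentBatchSize` are carried over from the question's code):

```csharp
// Time one batch commit and report throughput, so different
// batch sizes can be compared against the command timeout.
var sw = System.Diagnostics.Stopwatch.StartNew();
cmd.ExecuteNonQuery();
sw.Stop();

double megabytes = currentBatchSize / (1024.0 * 1024.0);
double mbPerSecond = megabytes / sw.Elapsed.TotalSeconds;
Console.WriteLine($"Committed {megabytes:F1} MB in {sw.Elapsed.TotalSeconds:F1} s ({mbPerSecond:F1} MB/s)");
// Repeat with several batch sizes and pick one that finishes
// comfortably inside the timeout in your environment.
```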
