Efficiently iterating and updating large amounts of data from a database

I have a table in SQL Server that stores files in binary format. Each row averages about 3 MB, and there are tens of thousands of rows. Since I must keep these tables around, I'd like to query each row, compress the binary data, and then write it back by updating the row.

My current naive implementation does something similar to this (using Dapper):

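// Loads every row (ID and Content) into memory at once.
var files = await con.QueryAsync<MyClass>("SELECT ID, Content FROM Files");

foreach (var file in files)
{
    // ... compress file.Content here, producing newContent
    await con.ExecuteAsync(
        "UPDATE Files SET Content = @NewContent WHERE ID = @ID",
        new { NewContent = newContent, ID = file.ID });
}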

Obviously this is very inefficient, because it first loads all files into memory. I was hoping I could somehow do the query/update in batches, and ideally I'd like to run each batch asynchronously, if that's even possible.

Any suggestions would be appreciated (using SQL Server, BTW).

The entire operation can be done on the database instance, without moving data over the network to the application and back, using the built-in function COMPRESS:

This function compresses the input expression, using the GZIP algorithm. The function returns a byte array of type varbinary(max).

-- @FromID and @ToID are placeholder bounds for the current batch of IDs.
UPDATE Files
SET Content = COMPRESS(Content)
WHERE ID BETWEEN @FromID AND @ToID; -- for example 1k rows per batch
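
One way to drive the batched update is a simple loop that walks the table in ID order. The following is only a sketch under some assumptions that are not stated in the question: that Files.ID is a positive integer key, and that @BatchSize, @LastID and @MaxID are illustrative variable names. The batch size of 1,000 follows the suggestion above; with rows of about 3 MB each, a smaller value may be gentler on the transaction log.

```sql
-- Sketch only: assumes Files.ID is a positive integer key.
DECLARE @BatchSize int = 1000;  -- tuning knob, not a recommendation
DECLARE @LastID    int = 0;     -- highest ID processed so far
DECLARE @MaxID     int;         -- upper bound of the current batch

WHILE 1 = 1
BEGIN
    -- Find the highest ID among the next @BatchSize rows.
    SELECT @MaxID = MAX(ID)
    FROM (SELECT TOP (@BatchSize) ID
          FROM Files
          WHERE ID > @LastID
          ORDER BY ID) AS batch;

    -- MAX over an empty set yields NULL, which means no rows are left.
    IF @MaxID IS NULL BREAK;

    -- Each UPDATE runs as its own autocommit transaction.
    UPDATE Files
    SET Content = COMPRESS(Content)
    WHERE ID > @LastID AND ID <= @MaxID;

    SET @LastID = @MaxID;
END;
```

Because the loop only moves forward through the IDs, each row is compressed once per run; running the script a second time would compress already-compressed data. After the update, anything that reads the column needs to call DECOMPRESS(Content) to get the original bytes back.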

If you are using a SQL Server version lower than 2016, or you need a "custom" compression algorithm, you could use a user-defined CLR function.
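
As a rough illustration of the CLR route, registering such a function might look like the following. Everything named here is hypothetical: the assembly CompressionFunctions, its file path, the class UserDefinedFunctions and the method CompressCustom stand in for whatever .NET assembly you actually build. On SQL Server 2017 and later, the 'clr strict security' option is on by default, so the assembly may additionally need to be signed or marked as trusted.

```sql
-- Enable CLR integration on the instance (requires appropriate permissions).
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;
GO

-- Hypothetical assembly name and path; replace with your own build output.
CREATE ASSEMBLY CompressionFunctions
FROM 'C:\path\to\CompressionFunctions.dll'
WITH PERMISSION_SET = SAFE;
GO

-- Expose a (hypothetical) static method as a scalar T-SQL function.
CREATE FUNCTION dbo.CompressCustom (@data varbinary(max))
RETURNS varbinary(max)
AS EXTERNAL NAME CompressionFunctions.[UserDefinedFunctions].CompressCustom;
GO
```

Once registered, dbo.CompressCustom(Content) could take the place of COMPRESS(Content) in the batched UPDATE. GO is a batch separator understood by tools such as SSMS and sqlcmd, not a T-SQL statement; it is needed here because CREATE FUNCTION must be the first statement in its batch.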
