"Streaming" read of over 10 million rows from a table in SQL Server
What is the best strategy for reading millions of records from a table (in SQL Server 2012, BI instance) in a streaming fashion, the way SQL Server Management Studio does?
I need to cache these records locally (in a C# console application) for further processing.
Update - Sample code that works with SqlDataReader
using System;
using System.Data;
using System.Data.SqlClient;
using System.Threading;
using System.Threading.Tasks;

namespace ReadMillionsOfRows
{
    class Program
    {
        // Blocks Main until the async read has finished.
        static ManualResetEvent done = new ManualResetEvent(false);

        static void Main(string[] args)
        {
            Process();
            done.WaitOne();
        }

        public static async Task Process()
        {
            string connString = @"Server=;Database=;User Id=;Password=;Asynchronous Processing=True";
            string sql = "Select * from tab_abc";

            using (SqlConnection conn = new SqlConnection(connString))
            {
                await conn.OpenAsync();
                using (SqlCommand comm = new SqlCommand(sql, conn))
                {
                    comm.CommandType = CommandType.Text;
                    // The reader streams rows forward-only; only the current row is held in memory.
                    using (SqlDataReader reader = await comm.ExecuteReaderAsync())
                    {
                        while (await reader.ReadAsync())
                        {
                            //process it here
                        }
                    }
                }
            }
            done.Set();
        }
    }
}
Use a SqlDataReader; it is forward-only and fast. It only holds a reference to a record while that record is in the scope of being read.
That depends on what your cache looks like. If you're going to store everything in memory and a DataSet is appropriate as a cache, just read everything into the DataSet. If not, use the SqlDataReader as suggested above, reading the records one by one and storing them in your big cache.
Do note, however, that there's already a very popular caching mechanism for large database tables: your database. With the proper index configuration, the database can probably outperform your cache.
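As a minimal sketch of the DataSet-as-cache idea: the table and rows below are hypothetical stand-ins for data that would normally be loaded from SQL Server (for example with SqlDataAdapter.Fill); setting a primary key gives the DataTable an internal index for fast keyed lookups.

```csharp
using System;
using System.Data;

class DataTableCacheDemo
{
    // Builds a small DataTable standing in for rows read from SQL Server.
    public static DataTable BuildCache()
    {
        var table = new DataTable("tab_abc");
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));
        // A primary key lets DataTable maintain an index for Rows.Find.
        table.PrimaryKey = new[] { table.Columns["Id"] };

        table.Rows.Add(1, "alpha");
        table.Rows.Add(2, "beta");
        return table;
    }

    static void Main()
    {
        DataTable cache = BuildCache();
        DataRow row = cache.Rows.Find(2); // indexed lookup by primary key
        Console.WriteLine(row["Name"]);   // beta
    }
}
```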
You can use Entity Framework and paginate the select using Take and Skip to fetch the rows in buffers. If you need in-memory caching for such a large dataset, I would suggest using GC.GetTotalMemory to test whether there is any free memory left.
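The Skip/Take paging loop can be sketched as follows. The in-memory Source sequence and the 2 GB threshold are illustrative assumptions; with Entity Framework the page query would instead run against an ordered DbSet (e.g. context.Rows.OrderBy(r => r.Id).Skip(skip).Take(PageSize).ToList()).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class PagedFetchDemo
{
    const int PageSize = 3;

    // Hypothetical stand-in for an ordered EF query over the table.
    static readonly IEnumerable<int> Source = Enumerable.Range(1, 10);

    public static List<List<int>> FetchAllPages()
    {
        var pages = new List<List<int>>();
        int skip = 0;
        while (true)
        {
            // Fetch the next buffer of rows; empty page means we are done.
            var page = Source.Skip(skip).Take(PageSize).ToList();
            if (page.Count == 0)
                break;
            pages.Add(page);
            skip += PageSize;

            // Safety valve: stop caching if the managed heap grows too large
            // (2 GB is an arbitrary example threshold).
            if (GC.GetTotalMemory(false) > 2L * 1024 * 1024 * 1024)
                break;
        }
        return pages;
    }

    static void Main()
    {
        var pages = FetchAllPages();
        Console.WriteLine(pages.Count);    // 4 buffers: 3 + 3 + 3 + 1 rows
        Console.WriteLine(pages[3][0]);    // 10
    }
}
```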