What is the Fastest Way to Select a Whole Table in SQL Server?
I am writing an app that reads a whole table, does some processing, then writes the resulting data to another table. I am using the SqlBulkCopy class (the .NET version of "bcp in"), which does the insert very fast. But I cannot find any efficient way to select the data in the first place. There is no .NET equivalent of "bcp out", which seems strange to me.
Currently I'm using select * from table_name. For perspective, it takes 2.5 seconds to select 6,000 rows ... and only 600 ms to bulk insert the same number of rows.
I would expect selecting data to always be faster than inserting it. What is the fastest way to select all rows and columns from a table?
Answers to questions:
Here is my code:
// Requires: using System.Data; using System.Data.SqlClient;
DataTable staging = new DataTable();
using (SqlConnection dwConn = (SqlConnection)SqlConnectionManager.Instance.GetDefaultConnection())
{
    dwConn.Open();
    using (SqlCommand cmd = dwConn.CreateCommand())
    {
        cmd.CommandText = "select * from staging_table";
        // Load materializes every row into the DataTable before returning.
        using (SqlDataReader reader = cmd.ExecuteReader())
        {
            staging.Load(reader);
        }
    }
}
select * from table_name is the simplest, easiest and fastest way to read a whole table.
Let me explain why your results lead you to the wrong conclusion.
It all depends on your hardware, but it is likely that your network is the bottleneck here.
Apart from limiting your query to read only the columns you actually use, a select is as fast as it will get. Caching is also involved: if you execute the query twice in a row, the second run should be much faster because the data is already cached in memory. Execute dbcc dropcleanbuffers to check the effect of caching.
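To see how much of the 2.5 seconds is disk I/O rather than network and client work, one option is to time a cold read against a warm read from the application itself. The sketch below assumes a test server where you are a member of the sysadmin role (DBCC DROPCLEANBUFFERS requires it and empties the buffer pool for the whole instance, so do not run it on production). The connection string and database name are hypothetical; staging_table comes from the question. CHECKPOINT runs first so that dirty pages are written out and can then be dropped.

```csharp
using System;
using System.Diagnostics;
using System.Data.SqlClient;

class CacheTiming
{
    // Hypothetical connection string; point it at a test server, not production.
    const string ConnString = "Server=.;Database=MyDw;Integrated Security=true";

    // Reads every row of staging_table through a data reader and returns the elapsed time.
    static TimeSpan TimeFullRead(SqlConnection conn)
    {
        var sw = Stopwatch.StartNew();
        using (var cmd = new SqlCommand("select * from staging_table", conn))
        using (var reader = cmd.ExecuteReader())
        {
            while (reader.Read()) { } // pull every row to the client, then discard it
        }
        return sw.Elapsed;
    }

    static void Main()
    {
        using (var conn = new SqlConnection(ConnString))
        {
            conn.Open();

            // Write dirty pages to disk, then drop clean pages from the buffer pool.
            using (var flush = new SqlCommand("CHECKPOINT; DBCC DROPCLEANBUFFERS;", conn))
            {
                flush.ExecuteNonQuery();
            }

            TimeSpan cold = TimeFullRead(conn); // pages have to be read from disk
            TimeSpan warm = TimeFullRead(conn); // pages should now be cached in memory
            Console.WriteLine($"cold: {cold.TotalMilliseconds} ms, warm: {warm.TotalMilliseconds} ms");
        }
    }
}
```

If the warm run is still slow, the remaining time is more likely network transfer or client-side processing than SQL Server reading the table.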
If you want it to be as fast as possible, try to implement the processing in T-SQL; that way it operates directly on the data, right there on the server.
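As a minimal sketch of that idea, assuming the processing can be expressed as a set-based expression: the application sends a single INSERT ... SELECT, so no rows travel to the client and back. The target columns (id, total) and the calculation quantity * unit_price are purely hypothetical placeholders for whatever your processing does, as is the connection string.

```csharp
using System;
using System.Data.SqlClient;

class ServerSideTransform
{
    // Hypothetical connection string.
    const string ConnString = "Server=.;Database=MyDw;Integrated Security=true";

    static void Main()
    {
        // Hypothetical columns and calculation: the "processing" is written in T-SQL,
        // so rows are read and written inside the server and never cross the network.
        const string sql =
            "insert into target_table (id, total) " +
            "select id, quantity * unit_price from staging_table";

        using (var conn = new SqlConnection(ConnString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            // Returns the number of rows inserted (unless SET NOCOUNT ON is in effect).
            int rows = cmd.ExecuteNonQuery();
            Console.WriteLine($"{rows} rows written");
        }
    }
}
```

The same statement could just as well live in a stored procedure; the point is only that the read and the write both happen on the server.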
Another good tip for speed tuning is to put the table being read on one disk (look at filegroups) and the table being written to on another disk. That way one disk can do a continuous read and the other a continuous write. If both operations happen on the same disk, the disk heads keep moving back and forth, which seriously degrades performance.
If the logic you're writing cannot be done in T-SQL, you could also have a look at SQL CLR.
Another tip: when you do select * from table, use a data reader and consume the rows as you read them, if possible, instead of loading them into a DataTable. That way you don't materialize the whole thing in memory first.
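Since the question already uses SqlBulkCopy, one way to apply this tip is to hand the SqlDataReader straight to SqlBulkCopy.WriteToServer(IDataReader), so rows stream from the source query into the bulk insert without ever sitting in a DataTable. This is a sketch under assumptions: the connection string and target_table are hypothetical, and two connections are opened because, without MARS, one connection cannot serve an open reader and a bulk copy at the same time.

```csharp
using System.Data.SqlClient;

class StreamingCopy
{
    // Hypothetical connection string; source and target are in the same database here.
    const string ConnString = "Server=.;Database=MyDw;Integrated Security=true";

    static void Main()
    {
        // Separate connections for reading and for the bulk insert.
        using (var source = new SqlConnection(ConnString))
        using (var target = new SqlConnection(ConnString))
        {
            source.Open();
            target.Open();

            using (var cmd = new SqlCommand("select * from staging_table", source))
            using (var reader = cmd.ExecuteReader())
            using (var bulk = new SqlBulkCopy(target))
            {
                bulk.DestinationTableName = "target_table"; // hypothetical table name
                bulk.EnableStreaming = true;  // stream from the IDataReader instead of buffering
                bulk.WriteToServer(reader);   // rows flow reader -> server, no DataTable
            }
        }
    }
}
```

By default SqlBulkCopy maps columns by ordinal position, so the target table must have a compatible column order, or you can add entries to ColumnMappings. If each row needs processing on the way through, that step would have to sit between the reader and the bulk copy, for example in your own IDataReader wrapper, which is beyond this sketch.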
GJ
It is generally a good idea to include the column names in the select list, but with today's RDBMSs it won't make much difference. You will only see a difference if you limit the columns selected. Generally speaking, it is good practice to include column names. But to answer your question: it seems a select is indeed slower than an insert in the scenario you describe, and yes, select * from table_name is indeed the fastest way to read all rows and columns from a table.
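Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.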