
Select million+ records while huge insert is running

I am trying to extract application log records from a single table. The SELECT statement is pretty straightforward:

select top 200000 * 
from dbo.transactionlog 
where rowid>7 
and rowid <700000 and 
Project='AmWINS' 

The query time for the above SELECT is over 5 minutes. Is that considered long? While the SELECT is running, a bulk insert is also running.

[EDIT]

Actually, I am having a serious problem with my current Production logging database. Basically, we only have one table (transactionlog), and all application logs are inserted into it. For a project like AmWINS, based on a SELECT COUNT result, we have about 800K+ records inserted per day. Inserts run 24 hours a day in the Production environment. Users want to extract data from the table when they need to check the transaction logs, so we need to be able to select records out of the table on demand.

I tried to simulate this in the UAT environment by loading a volume matching Production, which has grown to 10 million records as of today. While extracting records, I ran a bulk insert at the same time to mimic the production environment. It took about 5 minutes just to extract 200k records.

During the extraction, I monitored the physical SQL Server machine and saw the CPU spike up to 95%.

The table has 13 fields, including an identity column (rowid) of type bigint. rowid is the PK. Indexes are created on Date, Project, module and RefNumber. The table was created with row locks and page locks enabled. I am using SQL Server 2005.

I hope you guys can give me some professional advice. Thanks.

It may be possible for you to use the NOLOCK table hint, as described here:

Table Hints MSDN

Your SQL would become something like this:

select top 200000 * from dbo.transactionlog with (nolock) ...

This would achieve better performance if you aren't concerned about the complete accuracy of the data returned, since NOLOCK allows dirty reads of uncommitted rows.
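If adding the hint to every table reference is inconvenient, the same read behavior can be requested for the whole session. The sketch below assumes the query from the question and simply wraps it in a READ UNCOMMITTED isolation level, which has the same dirty-read caveat as NOLOCK:

```sql
-- Session-level equivalent of adding WITH (NOLOCK) to every table in the query.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT TOP 200000 *
FROM dbo.transactionlog
WHERE rowid > 7
  AND rowid < 700000
  AND Project = 'AmWINS';

-- Restore the default isolation level for the rest of the session.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
```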

What are you doing with the 200,000 rows? Are you running this over a network? Depending on the width of your table, just getting that amount of data across the network could be the bulk of the time spent.

You could also export it to a local .dat or .sql file.

It depends on your hardware. Pulling 200,000 rows out while data is being inserted requires some serious IO, so unless you have a 30+ disk system, it will be slow.

Also, is your rowID column indexed? This will help with the select, but could slow down the bulk insert.

I'm not sure, but doesn't a bulk insert in MS SQL lock the whole table?

As ck already said, indexing is important, so make sure you have an appropriate index ready. I would set an index not only on rowid but also on Project. I would also rewrite the WHERE clause to:

WHERE Project = 'AmWINS' AND rowid BETWEEN 8 AND 699999

Reason: I guess Project is more restrictive than rowid and - correct me if I'm wrong - BETWEEN is faster than a < and > comparison.
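A minimal sketch of the indexing idea in this answer, using the table and column names from the question. The index name IX_transactionlog_Project_rowid is made up for illustration, and the question says Project is already indexed on its own, so this composite index is only one option to test. With SELECT *, the optimizer may still prefer a scan or key lookups.

```sql
-- Hypothetical index name; table and column names are taken from the question.
-- Project leads because it is an equality predicate; rowid follows for the range.
CREATE NONCLUSTERED INDEX IX_transactionlog_Project_rowid
    ON dbo.transactionlog (Project, rowid);

-- The rewritten filter from this answer.
SELECT TOP 200000 *
FROM dbo.transactionlog
WHERE Project = 'AmWINS'
  AND rowid BETWEEN 8 AND 699999;
```

Comparing the actual execution plans before and after creating the index would show whether it is used at all.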

No amount of indexing will help here because it's a SELECT * query, so it's most likely a PK scan or a horrendous bookmark lookup.

And the TOP is meaningless because there is no ORDER BY.
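To illustrate both points, here is a sketch of the same query with an explicit column list and an ORDER BY. It assumes that rowid order is what is wanted and that Date, module and RefNumber are column names (the question only mentions them as indexed fields); the real table has 13 columns, so the list would need adjusting.

```sql
-- Column names other than rowid and Project are assumed from the index list
-- in the question; select only the columns the caller actually needs.
SELECT TOP 200000
       rowid, [Date], Project, module, RefNumber
FROM dbo.transactionlog
WHERE Project = 'AmWINS'
  AND rowid > 7
  AND rowid < 700000
ORDER BY rowid;  -- makes the TOP 200000 deterministic
```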

The simultaneous insert is probably misleading as far as I can tell, unless the table only has 2 columns and the bulk insert is locking the whole table. With a simple int IDENTITY column, the insert and select may not interfere with each other either.

Especially if the bulk insert is only a few thousand rows (or even tens of thousands).

Edit: The TOP and rowid values do not imply a million plus.


 