
Select million+ records while huge insert is running

I am trying to extract application log records from a single table. The select statement is pretty straightforward:

select top 200000 *
from dbo.transactionlog
where rowid > 7
  and rowid < 700000
  and Project = 'AmWINS'

The above select takes over 5 minutes. Is that considered long? While the select is running, a bulk insert is also running.

[EDIT]

Actually, I am having a serious problem with my current production logging database. Basically, we only have one table (transactionlog); all application logs are inserted into this table. For a project like AmWINS, based on select count results, we have about 800K+ records inserted per day. Insertion runs 24 hours a day in the production environment. Users want to extract data from the table when they need to check the transaction logs, so we must be able to select records out of the table when necessary.

I tried to simulate this in a UAT environment, pumping in volume to match production, which has grown to 10 million records as of today. While extracting records, I ran a simultaneous bulk insert to mimic the production environment. It took about 5 minutes just to extract 200k records.

While the extraction was running, I monitored the SQL physical server; CPU spiked up to 95%.

The table has 13 fields and an identity column (rowid) of type bigint, which is the PK. Indexes are created on Date, Project, module and RefNumber. The table is created with row locks and page locks enabled. I am using SQL Server 2005.

Hope you guys can give me some professional advice to enlighten me. Thanks.

It may be possible for you to use the "NOLOCK" table hint, as described here:

Table Hints MSDN

Your SQL would become something like this:

select top 200000 * from dbo.transactionlog with (nolock) ...

This would achieve better performance if you aren't concerned about the complete accuracy of the data returned.
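Applied to the query from the question, the hinted version might look like this (a sketch only; NOLOCK permits dirty reads, so rows being bulk-inserted may appear or be missed):

```sql
-- Read without taking shared locks; data may include uncommitted rows.
SELECT TOP 200000 *
FROM dbo.transactionlog WITH (NOLOCK)
WHERE rowid > 7
  AND rowid < 700000
  AND Project = 'AmWINS';
```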

What are you doing with the 200,000 rows? Are you running this over a network? Depending on the width of your table, just getting that amount of data across the network could be the bulk of the time spent.

You could also export it to a local dat or sql file.

It depends on your hardware. Pulling 200,000 rows out while data is being inserted requires some serious IO, so unless you have a 30+ disk system, it will be slow.

Also, is your rowID column indexed? This will help with the select, but could slow down the bulk insert.

I'm not sure, but wouldn't a bulk insert into MS SQL lock the whole table?

As ck already said, indexing is important, so make sure you have an appropriate index in place. I would set an index not only on rowId but also on Project. I would also rewrite the where clause to:

WHERE Project = 'AmWINS' AND rowid BETWEEN 8 AND 699999

Reason: I guess Project is more restrictive than rowid and, correct me if I'm wrong, BETWEEN is faster than a < and > comparison.
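A composite index matching both predicates could be sketched like this (the index name is illustrative; this assumes the columns exist as described in the question):

```sql
-- Hypothetical index: Project first (equality predicate), rowid second (range predicate).
CREATE NONCLUSTERED INDEX IX_transactionlog_Project_rowid
    ON dbo.transactionlog (Project, rowid);
```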

No amount of indexing will help here, because it's a SELECT * query, so it's most likely a PK scan or a horrendous bookmark lookup.

And the TOP is meaningless because there is no ORDER BY.
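One way around the SELECT * problem is to name only the columns you actually need, so a narrower (covering) index could serve the query without a bookmark lookup. A sketch, assuming the column names mentioned in the question:

```sql
-- Sketch only: the column list is assumed from the question's description of the table.
SELECT rowid, [Date], Project, module, RefNumber
FROM dbo.transactionlog
WHERE Project = 'AmWINS'
  AND rowid BETWEEN 8 AND 699999;
```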

The simultaneous insert is probably misleading, as far as I can tell, unless the table only has 2 columns and the bulk insert is locking the whole table. With a simple bigint IDENTITY column, the insert and select may not interfere with each other either.

Especially if the bulk insert is only a few thousand rows (or even tens of thousands).

Edit: the TOP and rowid values do not imply a million-plus rows.
