
SQL Server, Converting NTEXT to NVARCHAR(MAX)

I have a database with a large number of fields that are currently NTEXT.

Having upgraded to SQL Server 2005, we have run some performance tests on converting these to NVARCHAR(MAX).

If you read this article:

http://geekswithblogs.net/johnsPerfBlog/archive/2008/04/16/ntext-vs-nvarcharmax-in-sql-2005.aspx

That article explains that a simple ALTER COLUMN does not re-organise the existing data into the data rows.

I have experienced this with my data: we actually get much worse performance in some areas if we just run the ALTER COLUMN. However, if I then run an UPDATE Table SET Column = Column for each of these columns, we get a very large performance increase.
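
For reference, the two-step pattern described above looks roughly like this; the table dbo.Documents and column Body are hypothetical placeholders for one of the real NTEXT columns:

    -- Step 1: change the type in place. Existing values stay in the old
    -- LOB storage until they are rewritten. Keep NULL / NOT NULL the same
    -- as the column's current definition.
    ALTER TABLE dbo.Documents
        ALTER COLUMN Body NVARCHAR(MAX) NULL;

    -- Step 2: rewrite every value so it is stored using the new
    -- NVARCHAR(MAX) format.
    UPDATE dbo.Documents
        SET Body = Body;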

The problem I have is that the database consists of hundreds of these columns with millions of records. In a simple test (on a low-performance virtual machine), a table with a single NTEXT column containing 7 million records took 5 hours to update.

Can anybody offer any suggestions as to how I can update the data in a more efficient way that minimises downtime and locks?

EDIT: My fallback solution is to update the data in blocks over time. However, with our data this results in worse performance until all the records have been updated, and the shorter that window is the better, so I'm still looking for a quicker way to do the update.

If you can't get scheduled downtime....

Create two new columns: the new NVARCHAR(MAX) column and a processedflag INT DEFAULT 0.

Create a nonclustered index on the processedflag

You have UPDATE TOP available to you (you want to update top ordered by the primary key).

Simply set processedflag to 1 during the update so that the next update only touches rows where processedflag is still 0.

You can use @@rowcount after the update to see if you can exit a loop.

I suggest a WAITFOR DELAY of a few seconds after each update query to give other queries a chance to acquire locks on the table and to avoid overloading the disks.
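
A minimal sketch of that loop, assuming a hypothetical dbo.Documents table with primary key Id, the original NTEXT column Body, a new NVARCHAR(MAX) column BodyNew, and the processedflag column described above:

    DECLARE @batchSize INT;
    SET @batchSize = 1000;

    WHILE 1 = 1
    BEGIN
        -- UPDATE TOP (n) on its own does not honour ORDER BY, so take the
        -- next batch ordered by the primary key through a CTE.
        ;WITH nextBatch AS
        (
            SELECT TOP (@batchSize) Id, Body, BodyNew, processedflag
            FROM dbo.Documents
            WHERE processedflag = 0
            ORDER BY Id
        )
        UPDATE nextBatch
        SET BodyNew = Body,
            processedflag = 1;

        -- Exit once there is nothing left to process.
        IF @@ROWCOUNT = 0
            BREAK;

        -- Give other sessions a chance to acquire locks and the disks a rest.
        WAITFOR DELAY '00:00:05';
    END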

How about running the update in batches, updating 1000 rows at a time?

You would use a while loop that increments a counter corresponding to the IDs of the rows to be updated in each iteration of the update query. This may not reduce the total time it takes to update all 7 million records, but it should make it much less likely that users hit an error due to record locking.
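
One way to sketch that, again using the hypothetical dbo.Documents table with an integer primary key Id, after the column has already been altered to NVARCHAR(MAX):

    DECLARE @batchSize INT;
    DECLARE @currentId INT;
    DECLARE @maxId INT;

    SET @batchSize = 1000;
    SET @currentId = 0;
    SELECT @maxId = MAX(Id) FROM dbo.Documents;

    WHILE @currentId < @maxId
    BEGIN
        -- Rewrite the values for the next range of IDs.
        UPDATE dbo.Documents
        SET Body = Body
        WHERE Id > @currentId
          AND Id <= @currentId + @batchSize;

        SET @currentId = @currentId + @batchSize;
    END

Note that this assumes the Id values are reasonably dense; if they are sparse, many iterations will touch very few rows.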

If you can get scheduled downtime:

  1. Back up the database
  2. Change recovery model to simple
  3. Remove all indexes from the table you are updating
  4. Add a maintenanceflag column (INT DEFAULT 0) with a nonclustered index on it
  5. Run: UPDATE TOP (1000) tablename SET newnvarcharcolumn = oldntextcolumn, maintenanceflag = 1 WHERE maintenanceflag = 0

Run this multiple times as required (within a loop with a delay), as sketched below.

Once complete, take another backup, then change the recovery model back to what it was originally and re-create the old indexes.

Remember that every index or trigger on that table causes extra disk I/O, and that the simple recovery model minimises log file I/O.
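
Pulled together, the downtime steps above might look roughly like this; the database name MyDb, the index name, and the column names Body/BodyNew are placeholders, the new NVARCHAR(MAX) column is assumed to have been added already, and the initial and final backups are omitted:

    -- Step 2: minimise log file I/O during the mass update.
    ALTER DATABASE MyDb SET RECOVERY SIMPLE;

    -- Step 3: drop the existing indexes (repeat per index).
    DROP INDEX IX_Documents_SomeIndex ON dbo.Documents;

    -- Step 4: add the flag column (NOT NULL so the default is written to
    -- every existing row) and index it.
    ALTER TABLE dbo.Documents
        ADD maintenanceflag INT NOT NULL DEFAULT 0;
    CREATE NONCLUSTERED INDEX IX_Documents_maintenanceflag
        ON dbo.Documents (maintenanceflag);

    -- Step 5: copy NTEXT into the new NVARCHAR(MAX) column in batches.
    WHILE 1 = 1
    BEGIN
        UPDATE TOP (1000) dbo.Documents
        SET BodyNew = Body,
            maintenanceflag = 1
        WHERE maintenanceflag = 0;

        IF @@ROWCOUNT = 0
            BREAK;

        WAITFOR DELAY '00:00:02';
    END

    -- Afterwards: switch back to the original recovery model and re-create
    -- the indexes.
    ALTER DATABASE MyDb SET RECOVERY FULL;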

Running database tests on a low-performance virtual machine is not really indicative of production performance; the large amount of I/O involved needs a fast disk array, and virtualisation will throttle it.

You might also consider testing to see if an SSIS package might do this more efficiently.

Whatever you do, make it an automated process that can be scheduled and run during off hours. The fewer users trying to access the data, the faster everything will go. If at all possible, pick out the three or four most critical columns to change, take the database down for maintenance (during a normally quiet time), and do them in single-user mode. Once you have the most critical ones done, the others can be scheduled one or two a night.
