
How to speed up LINQ inserts with SQL CE?

History

I have a list of "records" (3,500) which I save to XML and compress on exit of the program. Since:

  • the number of the records increases
  • only around 50 records need to be updated on exit
  • saving takes about 3 seconds

I needed another solution -- an embedded database. I chose SQL CE because it works with VS without any problems and the license is OK for me (I compared it to Firebird, SQLite, EffiProz, db4o and BerkeleyDB).

The data

The record structure: 11 fields, 2 of which make up the primary key (nvarchar + byte). The other fields are bytes, datetimes, doubles and ints.

I don't use any relations, joins, indices (except for the primary key), triggers, views, and so on. It is actually a flat dictionary -- pairs of key + value. I modify some of them, and then I have to update them in the database. From time to time I add some new "records" and I need to store (insert) them. That's all.

LINQ approach

I have a blank database (file), so I make 3500 inserts in a loop (one by one). I don't even check if a record already exists, because the db is blank.

Execution time? 4 minutes, 52 seconds. I fainted (mind you: XML + compress = 3 seconds).

SQL CE raw approach

I googled a bit, and despite claims such as LINQ to SQL (CE) speed versus SqlCe stating that the fault lies with SQL CE itself, I gave it a try.

The same loop, but this time the inserts are made with SqlCeResultSet (DirectTable mode, see: Bulk Insert In SQL Server CE) and SqlCeUpdatableRecord.

The outcome? Are you sitting comfortably? Well... 0.3 seconds (yes, a fraction of a second).

The problem

LINQ is very readable, while raw operations are quite the opposite. I could write a mapper which translates all column indexes to meaningful names, but that seems like reinventing the wheel -- after all, it is already done in... LINQ.
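For what it's worth, such a mapper can stay very small. A sketch using GetOrdinal on the result set (the column names here are assumptions based on the schema described above, not taken from real code):

```csharp
// Look up column ordinals by name once, before the insert loop.
// GetOrdinal is available on SqlCeResultSet (inherited from SqlCeDataReader).
int colText = rs.GetOrdinal("Text");
int colDir = rs.GetOrdinal("Direction");
int colStatus = rs.GetOrdinal("Status");

// Inside the loop, the indexes now carry meaningful names:
SqlCeUpdatableRecord record = rs.CreateRecord();
record.SetString(colText, entry.Text);
record.SetByte(colDir, (byte)entry.dir);
record.SetByte(colStatus, (byte)entry.LastLearningInfo.status.Value);
rs.Insert(record);
```

This keeps the raw-speed path while removing the magic numbers, at the cost of a handful of lookups done once per run.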

So maybe there is a way to tell LINQ to speed things up? QUESTION -- how to do it?

The code

LINQ

foreach (var entry in dict.Entries.Where(it => it.AlteredByLearning))
{
    var record = new PrimLibrary.Database.Progress();
    record.Text = entry.Text;
    record.Direction = (byte)entry.dir;
    record.Status = (byte)entry.LastLearningInfo.status.Value;
    // ... and so on

    db.Progress.InsertOnSubmit(record);
    db.SubmitChanges();
}

Raw operations

SqlCeCommand cmd = conn.CreateCommand();
cmd.CommandText = "Progress";
cmd.CommandType = System.Data.CommandType.TableDirect;
SqlCeResultSet rs = cmd.ExecuteResultSet(ResultSetOptions.Updatable);

foreach (var entry in dict.Entries.Where(it => it.AlteredByLearning))
{
    SqlCeUpdatableRecord record = rs.CreateRecord();

    int col = 0;
    record.SetString(col++, entry.Text);
    record.SetByte(col++, (byte)entry.dir);
    record.SetByte(col++, (byte)entry.LastLearningInfo.status.Value);
    // ... and so on

    rs.Insert(record);
}

Do more work per transaction.

Commits are generally very expensive operations for a typical relational database, as the database must wait for disk flushes to ensure data is not lost (ACID guarantees and all that). Conventional HDD disk IO without specialty controllers is very slow at this sort of operation: the data must be flushed to the physical disk -- perhaps only 30-60 commits can occur per second, with an IO sync between each!

See the SQLite FAQ: INSERT is really slow - I can only do few dozen INSERTs per second. Ignoring the different database engine, this is exactly the same issue.

Normally, LINQ2SQL creates a new implicit transaction inside SubmitChanges. To avoid this implicit transaction/commit (commits are expensive operations), either:

  1. Call SubmitChanges less often (say, once, outside the loop); or

  2. Set up an explicit transaction scope (see TransactionScope).
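The first option can be sketched against the question's own code (names such as dict.Entries and PrimLibrary.Database.Progress are taken from the question, not verified here):

```csharp
// Option 1: queue all inserts, then submit once.
// InsertOnSubmit only registers the object with the DataContext;
// nothing hits the database until SubmitChanges, so all rows
// share one implicit transaction and one disk sync.
foreach (var entry in dict.Entries.Where(it => it.AlteredByLearning))
{
    var record = new PrimLibrary.Database.Progress();
    record.Text = entry.Text;
    record.Direction = (byte)entry.dir;
    record.Status = (byte)entry.LastLearningInfo.status.Value;
    db.Progress.InsertOnSubmit(record);
}

// One commit for the whole batch instead of 3500.
db.SubmitChanges();
```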

One example of using a larger transaction context is:

using (var ts = new TransactionScope()) {
  // LINQ2SQL will automatically enlist in the transaction scope.
  // SubmitChanges now will NOT create a new transaction/commit each time.
  DoImportStuffThatRunsWithinASingleTransaction();
  // Important: Make sure to COMMIT the transaction.
  // (The transaction used for SubmitChanges is committed to the DB.)
  // This is when the disk sync actually has to happen,
  // but it only happens once, not 3500 times!
  ts.Complete();
}

However, the semantics of an approach using a single transaction or a single call to SubmitChanges are different from those of the code above calling SubmitChanges 3500 times and creating 3500 different implicit transactions. In particular, the size of the atomic operations (with respect to the database) is different and may not be suitable for all tasks.

For LINQ2SQL updates, changing the optimistic concurrency model (disabling it or just using a timestamp field, for instance) may result in small performance improvements. The biggest improvement, however, will come from reducing the number of commits that must be performed.
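Disabling the optimistic-concurrency check is done per column via the UpdateCheck property of the LINQ to SQL Column attribute. A sketch, assuming an attribute-mapped entity whose class and column names merely mirror the schema described in the question:

```csharp
using System.Data.Linq.Mapping;

[Table(Name = "Progress")]
public class Progress
{
    [Column(IsPrimaryKey = true)]
    public string Text;

    [Column(IsPrimaryKey = true)]
    public byte Direction;

    // UpdateCheck.Never: this column is left out of the WHERE clause
    // of generated UPDATE statements, so no conflict detection is
    // performed on it -- slightly shorter statements, no concurrency check.
    [Column(UpdateCheck = UpdateCheck.Never)]
    public byte Status;
}
```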

Happy coding.

I'm not positive on this, but it seems like the db.SubmitChanges() call should be made outside of the loop. Maybe that would speed things up?
