MS Access Batch Update via ADO.Net and COM Interoperability

This is kind of a follow-up to this thread. This is all with .Net 2.0, for me at least.

Essentially, Marc (OP from above) tried several different approaches to update an MS Access table with 100,000 records and found that using a DAO connection was roughly 10-30x faster than using ADO.Net. I went down virtually the same path (examples below) and came to the same conclusion.

I guess I'm just trying to understand why OleDB and ODBC are so much slower, and I'd love to hear if anyone has found a better answer than DAO since that post in 2011. I would really prefer to avoid DAO and/or Automation, since they require the client machine to have either Access or the database engine redistributable installed (or I'm stuck with DAO 3.6, which doesn't support .ACCDB).

Original attempt; ~100 seconds for 100,000 records/10 columns:

Dim accessDB As New OleDb.OleDbConnection( _ 
                      "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & _
                                accessPath & ";Persist Security Info=True;")
accessDB.Open()

Dim accessCommand As OleDb.OleDbCommand = accessDB.CreateCommand
Dim accessDataAdapter As New OleDb.OleDbDataAdapter( _
                                   "SELECT * FROM " & tableName, accessDB)
Dim accessCommandBuilder As New OleDb.OleDbCommandBuilder(accessDataAdapter)

Dim accessDataTable As New DataTable
accessDataTable.Load(_Reader, System.Data.LoadOption.Upsert)

' This command is what takes 99% of the runtime; it loops through each row and runs
' the command that is built by the command builder. The problem seems to
' be that you can't change the UpdateBatchSize property with MS Access.
accessDataAdapter.Update(accessDataTable)

Anyway, I thought this was really odd, so I tried several flavors of the same thing:

  • Switching out OleDB for ODBC
  • Looping through the data table and running an INSERT statement for each row
    • This is what .Update does anyway
  • Using the ACE provider instead of Jet (ODBC and OleDB)
  • Running the Data Adapter Update from within the DataReader.Read loop
    • Out of frustration; it was hilarious.

Finally, I tried using DAO. The code should basically be doing the same thing, except it clearly isn't, because this runs in ~10 seconds.

Dim dbEngine As New DAO.DBEngine
Dim accessDB As DAO.Database = dbEngine.OpenDatabase(accessPath)
Dim accessTable As DAO.Recordset = accessDB.OpenRecordset(tableName)

' Append one record per source row directly through the recordset.
While _Reader.Read
    accessTable.AddNew()
    For i As Integer = 0 To _Reader.FieldCount - 1
        accessTable.Fields(i).Value = _Reader.Item(i).ToString
    Next
    accessTable.Update()
End While

A few other notes:

  • Everything is converted to Strings in all examples to try to keep things as simple and consistent as possible
    • Exception: In my first example, using the Table.Load function, I don't because... well, I really can't, but I did basically the same thing when I looped through the reader and built insert commands (which is what it's doing, anyway). It didn't help.
  • For Each Field...Next vs. Field(i) vs. Field(name) made no difference for me
  • Every test I ran started with an empty, pre-built data table in a freshly compacted Access database
  • Loading the Data Reader to a Data Table in memory takes ~3 seconds
  • I don't think it's an issue with marshaling the data, because Marc's post indicated that loading a text file via Automation is as fast as DAO -- if anything, it shouldn't marshal the data when using ODBC/OleDB, but it should when using Automation
  • All of this bothers me way more than it should, because it doesn't make sense

Hopefully someone will be able to shed some light on this... it's just strange. Thanks in advance!

The reason here is that the DAO driver sits much closer to the MS Access database engine than the ODBC driver does.

The DAO methods AddNew and Update delegate directly to their MS Access equivalents. At no point do they generate SQL, so there is no SQL for MS Access to parse.

On the other hand, the DataAdapter code generates a SQL statement for each row (an INSERT for newly added rows). That statement gets passed to ODBC, which then passes it to an MS Access driver, which either

  1. independently parses the SQL and issues AddNew and Update commands to the Access database, or
  2. passes the SQL to MS Access, which isn't optimised for parsing SQL and which, once the SQL is parsed, ends up translating it into AddNew and Update commands.

Either way, your time is spent generating SQL and then having something interpret that SQL, whereas the DAO approach bypasses SQL generation and interpretation and goes straight to the metal.
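To make the contrast concrete, here is a minimal sketch of what the SQL path amounts to when written by hand: one parameterized INSERT executed per row over OleDb. The names PerRowInsertSketch and InsertRowByRow are made up for illustration, and the sketch assumes the DataTable's columns line up one-to-one, in order, with the target table's columns. It is not the code the DataAdapter runs internally, only a rough equivalent of the per-row work.

```vbnet
Imports System.Data
Imports System.Data.OleDb

Module PerRowInsertSketch
    ' Hypothetical helper: sends one INSERT statement per DataTable row.
    ' Assumes conn is already open and the column order matches the target table.
    Sub InsertRowByRow(ByVal conn As OleDbConnection, _
                       ByVal tableName As String, _
                       ByVal rows As DataTable)
        ' Build "INSERT INTO [t] VALUES (?, ?, ...)" with one marker per column.
        Dim marks(rows.Columns.Count - 1) As String
        For i As Integer = 0 To marks.Length - 1
            marks(i) = "?"
        Next
        Dim sql As String = "INSERT INTO [" & tableName & "] VALUES (" & _
                            String.Join(", ", marks) & ")"

        Using cmd As New OleDbCommand(sql, conn)
            For Each row As DataRow In rows.Rows
                ' OleDb parameters are positional; the names are ignored.
                cmd.Parameters.Clear()
                For i As Integer = 0 To rows.Columns.Count - 1
                    cmd.Parameters.AddWithValue("@p" & i.ToString(), row(i))
                Next
                ' Each call hands SQL text to the provider for this one row.
                cmd.ExecuteNonQuery()
            Next
        End Using
    End Sub
End Module
```

Every pass through the loop goes through the provider with SQL text, which is the overhead the DAO AddNew/Update loop avoids.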

One way around this is to create your own "database service" running on the machine with the Access database. This marshals your selects and updates and could communicate with the client over remoting, WCF (HTTP or whatever). This is a lot of work and changes your application logic considerably.

Figuring out the correct name for the database driver (e.g. Jet or whatever) is an exercise left to the reader.

I know this question is old, but the answer may help someone still struggling with this.

There is another method to consider. As both source and target connection strings are known, source tables can be linked to the target Access database, possibly with some connection string parsing needed, via DAO or ADOX (I know, ADOX is off-topic here).
The data in tables so linked can then be transferred fairly quickly by issuing statements like this on a DAO or OleDb connection to the target Access database:

SELECT * INTO Table1 FROM _LINKED_Table1
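A minimal sketch of the link-then-copy idea using DAO, assuming the source is another Access file and that the project references the DAO COM library. The module and method names (LinkedCopySketch, CopyViaLinkedTable) and the three parameters are hypothetical; only the "_LINKED_" prefix comes from the statement above. Treat it as an outline under those assumptions, not a drop-in implementation.

```vbnet
Module LinkedCopySketch
    ' Hypothetical helper: links tableName from the source file into the
    ' target database, copies it with one SELECT ... INTO, then drops the link.
    Sub CopyViaLinkedTable(ByVal targetPath As String, _
                           ByVal sourcePath As String, _
                           ByVal tableName As String)
        Dim engine As New DAO.DBEngine
        Dim target As DAO.Database = engine.OpenDatabase(targetPath)
        Dim linkName As String = "_LINKED_" & tableName
        Try
            ' Create the linked table definition that points at the source file.
            ' The ";DATABASE=" form assumes an Access source; other sources
            ' need a different Connect string.
            Dim link As DAO.TableDef = target.CreateTableDef(linkName)
            link.Connect = ";DATABASE=" & sourcePath
            link.SourceTableName = tableName
            target.TableDefs.Append(link)

            ' One statement moves all rows; the engine does the per-row work.
            target.Execute("SELECT * INTO [" & tableName & "] FROM [" & _
                           linkName & "]", _
                           DAO.RecordsetOptionEnum.dbFailOnError)

            ' Remove only the link; the source table itself is untouched.
            target.TableDefs.Delete(linkName)
        Finally
            target.Close()
        End Try
    End Sub
End Module
```

SELECT ... INTO creates the destination table, so it fails if a table with that name already exists in the target; for a pre-built empty table, an INSERT INTO ... SELECT against the linked table would be the analogous statement.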

Some drawbacks (please point out anything I missed):

  • source table must contain a primary key
  • primary keys and indexes have to be re-created by examining the source Indexes schema
  • transfer progress is not easy to obtain while the query is running

Some advantages (please point out anything I missed):

  • only having to examine the source Tables schema, if all user tables are to be copied
  • not having to examine the source Columns schema to generate column definitions for CREATE TABLE statements
    (for instance, try getting the AUTONUMBER / IDENTITY info reliably out of an OleDb schema, i.e. without assumptions about combinations of column values and flag bits based on examining other schemas)
  • not having to generate vast amounts of INSERT INTO ... VALUES ... statements, account for AUTONUMBER / IDENTITY columns in your code, or otherwise have a database operation run for each row in your code
  • being able to specify criteria to filter transferred records
  • not having to worry about text, date or time columns or how to delimit, escape or format their values in queries, except when used in query criteria

This method was employed in a production project and turned out to be the quickest, for me at least. :o)

