Improve SQL query in Excel VBA

I have 3 tables in the Excel workbook that I access with SQL.
There is an Inscriptions table that holds the AGENT_ID and MLS_ID, a PHOTOS table that holds all the photos that came in the recent feed for each MLS_ID, and a PHOTOS_CURRENT table that holds all the photos that are currently in the system for each MLS_ID.
The goal is to find whether there are photos in the new feed that are not currently in the system.

I tried to query using the NOT EXISTS and NOT IN approaches. Both take too long to run (sometimes 2 minutes per AGENT_ID).

NOT EXISTS approach:

sqlQuery = "SELECT DISTINCT INSCR.MLS_ID FROM [INSCRIPTIONS_CURRENT$] INSCR, [PHOTOS$] P1 " & _
                "WHERE INSCR.AGENT_ID = " & inpAgentId & _
                " AND INSCR.MLS_ID = P1.MLS_ID AND NOT exists (select 1 from [PHOTOS_CURRENT$] PC1 where PC1.MLS_ID = P1.MLS_ID and PC1.PHOTO_ID = P1.PHOTO_ID)"

NOT IN approach:

sqlQuery = "SELECT DISTINCT INSCR.MLS_ID FROM [INSCRIPTIONS_CURRENT$] INSCR, [PHOTOS$] P1 " & _
                "WHERE INSCR.AGENT_ID = " & inpAgentId & _
                " AND INSCR.MLS_ID = P1.MLS_ID AND INSCR.MLS_ID NOT IN (select MLS_ID from [PHOTOS_CURRENT$] PC1 where PC1.MLS_ID = P1.MLS_ID and PC1.PHOTO_ID = P1.PHOTO_ID)"

The DB connection is set up as follows:

Sub Connect()

    Set objConnection = CreateObject("ADODB.Connection")
    objConnection.CommandTimeout = 120

End Sub

The query is sent to the following procedure for processing:

Function select_query(sqlQuery As String) As ADODB.Recordset

    Dim objRecordset As ADODB.Recordset

    Const adOpenStatic = 3
    Const adLockOptimistic = 3
    Const adCmdText = &H1

    Set objRecordset = CreateObject("ADODB.Recordset")

    objConnection.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
    "Data Source=" & ThisWorkbook.FullName & _
    ";Extended Properties=""Excel 8.0;HDR=Yes;IMEX=1"";"

    objRecordset.Open sqlQuery, objConnection, adOpenStatic, adLockOptimistic, _
        adCmdText

    Set select_query = objRecordset

End Function

Any suggestions to improve the performance?

Consider the following tips that may help:

  • Explicit JOIN: Right now you are running the outdated implicit join, matching IDs in the WHERE clause, rather than the current standard of an explicit JOIN clause. In most database engines this should not change performance, but anecdotal evidence suggests that in specific use cases it can:

     sqlQuery = "SELECT DISTINCT INSCR.MLS_ID " & _
                "FROM [INSCRIPTIONS_CURRENT$] INSCR " & _
                "INNER JOIN [PHOTOS$] P1 ON INSCR.MLS_ID = P1.MLS_ID " & _
                "WHERE INSCR.AGENT_ID = " & inpAgentId & _
                " AND NOT EXISTS (select 1 from [PHOTOS_CURRENT$] PC1 " & _
                "where PC1.MLS_ID = P1.MLS_ID and PC1.PHOTO_ID = P1.PHOTO_ID)"
  • GROUP BY vs DISTINCT: This is a regular debate in SQL, as different database engines process de-duplicating queries differently. In theory there should be no difference in performance, but anecdotal evidence suggests otherwise. Therefore, consider an equivalent GROUP BY version:

     sqlQuery = "SELECT INSCR.MLS_ID " & _
                "FROM [INSCRIPTIONS_CURRENT$] INSCR " & _
                "INNER JOIN [PHOTOS$] P1 ON INSCR.MLS_ID = P1.MLS_ID " & _
                "WHERE INSCR.AGENT_ID = " & inpAgentId & _
                " AND NOT EXISTS (select 1 from [PHOTOS_CURRENT$] PC1 " & _
                "where PC1.MLS_ID = P1.MLS_ID and PC1.PHOTO_ID = P1.PHOTO_ID) " & _
                "GROUP BY INSCR.MLS_ID"
  • DAO Connection: Since querying workbooks uses the JET/ACE SQL engine, consider DAO, an interface specific to this engine that can exploit many of its advantages, instead of ADO, a more generalized interface that works across any data source (Oracle, SQL Server, Postgres, etc.).

     ' ADD REFERENCE: Microsoft Office #.# Access Database Engine Object Library
     Dim conn As New DAO.DBEngine, db As DAO.Database, qdef As DAO.QueryDef, rst As DAO.Recordset

     Set db = conn.OpenDatabase("C:\Path\To\Workbook.xls", False, True, "Excel 8.0;HDR=Yes;")
     Set rst = db.OpenRecordset(sqlQuery)
     ...
     rst.Close: db.Close
     Set rst = Nothing: Set db = Nothing: Set conn = Nothing
  • OLEDB (ACE) Connection: Consider the newer OLEDB provider, which should still work with any version of Excel (.xls or .xlsx, .xlsb, .xlsm). Check available providers with this PowerShell script.

     objConnection.Open "Provider=Microsoft.ACE.OLEDB.12.0;" ...
     objConnection.Open "Provider=Microsoft.ACE.OLEDB.16.0;" ...
  • ODBC Connection: Different connection interfaces can give different query execution performance, another case where anecdotal evidence differs from theory. Therefore, consider replacing the OLEDB provider with an ODBC driver connection:

     ' DRIVER VERSION
     objConnection.Open "DRIVER={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};" _
                      & "DBQ=C:\Path\To\Excel.xls;"

     ' DSN VERSION
     objConnection.Open "DSN=Excel Files;DBQ=C:\Path\To\Excel.xls;"
  • Cursor/Lock Types: Experiment with the cursor type, as performance can vary between, for example, adOpenForwardOnly and adOpenStatic, and with the LockType as well, such as adLockOptimistic vs adLockReadOnly.
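
The cursor and lock suggestion in the last bullet can be tried with a small variation of the question's select_query function. The sketch below is only an illustration: the names select_query_readonly and time_query are hypothetical, it assumes objConnection is the module-level connection from the question and is already open, and it declares the ADO constants locally so that it also works with late binding. A forward-only, read-only cursor can only be read once from start to end, so it suits code that simply loops over the result.

```vba
' Hypothetical variant of select_query using a forward-only, read-only cursor.
' Assumes objConnection is a module-level ADODB.Connection that is already open.
Function select_query_readonly(sqlQuery As String) As Object

    Const adOpenForwardOnly = 0
    Const adLockReadOnly = 1
    Const adCmdText = &H1

    Dim objRecordset As Object
    Set objRecordset = CreateObject("ADODB.Recordset")

    objRecordset.Open sqlQuery, objConnection, adOpenForwardOnly, adLockReadOnly, adCmdText

    Set select_query_readonly = objRecordset

End Function

' Hypothetical helper: time one query so the different variants can be compared.
Sub time_query(sqlQuery As String)

    Dim startTime As Double
    Dim rowCount As Long
    Dim rs As Object

    startTime = Timer
    Set rs = select_query_readonly(sqlQuery)

    ' Walk the rows once so the time includes fetching them,
    ' not just opening the recordset.
    Do While Not rs.EOF
        rowCount = rowCount + 1
        rs.MoveNext
    Loop
    rs.Close

    Debug.Print rowCount & " rows in " & Format(Timer - startTime, "0.00") & " s"

End Sub
```

Calling time_query with each candidate query string (implicit join, explicit JOIN, GROUP BY) prints the elapsed time to the Immediate window, which makes it easier to check the anecdotal claims against the actual workbook.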

Thanks @TimWilliams, your comment was most helpful in solving this problem. What I ended up doing is writing a separate routine that, during feed load, creates a table of all photos that were changed, like this:

sqlQuery = "INSERT INTO [PHOTO_UPDATES$] SELECT P1.* " & _
                "FROM [PHOTOS$] P1 LEFT JOIN [PHOTOS_CURRENT$] PC1 " & _
                "ON P1.MLS_ID = PC1.MLS_ID AND P1.PHOTO_ID = PC1.PHOTO_ID WHERE PC1.PHOTO_ID is NULL"

Then, when creating the worklist per agent, the following is done:

sqlQuery = "SELECT DISTINCT INSCR.MLS_ID " & _
                "FROM [PHOTO_UPDATES$] PU1 , [INSCRIPTIONS_CURRENT$] INSCR " & _
                "WHERE INSCR.AGENT_ID = " & inpAgentId & " " & _
                "AND PU1.MLS_ID = INSCR.MLS_ID "

Both routines take less than 1 second to run.
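
For comparison, the same LEFT JOIN ... IS NULL anti-join could in principle be written directly into the per-agent query, without the intermediate PHOTO_UPDATES sheet. The following is an untested sketch, not what was actually used: the function name build_agent_query is hypothetical, it relies on the Jet/ACE convention that multiple joins are nested in parentheses, and it makes no claim to match the speed of the precomputed table, so it would need to be timed against the two-step approach above.

```vba
' Sketch only: per-agent anti-join without the PHOTO_UPDATES sheet.
' build_agent_query is a hypothetical helper; inpAgentId is assumed numeric,
' as in the queries above.
Function build_agent_query(inpAgentId As Variant) As String

    ' Jet/ACE SQL expects nested joins to be wrapped in parentheses.
    build_agent_query = _
        "SELECT DISTINCT INSCR.MLS_ID " & _
        "FROM ([INSCRIPTIONS_CURRENT$] INSCR " & _
        "INNER JOIN [PHOTOS$] P1 ON INSCR.MLS_ID = P1.MLS_ID) " & _
        "LEFT JOIN [PHOTOS_CURRENT$] PC1 " & _
        "ON (P1.MLS_ID = PC1.MLS_ID AND P1.PHOTO_ID = PC1.PHOTO_ID) " & _
        "WHERE INSCR.AGENT_ID = " & inpAgentId & " " & _
        "AND PC1.PHOTO_ID IS NULL"

End Function
```

The rows where PC1.PHOTO_ID is NULL are the feed photos with no match in PHOTOS_CURRENT, which is the same set the INSERT routine writes to PHOTO_UPDATES.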
