Improve SQL query in Excel VBA
I have 3 tables in the Excel workbook that I access with SQL. There is an Inscriptions table that holds the AGENT_ID and MLS_ID, a PHOTOS table that holds all the photos that came in the recent feed for each MLS_ID, and a PHOTOS_CURRENT table that holds all the photos that are currently in the system for each MLS_ID.
The goal is to find whether there are photos in the new feed that are not currently in the system.
I tried to query using the NOT EXISTS and NOT IN approaches. Both take too long to run (sometimes 2 minutes per AGENT_ID).
NOT EXISTS approach:
sqlQuery = "SELECT DISTINCT INSCR.MLS_ID FROM [INSCRIPTIONS_CURRENT$] INSCR, [PHOTOS$] P1 " & _
"WHERE INSCR.AGENT_ID = " & inpAgentId & _
" AND INSCR.MLS_ID = P1.MLS_ID AND NOT exists (select 1 from [PHOTOS_CURRENT$] PC1 where PC1.MLS_ID = P1.MLS_ID and PC1.PHOTO_ID = P1.PHOTO_ID)"
NOT IN approach:
sqlQuery = "SELECT DISTINCT INSCR.MLS_ID FROM [INSCRIPTIONS_CURRENT$] INSCR, [PHOTOS$] P1 " & _
"WHERE INSCR.AGENT_ID = " & inpAgentId & _
" AND INSCR.MLS_ID = P1.MLS_ID AND INSCR.MLS_ID NOT IN (select MLS_ID from [PHOTOS_CURRENT$] PC1 where PC1.MLS_ID = P1.MLS_ID and PC1.PHOTO_ID = P1.PHOTO_ID)"
The DB connection is set up as follows:
Sub Connect()
Set objConnection = CreateObject("ADODB.Connection")
objConnection.CommandTimeout = 120
End Sub
The query is then passed to the following function for processing:
Function select_query(sqlQuery As String) As ADODB.Recordset
Dim objRecordset As ADODB.Recordset
Const adOpenStatic = 3
Const adLockOptimistic = 3
Const adCmdText = &H1
Set objRecordset = CreateObject("ADODB.Recordset")
objConnection.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
"Data Source=" & ThisWorkbook.FullName & _
";Extended Properties=""Excel 8.0;HDR=Yes;IMEX=1"";"
objRecordset.Open sqlQuery, objConnection, adOpenStatic, adLockOptimistic, _
adCmdText
Set select_query = objRecordset
End Function
Any suggestions to improve the performance?
Consider the following tips that may help:
Explicit JOIN: Right now you are using the older implicit join, matching IDs in the WHERE clause, rather than the current standard explicit JOIN clause. In most database engines this should not change performance, but anecdotal evidence suggests that in specific use cases it can:
sqlQuery = "SELECT DISTINCT INSCR.MLS_ID " & _
"FROM [INSCRIPTIONS_CURRENT$] INSCR " & _
"INNER JOIN [PHOTOS$] P1 ON INSCR.MLS_ID = P1.MLS_ID " & _
"WHERE INSCR.AGENT_ID = " & inpAgentId & _
" AND NOT EXISTS (SELECT 1 FROM [PHOTOS_CURRENT$] PC1 " & _
"WHERE PC1.MLS_ID = P1.MLS_ID AND PC1.PHOTO_ID = P1.PHOTO_ID)"
GROUP BY vs DISTINCT: This is a recurring debate in SQL, since different database engines process de-duplicating queries differently. In theory there should be no difference in performance, but anecdotal evidence suggests otherwise. Therefore, consider an equivalent GROUP BY version:
sqlQuery = "SELECT INSCR.MLS_ID " & _
"FROM [INSCRIPTIONS_CURRENT$] INSCR " & _
"INNER JOIN [PHOTOS$] P1 ON INSCR.MLS_ID = P1.MLS_ID " & _
"WHERE INSCR.AGENT_ID = " & inpAgentId & _
" AND NOT EXISTS (SELECT 1 FROM [PHOTOS_CURRENT$] PC1 " & _
"WHERE PC1.MLS_ID = P1.MLS_ID AND PC1.PHOTO_ID = P1.PHOTO_ID) " & _
"GROUP BY INSCR.MLS_ID"
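Because both of the tips above rest on anecdotal evidence, it may be worth timing each variant against your own data before settling on one. The following is only a minimal sketch: TimeQuery is a hypothetical helper name, and it assumes you pass in an ADODB.Connection that is already open on the workbook.

```vba
' Hypothetical helper (not part of the original code): returns the number of
' seconds taken to run sqlQuery and read every row of the result.
' Assumes conn is an ADODB.Connection that is already open.
Function TimeQuery(conn As Object, sqlQuery As String) As Double
    Const adOpenForwardOnly = 0
    Const adLockReadOnly = 1
    Const adCmdText = 1
    Dim rs As Object
    Dim startTime As Double
    Dim rowCount As Long

    startTime = Timer
    Set rs = CreateObject("ADODB.Recordset")
    rs.Open sqlQuery, conn, adOpenForwardOnly, adLockReadOnly, adCmdText

    ' Walk every row so the whole result set is actually fetched.
    Do While Not rs.EOF
        rowCount = rowCount + 1
        rs.MoveNext
    Loop
    rs.Close

    ' Timer counts seconds since midnight, so a run that spans midnight
    ' would give a misleading value.
    TimeQuery = Timer - startTime
End Function
```

It could then be called once per variant, for example Debug.Print TimeQuery(objConnection, sqlQuery) after building each version of sqlQuery. Running each variant a few times is advisable, since the first run may include one-off costs.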
DAO Connection: Since querying workbooks uses the JET/ACE SQL engine, consider DAO, an interface specific to this engine that can exploit many of its advantages, instead of ADO, a more generalized interface that works across any data source (Oracle, SQL Server, Postgres, etc.).
' ADD REFERENCE: Microsoft Office #.# Access Database Engine Object Library
Dim conn As New DAO.DBEngine, db As DAO.Database, qdef As DAO.QueryDef, rst As DAO.Recordset
Set db = conn.OpenDatabase("C:\Path\To\Workbook.xls", False, True, "Excel 8.0;HDR=Yes;")
Set rst = db.OpenRecordset(sqlQuery)
...
rst.Close: db.Close
Set rst = Nothing: Set db = Nothing: Set conn = Nothing
OLEDB (ACE) Connection: Consider the newer OLEDB provider, which should still work with any Excel file format (.xls, .xlsx, .xlsb, .xlsm). Check which providers are available on your machine with this PowerShell script.
objConnection.Open "Provider=Microsoft.ACE.OLEDB.12.0;" ...
objConnection.Open "Provider=Microsoft.ACE.OLEDB.16.0;" ...
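For reference, a fuller ACE connection might look like the sketch below. OpenWorkbookConnection is a hypothetical function name, and the sketch assumes that the ACE 12.0 provider is installed with the same bitness as Excel and that the workbook is an .xlsx file; the other settings are carried over from the question.

```vba
' Sketch only: opens the current workbook through the ACE OLEDB provider.
' Assumes Microsoft.ACE.OLEDB.12.0 is installed and matches Excel's bitness.
Function OpenWorkbookConnection() As Object
    Dim conn As Object
    Set conn = CreateObject("ADODB.Connection")
    conn.CommandTimeout = 120

    ' "Excel 12.0 Xml" targets .xlsx files. Use "Excel 12.0 Macro" for .xlsm,
    ' "Excel 12.0" for .xlsb and "Excel 8.0" for .xls.
    conn.Open "Provider=Microsoft.ACE.OLEDB.12.0;" & _
              "Data Source=" & ThisWorkbook.FullName & ";" & _
              "Extended Properties=""Excel 12.0 Xml;HDR=Yes;IMEX=1"";"

    Set OpenWorkbookConnection = conn
End Function
```

Opening the connection once and reusing it for every query also avoids calling Open again on a connection that is already open.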
ODBC Connection: The connecting interface can affect query execution performance, and here too anecdotal evidence differs from theory. Therefore, consider replacing the OLEDB provider with an ODBC driver connection:
' DRIVER VERSION
objConnection.Open "DRIVER={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};" _
& "DBQ=C:\Path\To\Excel.xls;"
' DSN VERSION
objConnection.Open "DSN=Excel Files;DBQ=C:\Path\To\Excel.xls;"
Cursor/Lock Types: Experiment with the cursor type, as performance can vary between, for example, adOpenForwardOnly and adOpenStatic, and with the lock type, such as adLockOptimistic vs adLockReadOnly.
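Since the query only reads data, the lightest combination is a forward-only cursor with a read-only lock. A minimal sketch follows, assuming the connection passed in is already open; select_query_readonly is a hypothetical name chosen to mirror select_query from the question.

```vba
' Sketch only: read-only, forward-only variant of the recordset Open call.
' Assumes conn is an ADODB.Connection that is already open.
Function select_query_readonly(conn As Object, sqlQuery As String) As Object
    Const adOpenForwardOnly = 0
    Const adLockReadOnly = 1
    Const adCmdText = 1
    Dim rs As Object

    Set rs = CreateObject("ADODB.Recordset")
    rs.Open sqlQuery, conn, adOpenForwardOnly, adLockReadOnly, adCmdText
    Set select_query_readonly = rs
End Function
```

The trade-off is that a forward-only recordset can only be walked once with MoveNext, and its RecordCount property returns -1, so code that relies on RecordCount or on moving backwards still needs adOpenStatic.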
Thanks @TimWilliams, your comment was most helpful in solving this problem. What I ended up doing was writing a separate routine that, during feed load, builds a table of all photos that were changed, like this:
sqlQuery = "INSERT INTO [PHOTO_UPDATES$] SELECT P1.* " & _
"FROM [PHOTOS$] P1 LEFT JOIN [PHOTOS_CURRENT$] PC1 " & _
"ON P1.MLS_ID = PC1.MLS_ID AND P1.PHOTO_ID = PC1.PHOTO_ID WHERE PC1.PHOTO_ID is NULL"
Then, when creating the worklist per agent, the following is done:
sqlQuery = "SELECT DISTINCT INSCR.MLS_ID " & _
"FROM [PHOTO_UPDATES$] PU1 , [INSCRIPTIONS_CURRENT$] INSCR " & _
"WHERE INSCR.AGENT_ID = " & inpAgentId & " " & _
"AND PU1.MLS_ID = INSCR.MLS_ID "
Both routines take less than 1 second to run.
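The INSERT step returns no rows, so it can be run with the connection's Execute method rather than by opening a recordset. This is only a sketch of how that might look: LoadPhotoUpdates is a hypothetical name, and it assumes an open ADODB.Connection that permits writes and a PHOTO_UPDATES sheet whose header row already matches PHOTOS.

```vba
' Sketch only: runs the INSERT once per feed load and returns the number of
' rows added. Assumes conn is an open ADODB.Connection that permits writes
' and that the PHOTO_UPDATES sheet has the same columns as PHOTOS.
Function LoadPhotoUpdates(conn As Object) As Long
    Const adCmdText = 1
    Const adExecuteNoRecords = 128
    Dim sqlQuery As String
    Dim rowsAdded As Variant

    sqlQuery = "INSERT INTO [PHOTO_UPDATES$] SELECT P1.* " & _
               "FROM [PHOTOS$] P1 LEFT JOIN [PHOTOS_CURRENT$] PC1 " & _
               "ON P1.MLS_ID = PC1.MLS_ID AND P1.PHOTO_ID = PC1.PHOTO_ID " & _
               "WHERE PC1.PHOTO_ID IS NULL"

    ' adExecuteNoRecords tells ADO not to build a recordset for the result.
    conn.Execute sqlQuery, rowsAdded, adCmdText Or adExecuteNoRecords
    LoadPhotoUpdates = CLng(rowsAdded)
End Function
```

The gain comes from doing the expensive photo comparison once per feed load, so that the per-agent query becomes a plain join against the much smaller PHOTO_UPDATES table.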