SAX Parser in VBA
I am working on VBA code to parse large XML files.
Initially I used a DOM parser, but because of a memory leak the system hangs, so that did not solve my problem.
Now I have turned to a SAX handler, as shown in the code below.
I get
System error: -2146828275
The file is 9GB.
Class Module
Name: clsBook
Option Explicit
Private mID As Integer
Private mAuthour As String
Private mTitle As String
Private mGenre As String
Private mPrice As String
Private mPublishDate As String
Private mDescription As String
Public Property Get ID() As Integer
ID = mID
End Property
Public Property Let ID(ByVal vNewValue As Integer)
mID = vNewValue
End Property
Public Property Get Authour() As String
Authour = mAuthour
End Property
Public Property Let Authour(ByVal vNewValue As String)
mAuthour = vNewValue
End Property
Public Property Get Title() As String
Title = mTitle
End Property
Public Property Let Title(ByVal vNewValue As String)
mTitle = vNewValue
End Property
Public Property Get Genre() As String
Genre = mGenre
End Property
Public Property Let Genre(ByVal vNewValue As String)
mGenre = vNewValue
End Property
Public Property Get Price() As String
Price = mPrice
End Property
Public Property Let Price(ByVal vNewValue As String)
mPrice = vNewValue
End Property
Public Property Get Description() As String
Description = mDescription
End Property
Public Property Let Description(ByVal vNewValue As String)
mDescription = vNewValue
End Property
Public Property Get PublishedDate() As String
PublishedDate = mPublishDate
End Property
Public Property Let PublishedDate(ByVal vNewValue As String)
mPublishDate = vNewValue
End Property
Class Module: ContentHandlerImpl
Option Explicit
Implements IVBSAXContentHandler
Private lCounter As Long
Private sNodeValues As String
Private bAuthor As Boolean
Private bTitle As Boolean
Private bPrice As Boolean
Private bGenre As Boolean
Private bDescription As Boolean
Private bPublishDate As Boolean
Private mBook As clsBook
Private mBooks As Collection
Private Sub IVBSAXContentHandler_characters(strChars As String)
If (bAuthor) Then
mBook.Authour = strChars
bAuthor = False
ElseIf (bTitle) Then
mBook.Title = strChars
bTitle = False
ElseIf (bGenre) Then
mBook.Genre = strChars
bGenre = False
ElseIf (bPrice) Then
mBook.Price = strChars
bPrice = False
ElseIf (bPublishDate) Then
mBook.PublishedDate = strChars
bPublishDate = False
ElseIf (bDescription) Then
mBook.Description = strChars
bDescription = False
End If
End Sub
Private Property Set IVBSAXContentHandler_documentLocator(ByVal RHS As MSXML2.IVBSAXLocator)
End Property
Private Sub IVBSAXContentHandler_endDocument()
End Sub
Private Sub IVBSAXContentHandler_endElement(strNamespaceURI As String, strLocalName As String, strQName As String)
Select Case strLocalName
Case "book"
If mBooks Is Nothing Then
Set mBooks = New Collection
End If
' No parentheses: pass the object itself, not an evaluated expression
mBooks.Add mBook
If Not mBook Is Nothing Then
Set mBook = Nothing
End If
Case Else
' do nothing
End Select
End Sub
Private Sub IVBSAXContentHandler_endPrefixMapping(strPrefix As String)
End Sub
Private Sub IVBSAXContentHandler_ignorableWhitespace(strChars As String)
End Sub
Private Sub IVBSAXContentHandler_processingInstruction(strTarget As String, strData As String)
End Sub
Private Sub IVBSAXContentHandler_skippedEntity(strName As String)
End Sub
Private Sub IVBSAXContentHandler_startDocument()
End Sub
Private Sub IVBSAXContentHandler_startElement(strNamespaceURI As String, strLocalName As String, strQName As String, ByVal oAttributes As MSXML2.IVBSAXAttributes)
Select Case strLocalName
Case "book"
If mBook Is Nothing Then
Set mBook = New clsBook
End If
mBook.ID = CInt(oAttributes.getValueFromName("", "id"))
Case "author"
bAuthor = True
Case "title"
bTitle = True
Case "genre"
bGenre = True
Case "price"
bPrice = True
Case "publish_date"
bPublishDate = True
Case "description"
bDescription = True
Case Else
' do nothing
End Select
End Sub
Private Sub IVBSAXContentHandler_startPrefixMapping(strPrefix As String, strURI As String)
End Sub
Public Function getBooks() As Collection
' Object references must be assigned with Set
Set getBooks = mBooks
End Function
Test Function
Sub main()
Dim saxReader As SAXXMLReader60
Dim saxhandler As ContentHandlerImpl
Dim iItems As Collection
Dim iItem As clsBook
Set saxReader = New SAXXMLReader60
Set saxhandler = New ContentHandlerImpl
Set saxReader.contentHandler = saxhandler
saxReader.Parse ThisWorkbook.Path & "\books.xml"
Set iItem = New clsBook
Set iItems = saxhandler.getBooks
For Each iItem In iItems
Debug.Print "ID: " & iItem.ID & vbCrLf & "Authour: " & iItem.Authour & vbCrLf & "Title: " & iItem.Title & vbCrLf
Next iItem
Set saxReader = Nothing
End Sub
' P.S. Find below the point where I am getting the error
Sub main()
Set saxReader = New SAXXMLReader60
Set saxhandler = New ContentHandlerImpl
Set saxReader.contentHandler = saxhandler
saxReader.Parse ThisWorkbook.Path & "\books.xml"
Set saxReader = Nothing
End Sub
The error is System error: -2146828275 and the file is 9GB.
Thanks
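A side note on the error number, offered as a sketch rather than a confirmed diagnosis: -2146828275 is the signed form of 0x800A000D, and the low word of a code in that range is the VBA run-time error number, here 13 (Type mismatch). One place in the handler that could raise it is CInt(oAttributes.getValueFromName("", "id")), if the id attributes are not purely numeric (for example values like bk101; that is an assumption, since the file contents are not shown). LowWord and TryParseId below are hypothetical helper names, not part of the original code.

```vba
Option Explicit

' Returns the low 16 bits of an HRESULT-style error number.
' Note the trailing & on the mask: &HFFFF alone is the Integer -1.
Public Function LowWord(ByVal errNumber As Long) As Long
    LowWord = errNumber And &HFFFF&
End Function

' Hypothetical helper: converts text to a Long without raising an error.
' Returns False (and leaves result untouched) when the text is not a valid Long.
Public Function TryParseId(ByVal rawId As String, ByRef result As Long) As Boolean
    On Error GoTo Failed
    result = CLng(rawId)
    TryParseId = True
    Exit Function
Failed:
    TryParseId = False
End Function

Public Sub DemoDecode()
    Dim parsed As Long

    Debug.Print LowWord(-2146828275)      ' low word of the reported error: 13
    Debug.Print Error$(13)                ' built-in message text for error 13

    Debug.Print TryParseId("101", parsed)    ' True
    Debug.Print TryParseId("bk101", parsed)  ' False, where CInt would raise error 13
End Sub
```

If the ids do turn out to be non-numeric, storing them as a String in clsBook would avoid the conversion altogether.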
Man, 9GB, that isn't large, that is huge. The SAX parser should be able to handle it, but even so it'll most likely be a bit of a stretch. The first thing I'd suggest is to make everything a Long. Forget about using Integers, especially in VBA. From memory, an Integer in VB6/VBA is only 16 bits, so it tops out at 32,767, and that ain't gonna cut it. In fact, even a Long in VB6 maxes out at 2,147,483,647, and if you're doing a count internally for a 9GB file it'll probably pass this, so just use Double. Apart from this, your machine's memory is going to be the limiting factor. It's been a while since you posted this - I'm curious as to how you went.
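On the memory point: the handler in the question keeps every clsBook in a Collection, so most of the file's data ends up in memory anyway. Below is a minimal sketch of an alternative, assuming each book can be dealt with as soon as its closing tag arrives. The class name StreamingBookHandler and the members mText, mTitle and BookCount are hypothetical. It also appends text in characters rather than assigning it, since SAX parsers are allowed to deliver a single text node in several chunks. Like the original code, it needs a reference to Microsoft XML, v6.0.

```vba
' Class module: StreamingBookHandler (hypothetical name)
Option Explicit
Implements IVBSAXContentHandler

Private mText As String        ' text accumulated for the element being read
Private mTitle As String       ' title of the book being read
Private mBookCount As Double   ' Double, so the count cannot overflow a Long

Public Property Get BookCount() As Double
    BookCount = mBookCount
End Property

Private Sub IVBSAXContentHandler_startElement(strNamespaceURI As String, strLocalName As String, strQName As String, ByVal oAttributes As MSXML2.IVBSAXAttributes)
    mText = vbNullString       ' start collecting text for this element
End Sub

Private Sub IVBSAXContentHandler_characters(strChars As String)
    mText = mText & strChars   ' append: one text node may arrive in pieces
End Sub

Private Sub IVBSAXContentHandler_endElement(strNamespaceURI As String, strLocalName As String, strQName As String)
    Select Case strLocalName
        Case "title"
            mTitle = mText
        Case "book"
            mBookCount = mBookCount + 1
            ' Handle the finished record here (write it to a sheet, a file, ...)
            ' and then let it go, instead of adding it to a Collection.
            mTitle = vbNullString
    End Select
    mText = vbNullString
End Sub

' The remaining interface members must exist but can stay empty.
Private Property Set IVBSAXContentHandler_documentLocator(ByVal RHS As MSXML2.IVBSAXLocator)
End Property
Private Sub IVBSAXContentHandler_startDocument()
End Sub
Private Sub IVBSAXContentHandler_endDocument()
End Sub
Private Sub IVBSAXContentHandler_startPrefixMapping(strPrefix As String, strURI As String)
End Sub
Private Sub IVBSAXContentHandler_endPrefixMapping(strPrefix As String)
End Sub
Private Sub IVBSAXContentHandler_ignorableWhitespace(strChars As String)
End Sub
Private Sub IVBSAXContentHandler_processingInstruction(strTarget As String, strData As String)
End Sub
Private Sub IVBSAXContentHandler_skippedEntity(strName As String)
End Sub
```

It plugs into the reader the same way as ContentHandlerImpl: create an instance, assign it to saxReader.contentHandler, call Parse, then read BookCount. Whether the parser itself copes with a 9GB input is something this sketch does not settle.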