SAX Parser in VBA

I am working on VBA code to parse large XML files.

Initially I used a DOM parser, but because of a memory leak the system hangs, so that did not solve my problem.

Now I have turned to a SAX handler, as shown in the code below.

I get:

System error: -2146828275

The file is 9 GB.

Class Module

Name: clsBook

Option Explicit

Private mID As Integer
Private mAuthour As String
Private mTitle As String
Private mGenre As String
Private mPrice As String
Private mPublishDate As String
Private mDescription As String

Public Property Get ID() As Integer
ID = mID
End Property

Public Property Let ID(ByVal vNewValue As Integer)
mID = vNewValue
End Property

Public Property Get Authour() As String
Authour = mAuthour
End Property

Public Property Let Authour(ByVal vNewValue As String)
mAuthour = vNewValue
End Property

Public Property Get Title() As String
Title = mTitle
End Property

Public Property Let Title(ByVal vNewValue As String)
mTitle = vNewValue
End Property

Public Property Get Genre() As String
Genre = mGenre
End Property

Public Property Let Genre(ByVal vNewValue As String)
mGenre = vNewValue
End Property

Public Property Get Price() As String
Price = mPrice
End Property

Public Property Let Price(ByVal vNewValue As String)
mPrice = vNewValue
End Property

Public Property Get Description() As String
Description = mDescription
End Property

Public Property Let Description(ByVal vNewValue As String)
mDescription = vNewValue
End Property

Public Property Get PublishedDate() As String
PublishedDate = mPublishDate
End Property

Public Property Let PublishedDate(ByVal vNewValue As String)
mPublishDate = vNewValue
End Property

Class Module: ContentHandlerImpl

Option Explicit

Implements IVBSAXContentHandler

Private lCounter As Long
Private sNodeValues As String
Private bAuthor As Boolean
Private bTitle As Boolean
Private bPrice As Boolean
Private bGenre As Boolean
Private bDescription As Boolean
Private bPublishDate As Boolean
Private mBook As clsBook
Private mBooks As Collection

Private Sub IVBSAXContentHandler_characters(strChars As String)

If (bAuthor) Then
    mBook.Authour = strChars
    bAuthor = False
ElseIf (bTitle) Then
    mBook.Title = strChars
    bTitle = False
ElseIf (bGenre) Then
    mBook.Genre = strChars
    bGenre = False
ElseIf (bPrice) Then
    mBook.Price = strChars
    bPrice = False
ElseIf (bPublishDate) Then
    mBook.PublishedDate = strChars
    bPublishDate = False
ElseIf (bDescription) Then
    mBook.Description = strChars
    bDescription = False
End If

End Sub

Private Property Set IVBSAXContentHandler_documentLocator(ByVal RHS As MSXML2.IVBSAXLocator)

End Property

Private Sub IVBSAXContentHandler_endDocument()

End Sub

Private Sub IVBSAXContentHandler_endElement(strNamespaceURI As String, strLocalName As String, strQName As String)

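Select Case strLocalName
Case "book"
    If mBooks Is Nothing Then
        Set mBooks = New Collection
    End If
    ' Pass the object itself; wrapping it in parentheses would make VBA
    ' try to evaluate a default property instead of adding the reference
    mBooks.Add mBook
    ' Drop the reference so startElement creates a fresh clsBook for the next <book>
    Set mBook = Nothing
Case Else
    ' do nothing
End Select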

End Sub

Private Sub IVBSAXContentHandler_endPrefixMapping(strPrefix As String)

End Sub

Private Sub IVBSAXContentHandler_ignorableWhitespace(strChars As String)

End Sub

Private Sub IVBSAXContentHandler_processingInstruction(strTarget As String, strData As String)

End Sub

Private Sub IVBSAXContentHandler_skippedEntity(strName As String)

End Sub

Private Sub IVBSAXContentHandler_startDocument()

End Sub

Private Sub IVBSAXContentHandler_startElement(strNamespaceURI As String, strLocalName As String, strQName As String, ByVal oAttributes As MSXML2.IVBSAXAttributes)

Select Case strLocalName
Case "book"
    If mBook Is Nothing Then
        Set mBook = New clsBook
    End If
    mBook.ID = CInt(oAttributes.getValueFromName("", "id"))
Case "author"
    bAuthor = True
Case "title"
    bTitle = True
Case "genre"
    bGenre = True
Case "price"
    bPrice = True
Case "publish_date"
    bPublishDate = True
Case "description"
    bDescription = True
Case Else
    ' do nothing
End Select

End Sub

Private Sub IVBSAXContentHandler_startPrefixMapping(strPrefix As String, strURI As String)

End Sub

Public Function getBooks() As Collection
' Object references must be assigned with Set
Set getBooks = mBooks
End Function
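
A side note on the handler above: a SAX reader is allowed to deliver the text of a single element in several `characters` calls, so assigning `strChars` and clearing the flag on the first call can silently truncate a value. The handler also keeps every book in a Collection, which for a file this size gives up much of the memory benefit of SAX. The following is only a minimal sketch of an alternative, not a drop-in replacement: the class name `BufferedHandler`, the field names, and the `Debug.Print` stand-in for per-record processing are all assumptions for illustration. It buffers text until `endElement` and handles each value as soon as its element closes.

```vba
' Class module (hypothetical name): BufferedHandler
Option Explicit

Implements IVBSAXContentHandler

Private mText As String        ' text collected for the current element
Private mBookCount As Long     ' number of <book> elements seen so far

Private Sub IVBSAXContentHandler_startElement(strNamespaceURI As String, strLocalName As String, strQName As String, ByVal oAttributes As MSXML2.IVBSAXAttributes)
    mText = vbNullString       ' start a fresh buffer for every element
End Sub

Private Sub IVBSAXContentHandler_characters(strChars As String)
    mText = mText & strChars   ' may be called more than once per element
End Sub

Private Sub IVBSAXContentHandler_endElement(strNamespaceURI As String, strLocalName As String, strQName As String)
    Select Case strLocalName
        Case "title"
            ' The text is only known to be complete here.
            ' Debug.Print is a placeholder for real per-record work.
            Debug.Print "Title: " & Trim$(mText)
        Case "book"
            mBookCount = mBookCount + 1
    End Select
    mText = vbNullString
End Sub

' The remaining interface members must exist, but can stay empty.
Private Property Set IVBSAXContentHandler_documentLocator(ByVal RHS As MSXML2.IVBSAXLocator)
End Property

Private Sub IVBSAXContentHandler_startDocument()
End Sub

Private Sub IVBSAXContentHandler_endDocument()
End Sub

Private Sub IVBSAXContentHandler_startPrefixMapping(strPrefix As String, strURI As String)
End Sub

Private Sub IVBSAXContentHandler_endPrefixMapping(strPrefix As String)
End Sub

Private Sub IVBSAXContentHandler_ignorableWhitespace(strChars As String)
End Sub

Private Sub IVBSAXContentHandler_processingInstruction(strTarget As String, strData As String)
End Sub

Private Sub IVBSAXContentHandler_skippedEntity(strName As String)
End Sub

Public Property Get BookCount() As Long
    BookCount = mBookCount
End Property
```

It would be wired up the same way as `ContentHandlerImpl`: create an instance, assign it to `saxReader.contentHandler`, and call `Parse`. Because nothing is retained between records, memory use should stay roughly flat regardless of file size.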

Test Function

Sub main()

Dim saxReader As SAXXMLReader60
Dim saxhandler As ContentHandlerImpl
Dim iItems As Collection
Dim iItem As clsBook

Set saxReader = New SAXXMLReader60
Set saxhandler = New ContentHandlerImpl

Set saxReader.contentHandler = saxhandler
saxReader.Parse ThisWorkbook.Path & "\books.xml"

Set iItem = New clsBook
Set iItems = saxhandler.getBooks

For Each iItem In iItems
    Debug.Print "ID: " & iItem.ID & vbCrLf & "Authour: " & iItem.Authour & vbCrLf & "Title: " & iItem.Title & vbCrLf
Next iItem

Set saxReader = Nothing
End Sub


' P.S. Below is the point where I am getting the error
Sub main()

Set saxReader = New SAXXMLReader60
Set saxhandler = New ContentHandlerImpl

Set saxReader.contentHandler = saxhandler
saxReader.Parse ThisWorkbook.Path & "\books.xml"

Set saxReader = Nothing
End Sub

The error is System error: -2146828275, and the file is 9 GB.

Thanks
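
A note on the number itself, since it narrows things down: written as a two's-complement hex value, -2146828275 is 0x800A000D. The 0x800A prefix is the range VBA run-time errors fall into when they surface as COM errors, and the low word, 0xD, is 13, which is VBA's "Type mismatch". That suggests the failure comes from one of the handler callbacks rather than from the size of the file as such. One candidate, assuming the `id` attributes in books.xml are not purely numeric (the XML is not shown, so this is a guess), is the `CInt(...)` call in `startElement`. Here is a minimal sketch for decoding such a code; `DecodeSystemError` is a hypothetical helper name.

```vba
Option Explicit

' Hypothetical helper: split a "System error" number into its parts.
Sub DecodeSystemError()
    Dim errCode As Long
    Dim lowWord As Long

    errCode = -2146828275            ' the value reported in the question

    ' Hex$ shows the two's-complement bit pattern of a Long
    Debug.Print "HRESULT: &H" & Hex$(errCode)

    ' The low 16 bits carry the original VBA run-time error number
    lowWord = errCode And &HFFFF&
    Debug.Print "Run-time error number: " & lowWord

    ' Error$ returns the standard message text for that number
    Debug.Print "Message: " & Error$(lowWord)
End Sub
```

Running it from the Immediate window is enough. An `On Error` handler inside the callbacks that logs `Err.Number` and `Err.Description` would be one way to confirm which line actually raises the error.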

Man, 9 GB, that isn't large, that is huge. The SAX parser should be able to handle it, but even so it'll most likely be a bit of a stretch. The first thing I'd suggest is to make everything a Long and forget about using Integers, especially in VBA. An Integer in VB6/VBA is only 16 bits, so it tops out at 32,767, and that isn't going to cut it. In fact, even a Long in VB6 maxes out at 2,147,483,647, and if you're keeping a count internally for a 9 GB file it'll probably pass that, so just use a Double. Apart from this, your machine's memory is going to be the limiting factor. It's been a while since you posted this, and I'm curious as to how you went.
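
To put numbers on the advice above, here is a minimal sketch comparing the three types. The procedure and variable names are hypothetical, and the 9 GB figure is simply 9 × 1024³ bytes, used as a stand-in for any counter that tracks bytes or characters in a file of that size.

```vba
Option Explicit

' Hypothetical demo: how far Integer, Long and Double can count.
Sub DemoCounterRanges()
    Dim smallCounter As Integer
    Dim bigCounter As Long
    Dim hugeCounter As Double

    ' Integer is 16-bit signed: -32,768 to 32,767
    smallCounter = 32767
    On Error Resume Next
    smallCounter = smallCounter + 1          ' raises run-time error 6 (Overflow)
    Debug.Print "Integer overflow, Err.Number = " & Err.Number
    On Error GoTo 0

    ' Long is 32-bit signed: up to 2,147,483,647
    bigCounter = 2147483647
    Debug.Print "Largest Long: " & Format$(bigCounter, "#,##0")

    ' A Double represents whole numbers exactly well beyond this size
    hugeCounter = 9# * 1024# * 1024# * 1024#
    Debug.Print "Bytes in 9 GB: " & Format$(hugeCounter, "#,##0")
    Debug.Print "Exceeds Long: " & (hugeCounter > bigCounter)
End Sub
```

A counter of records (books) may well fit in a Long even for a file this large; it is byte or character counts that are certain to exceed it.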
