简体   繁体   中英

VBA Slow writing arrays to excel workbook

Just wondering if anyone can offer any suggestions that might improve the speed that my code writes arrays to a workbook.

I'm writing about 1.9 million rows of data to several sheets in a workbook, one sheet at a time. While the code completes, it takes about 18 hours to write to the excel workbook, which seems ridiculously excessive. Here is the setup. I open the workbook as such:

Dim ExcelAp As Excel.Application
Dim ouputWorkbook As Excel.Workbook

Set ExcelAp = New Excel.Application
Set outputWorkbook = ExcelAp.Workbooks.Open("S:\Some Directory\Template.xlsx")

Then I have the rows of the workbook in arrays loaded into a collection and I loop through the ranges in the workbook to copy the arrays:

For lonSheetOneCounter = 2 to 999999
    outputWorkbook.Worksheets(1).Range(_
        outputWorkbook.Worksheets(1).Cells(lonSheetOneCounter, 1).Address & ":" & _
        outputWorkbook.Worksheets(1).Cells(lonSheetOneCounter, 21).Address).Value = _
        outputCollection.item(lonSheetOneCounter - 1)
Next lonSheetOneCounter

The copying method is the same for the other sheets. I have made the workbook and instance of excel invisible, I have switched calculation to manual for that workbook, and I have also turned off screen updating, but it still takes about 18 hours give or take to complete copying over to the new workbook.

I've tried making a 2 dimensional array for the entier sheet, but no matter the method I use to do that I get an "out of memory error" the moment I attempt to copy that array to the workbook.

I'm not sure if there is anything else I can do to get by the error and reduce the time to copy, but if anyone has a suggestion, I'm all ears. For what it is worth, this macro is housed in another excel workbook running in a seperate instance of excel from the workbook I'm attempting to copy to.

Edit: Slight addition here. Something that I noticed that I wanted to draw attention to that also makes me think it may be possible to speed up the process. I've noticed that the macro slows down progressively. The first X number of rows write very fast, the following rows seem to slow down more and more as each row is written...

I'm going to try an experiment where I set up my template to automatically load a spreadsheet with 1 million used rows... sort of prompted by the suggestion at the bottom. I'm wondering if excel is having to alocate memory for all the extra rows. Perhaps if I start out with a workbook template that already has that number of rows set up, I might have an easier go of it.

Edit: It was pointed out to me that I wasn't clear about where the data I'm reading in is coming from. This data is read in using the VBA primitives from a number of text files. One is pipe delimited, the other two comma, not that the scheme of the files makes much difference.

As far as populating the array, here is a snippet of how that happens. It looks a mess but there simply isn't any other way to get the data to match up given the format of the three files I'm comparing. Anyway, now that I'm placing everything into large, large arrays this is how I'm populating those arrays. The references to arrViLine and arrNonIraLine and arrIraLine are simply the arrays that the lines of the file are parsed into from their original pipe and comma delimited formats:

    If arrViLine(2) = arrIraLine(1) Or arrViLine(2) = arrNonIraLine(1) Then
        If arrViLine(2) = arrIraLine(1) Then
            boolVi = True
            boolIra = True
            boolNonIra = False
            If lonMatchCounter <= 999999 Then
                matchOneArray(lonMatchCounter, 1) = arrViLine(1)
                matchOneArray(lonMatchCounter, 2) = arrViLine(2)
                matchOneArray(lonMatchCounter, 3) = arrIraLine(2)
                matchOneArray(lonMatchCounter, 4) = arrIraLine(3)
                matchOneArray(lonMatchCounter, 5) = arrViLine(3)
                matchOneArray(lonMatchCounter, 6) = arrViLine(4)
                matchOneArray(lonMatchCounter, 7) = arrIraLine(4)
                matchOneArray(lonMatchCounter, 8) = arrViLine(6)
                matchOneArray(lonMatchCounter, 9) = arrViLine(5)
                matchOneArray(lonMatchCounter, 10) = arrViLine(7)
                matchOneArray(lonMatchCounter, 11) = arrViLine(8)
                matchOneArray(lonMatchCounter, 12) = arrViLine(9)
                matchOneArray(lonMatchCounter, 13) = arrViLine(10)
                matchOneArray(lonMatchCounter, 14) = arrViLine(11)
                matchOneArray(lonMatchCounter, 15) = arrViLine(12)
                matchOneArray(lonMatchCounter, 16) = arrIraLine(5)
                matchOneArray(lonMatchCounter, 17) = arrIraLine(6)
                matchOneArray(lonMatchCounter, 18) = arrViLine(13)
                matchOneArray(lonMatchCounter, 19) = arrViLine(14)
                matchOneArray(lonMatchCounter, 20) = "IRA"
                matchOneArray(lonMatchCounter, 21) = arrViLine(15)
                lonMatchCounter = lonMatchCounter + 1
            Else
                lonMatchTwoCounter = lonMatchCounter - 999999
                matchTwoArray(lonMatchTwoCounter, 1) = arrViLine(1)
                matchTwoArray(lonMatchTwoCounter, 2) = arrViLine(2)
                matchTwoArray(lonMatchTwoCounter, 3) = arrIraLine(2)
                matchTwoArray(lonMatchTwoCounter, 4) = arrIraLine(3)
                matchTwoArray(lonMatchTwoCounter, 5) = arrViLine(3)
                matchTwoArray(lonMatchTwoCounter, 6) = arrViLine(4)
                matchTwoArray(lonMatchTwoCounter, 7) = arrIraLine(4)
                matchTwoArray(lonMatchTwoCounter, 8) = arrViLine(6)
                matchTwoArray(lonMatchTwoCounter, 9) = arrViLine(5)
                matchTwoArray(lonMatchTwoCounter, 10) = arrViLine(7)
                matchTwoArray(lonMatchTwoCounter, 11) = arrViLine(8)
                matchTwoArray(lonMatchTwoCounter, 12) = arrViLine(9)
                matchTwoArray(lonMatchTwoCounter, 13) = arrViLine(10)
                matchTwoArray(lonMatchTwoCounter, 14) = arrViLine(11)
                matchTwoArray(lonMatchTwoCounter, 15) = arrViLine(12)
                matchTwoArray(lonMatchTwoCounter, 16) = arrIraLine(5)
                matchTwoArray(lonMatchTwoCounter, 17) = arrIraLine(6)
                matchTwoArray(lonMatchTwoCounter, 18) = arrViLine(13)
                matchTwoArray(lonMatchTwoCounter, 19) = arrViLine(14)
                matchTwoArray(lonMatchTwoCounter, 20) = "IRA"
                matchTwoArray(lonMatchTwoCounter, 21) = arrViLine(15)
                lonMatchCounter = lonMatchCounter + 1
            End If
        Else 'arrViLine(2) must = arrNonIraLine(1)
            boolVi = True
            boolIra = False
            boolNonIra = True
            If lonMatchCounter <= 999999 Then
                matchOneArray(lonMatchCounter, 1) = arrViLine(1)
                matchOneArray(lonMatchCounter, 2) = arrViLine(2)
                matchOneArray(lonMatchCounter, 3) = arrNonIraLine(2)
                matchOneArray(lonMatchCounter, 4) = arrNonIraLine(3)
                matchOneArray(lonMatchCounter, 5) = arrViLine(3)
                matchOneArray(lonMatchCounter, 6) = arrViLine(4)
                matchOneArray(lonMatchCounter, 7) = arrNonIraLine(5)
                matchOneArray(lonMatchCounter, 8) = arrViLine(6)
                matchOneArray(lonMatchCounter, 9) = arrViLine(5)
                matchOneArray(lonMatchCounter, 10) = arrViLine(7)
                matchOneArray(lonMatchCounter, 11) = arrViLine(8)
                matchOneArray(lonMatchCounter, 12) = arrViLine(9)
                matchOneArray(lonMatchCounter, 13) = arrViLine(10)
                matchOneArray(lonMatchCounter, 14) = arrViLine(11)
                matchOneArray(lonMatchCounter, 15) = arrViLine(12)
                matchOneArray(lonMatchCounter, 16) = arrNonIraLine(4)
                matchOneArray(lonMatchCounter, 17) = arrNonIraLine(6)
                matchOneArray(lonMatchCounter, 18) = arrViLine(13)
                matchOneArray(lonMatchCounter, 19) = arrViLine(14)
                matchOneArray(lonMatchCounter, 20) = "IRA"
                matchOneArray(lonMatchCounter, 21) = arrViLine(15)
                lonMatchCounter = lonMatchCounter + 1
            Else
                lonMatchTwoCounter = lonMatchCounter - 999999
                matchTwoArray(lonMatchTwoCounter, 1) = arrViLine(1)
                matchTwoArray(lonMatchTwoCounter, 2) = arrViLine(2)
                matchTwoArray(lonMatchTwoCounter, 3) = arrNonIraLine(2)
                matchTwoArray(lonMatchTwoCounter, 4) = arrNonIraLine(3)
                matchTwoArray(lonMatchTwoCounter, 5) = arrViLine(3)
                matchTwoArray(lonMatchTwoCounter, 6) = arrViLine(4)
                matchTwoArray(lonMatchTwoCounter, 7) = arrNonIraLine(5)
                matchTwoArray(lonMatchTwoCounter, 8) = arrViLine(6)
                matchTwoArray(lonMatchTwoCounter, 9) = arrViLine(5)
                matchTwoArray(lonMatchTwoCounter, 10) = arrViLine(7)
                matchTwoArray(lonMatchTwoCounter, 11) = arrViLine(8)
                matchTwoArray(lonMatchTwoCounter, 12) = arrViLine(9)
                matchTwoArray(lonMatchTwoCounter, 13) = arrViLine(10)
                matchTwoArray(lonMatchTwoCounter, 14) = arrViLine(11)
                matchTwoArray(lonMatchTwoCounter, 15) = arrViLine(12)
                matchTwoArray(lonMatchTwoCounter, 16) = arrNonIraLine(4)
                matchTwoArray(lonMatchTwoCounter, 17) = arrNonIraLine(6)
                matchTwoArray(lonMatchTwoCounter, 18) = arrViLine(13)
                matchTwoArray(lonMatchTwoCounter, 19) = arrViLine(14)
                matchTwoArray(lonMatchTwoCounter, 20) = "Non-IRA"
                matchTwoArray(lonMatchTwoCounter, 21) = arrViLine(15)
                lonMatchCounter = lonMatchCounter + 1
            End If
        End If

You can also ignore the boolean variables, they are there to cue the macro as to whether or not the next line of a particular file should be read on the next loop.

EDIT: Not that it has much bearing on how fast I write the data to excel, consider the folowing lines to be an example of the format of the files I'm working with.

"Master" file:

Account Number|ID Number|Int Rate|Cum Int|Agreement|Type
12345|111111|.005|.01234|"C"|"IRA"
12346|111112|.005|.02345|"A"|"Non-IRA"
12347|111113|.004|.02345|"B"|"Non-IRA"

Match File One:

ID Number|Int Rate|Cum Int|Type
111111|.004|.01234|"IRA"

Match File Two:

ID Number|Int Rate|Cum Int|Type
111113|.004|.02345|"Non-IRA"

So that is just a little example of what I'm working with. Text files and CSV files that are listed in sequential order by the ID number. In the example above, the macro would match the first line of the master to match file one and record the data from all of the fields from both files to an array which will be output to an excel spreadsheet. The macro then reads in the next line of the master file and match file one, but carries over the line from file two to the next loop. The next line of the master would have no match and be recorded on a seperate sheet of the workbook. The last line of the master matches match file two and is recorded to the same array as the first match.

That's how the routine works, still, the real issue I'm having is the speed at which the data is written to the excel workbook. I'm currently working on carving the data into columns.

You don't need collections: just assign the data from a worksheet into a single variant and then assign the variant back to the new sheet.

To minimise memory etc try using the UsedRange on the worksheet. This example copies a column at a time: it takes 35 seconds to copy 1 million rows by 21 columns from 1 worksheet to another worksheet using Excel 2010 32-bit

 Sub getting()
    Dim var As Variant
    Dim j As Long
    Dim dTime As Double
    dTime = Now
    For j = 1 To 21
        var = Worksheets("Sheet3").UsedRange.Resize(, 1).Offset(0, j - 1).Value2
        Worksheets("Sheet1").Range("a1").Resize(UBound(var), UBound(var, 2)).Offset(0, j - 1) = var
    Next j
    MsgBox CStr(Now - dTime)
End Sub

I tried to test this will half a million rows going into an array, but got an out of memory error. You don't say how you're filling your collection/arrays, but I assume you're able to do it. I ended up with 400k x 21 array for demonstration purposes.

The part that's taking all the time is that you're writing to the sheet 21 cells at a time. Writing to the sheet is the most time intensive thing you can do in Excel VBA, so you need to minimize that operation as much as possible.

For this proof of concept, I read 400k x 21 pieces of data. The I write them out in 100k row increments to different sheets. For your purposes, you should make the biggest chunk array that your memory can handle.

Sub WriteDataToFiles()

    Dim vaData As Variant
    Dim vaChunk() As Variant
    Dim lStep As Long
    Dim i As Long, j As Long, k As Long
    Dim wb As Workbook, sh As Worksheet
    Dim lStart As Long

    lStart = Timer

    'Process in 100,000 row increments
    lStep = 10 ^ 5

    'Fill a big array with a bunch of data
    FillDataArray vaData
    'Show how big the array is
    Debug.Print UBound(vaData, 1) & " x " & UBound(vaData, 2)

    'Create a new workbook to write to
    Set wb = Workbooks.Add

    'loop through the big array in 100k increments
    For i = LBound(vaData, 1) To UBound(vaData, 1) Step lStep

        'dimension a smaller range to hold a subset of the big array
        ReDim vaChunk(1 To lStep, 1 To 21) 'clean out array

        'fill the smaller array with data from big array
        For j = LBound(vaChunk) To UBound(vaChunk)
            For k = 1 To 21
                vaChunk(j, k) = vaData(i + j - 1, k)
            Next k
        Next j

        'Add a new sheet
        Set sh = wb.Worksheets.Add

        'Write the small array to the sheet
        sh.Range("A1").Resize(UBound(vaChunk, 1), UBound(vaChunk, 2)).Value = vaChunk

    Next i

    'See how long it takes
    Debug.Print Timer - lStart

End Sub

From the Immediate Window:

400000 x 21
 8.68359375

About 9 seconds on my sad PC to split 400k rows into four sheets. I put 100k on each sheet, but I could have put more. Even if you work in 100k row increments, you can still put them on the same sheet. Instead of "A1" in my code, you need to write your chunk to the next cell and keep track of where the next cell is. Then when the next cell is > 10^6 rows, you create a new sheet and start over.

In summary, get the data into the biggest two dimensional array that you can and write it to the worksheet at one time. The fewer the writes, the faster the code.

Your description of progressively slower writes makes me suspiscious that your are running into an O(n^2) problem when using the Collection's index.

So try this: rather than indexing over the collection as you do now:

For lonSheetOneCounter = 2 to 999999
    outputWorkbook.Worksheets(1).Range(_
        outputWorkbook.Worksheets(1).Cells(lonSheetOneCounter, 1).Address & ":" & _
        outputWorkbook.Worksheets(1).Cells(lonSheetOneCounter, 21).Address).Value = _
        outputCollection.item(lonSheetOneCounter - 1)
Next lonSheetOneCounter

Try enumerating it instead:

lonSheetOneCounter = 2
For each item In outputCollection
    outputWorkbook.Worksheets(1).Range(_
        outputWorkbook.Worksheets(1).Cells(lonSheetOneCounter, 1).Address & ":" & _
        outputWorkbook.Worksheets(1).Cells(lonSheetOneCounter, 21).Address).Value = _
        item
    lonSheetOneCounter = lonSheetOneCounter + 1
Next

And you know, given that this is VBA and you're executing the loop body a million times, it wouldn't hurt to localize your references and use direct range specs instead of strings:

lonSheetOneCounter = 2
Dim ws As Worksheet
Set ws = outputWorkbook.Worksheets(1)
For each item In outputCollection
    ws.Range( _
        ws.Cells(lonSheetOneCounter, 1), ws.Cells(lonSheetOneCounter, 21)
            ).Value = item
    lonSheetOneCounter = lonSheetOneCounter + 1
Next

First of all, I think that you are using the wrong set of tools. VBA cannot handle very well large amounts of data, and write value by value is very slow.

A best suited method is to use Recordsets to get the data from the files and dump it on the template

I am assuming that:

  • Your files are in the same folder as the workbook that contains the code
  • There is a file called master.csv (pipe delimited) and files named ira.csv and non_ira.csv, both comma delimited

With that said, your routine can be divided in three steps:

  1. Create a schema.ini file
  2. Get the data from the files using the ActiveX Data Objects library
  3. Dump the data to the Workbook

Step 1: The schema.ini file

This step is required because your files don't have the same delimiter. this step just requires that you create a file named schema.ini in the same folder as your data and paste the code bellow:

[master.csv]
DecimalSymbol=.
Format=Delimited(|)
ColNameHeader=True

[ira.csv]
DecimalSymbol=. 
Format=Delimited(,)
ColNameHeader=True

[non_ira.csv]
DecimalSymbol=. 
Format=Delimited(,)
ColNameHeader=True

This file can be used to specify various atributes for your data. For more info see this link

Step 2: Get the data from the files using the ActiveX Data Objects library

First, you need to add a reference to the ActiveX Data Objects library. To do so, open the VBA Editor and then go to Tools > References and check Microsoft ActiveX Data Objects library . This step is required to use SQL queries on your data.

Next, you have to write code to configure a connection to your data, like this:

Private Function CreateConnection(folderPath As String) As ADODB.Connection

    Dim conStr As String

    conStr = "Provider=Microsoft.Jet.OLEDB.4.0;" & _
             "Data Source=" & Replace(folderPath, "\", "\\") & ";" & _
             "Extended Properties=""text;HDR=Yes;IMEX=1;FMT=Delimited"";"

    Set CreateConnection = New ADODB.Connection
    CreateConnection.Open conStr

End Function

Then you can write a function to create a recordset based on a custom SQL query, like this:

Private Function GetData(cnn As ADODB.Connection, file As String) As ADODB.Recordset

    Dim strSql As String

    Const adOpenStatic = 3
    Const adLockOptimistic = 3
    Const adCmdText = &H1
    'You'll need to change this variable to match your needs
    strSql = "SELECT master.[Account Number], " & _
                   " master.[ID Number], " & _
                   " file.[Int Rate], " & _
                   " file.[Cum Int] " & _
              "FROM [master.csv] master INNER JOIN [" & file & ".csv] file ON master.[ID Number] = file.[ID Number]"
   Set GetData = New Recordset
   GetData.Open strSql, cnn, adOpenStatic, adLockOptimistic, adCmdText

End Function

This function will return a recordset with the data that is common to master and file , using ID Number as key

Step 3: Dump the data to the Workbook

To do so, you can write something like this:

Public Sub LoadData()
    Dim cnn As ADODB.Connection
    Dim rsIRA As ADODB.Recordset, rsNonIRA As ADODB.Recordset
    Dim wbk As Workbook

    Application.Calculation = xlCalculationManual
    Application.ScreenUpdating = False

    'In this example the files and this workbook are in the same folder
    Set cnn = CreateConnection(ThisWorkbook.Path & "\")

    Set rsIRA = GetData(cnn, "ira")
    Set rsNonIRA = GetData(cnn, "non_ira")

    Set wbk = Workbooks.Open("S:\Some Directory\Template.xlsx")

    'Dumps the data from the recordset
    wbk.Worksheets(1).Range("A2").CopyFromRecordset rsIRA
    wbk.Worksheets(1).Range("A2").Offset(rsIRA.RecordCount, 0).CopyFromRecordset rsNonIRA

    Application.ScreenUpdating = True

    'Clean up
    rsIRA.Close
    rsNonIRA.Close
    cnn.Close
    Set rsIRA = Nothing
    Set rsNonIRA = Nothing
    Set cnn = Nothing

End Sub

I tested with the data sample that you provided, and it worked. You'll have to adapt the code for your needs I think that it will run faster, since the it deals only with the DB/Excel API, eliminating the VBA bottleneck

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM