Just wondering if anyone can offer any suggestions that might improve the speed that my code writes arrays to a workbook.
I'm writing about 1.9 million rows of data to several sheets in a workbook, one sheet at a time. While the code completes, it takes about 18 hours to write to the excel workbook, which seems ridiculously excessive. Here is the setup. I open the workbook as such:
Dim ExcelAp As Excel.Application
Dim ouputWorkbook As Excel.Workbook
Set ExcelAp = New Excel.Application
Set outputWorkbook = ExcelAp.Workbooks.Open("S:\Some Directory\Template.xlsx")
Then I have the rows of the workbook in arrays loaded into a collection and I loop through the ranges in the workbook to copy the arrays:
For lonSheetOneCounter = 2 to 999999
outputWorkbook.Worksheets(1).Range(_
outputWorkbook.Worksheets(1).Cells(lonSheetOneCounter, 1).Address & ":" & _
outputWorkbook.Worksheets(1).Cells(lonSheetOneCounter, 21).Address).Value = _
outputCollection.item(lonSheetOneCounter - 1)
Next lonSheetOneCounter
The copying method is the same for the other sheets. I have made the workbook and instance of excel invisible, I have switched calculation to manual for that workbook, and I have also turned off screen updating, but it still takes about 18 hours give or take to complete copying over to the new workbook.
I've tried making a 2 dimensional array for the entier sheet, but no matter the method I use to do that I get an "out of memory error" the moment I attempt to copy that array to the workbook.
I'm not sure if there is anything else I can do to get by the error and reduce the time to copy, but if anyone has a suggestion, I'm all ears. For what it is worth, this macro is housed in another excel workbook running in a seperate instance of excel from the workbook I'm attempting to copy to.
Edit: Slight addition here. Something that I noticed that I wanted to draw attention to that also makes me think it may be possible to speed up the process. I've noticed that the macro slows down progressively. The first X number of rows write very fast, the following rows seem to slow down more and more as each row is written...
I'm going to try an experiment where I set up my template to automatically load a spreadsheet with 1 million used rows... sort of prompted by the suggestion at the bottom. I'm wondering if excel is having to alocate memory for all the extra rows. Perhaps if I start out with a workbook template that already has that number of rows set up, I might have an easier go of it.
Edit: It was pointed out to me that I wasn't clear about where the data I'm reading in is coming from. This data is read in using the VBA primitives from a number of text files. One is pipe delimited, the other two comma, not that the scheme of the files makes much difference.
As far as populating the array, here is a snippet of how that happens. It looks a mess but there simply isn't any other way to get the data to match up given the format of the three files I'm comparing. Anyway, now that I'm placing everything into large, large arrays this is how I'm populating those arrays. The references to arrViLine and arrNonIraLine and arrIraLine are simply the arrays that the lines of the file are parsed into from their original pipe and comma delimited formats:
If arrViLine(2) = arrIraLine(1) Or arrViLine(2) = arrNonIraLine(1) Then
If arrViLine(2) = arrIraLine(1) Then
boolVi = True
boolIra = True
boolNonIra = False
If lonMatchCounter <= 999999 Then
matchOneArray(lonMatchCounter, 1) = arrViLine(1)
matchOneArray(lonMatchCounter, 2) = arrViLine(2)
matchOneArray(lonMatchCounter, 3) = arrIraLine(2)
matchOneArray(lonMatchCounter, 4) = arrIraLine(3)
matchOneArray(lonMatchCounter, 5) = arrViLine(3)
matchOneArray(lonMatchCounter, 6) = arrViLine(4)
matchOneArray(lonMatchCounter, 7) = arrIraLine(4)
matchOneArray(lonMatchCounter, 8) = arrViLine(6)
matchOneArray(lonMatchCounter, 9) = arrViLine(5)
matchOneArray(lonMatchCounter, 10) = arrViLine(7)
matchOneArray(lonMatchCounter, 11) = arrViLine(8)
matchOneArray(lonMatchCounter, 12) = arrViLine(9)
matchOneArray(lonMatchCounter, 13) = arrViLine(10)
matchOneArray(lonMatchCounter, 14) = arrViLine(11)
matchOneArray(lonMatchCounter, 15) = arrViLine(12)
matchOneArray(lonMatchCounter, 16) = arrIraLine(5)
matchOneArray(lonMatchCounter, 17) = arrIraLine(6)
matchOneArray(lonMatchCounter, 18) = arrViLine(13)
matchOneArray(lonMatchCounter, 19) = arrViLine(14)
matchOneArray(lonMatchCounter, 20) = "IRA"
matchOneArray(lonMatchCounter, 21) = arrViLine(15)
lonMatchCounter = lonMatchCounter + 1
Else
lonMatchTwoCounter = lonMatchCounter - 999999
matchTwoArray(lonMatchTwoCounter, 1) = arrViLine(1)
matchTwoArray(lonMatchTwoCounter, 2) = arrViLine(2)
matchTwoArray(lonMatchTwoCounter, 3) = arrIraLine(2)
matchTwoArray(lonMatchTwoCounter, 4) = arrIraLine(3)
matchTwoArray(lonMatchTwoCounter, 5) = arrViLine(3)
matchTwoArray(lonMatchTwoCounter, 6) = arrViLine(4)
matchTwoArray(lonMatchTwoCounter, 7) = arrIraLine(4)
matchTwoArray(lonMatchTwoCounter, 8) = arrViLine(6)
matchTwoArray(lonMatchTwoCounter, 9) = arrViLine(5)
matchTwoArray(lonMatchTwoCounter, 10) = arrViLine(7)
matchTwoArray(lonMatchTwoCounter, 11) = arrViLine(8)
matchTwoArray(lonMatchTwoCounter, 12) = arrViLine(9)
matchTwoArray(lonMatchTwoCounter, 13) = arrViLine(10)
matchTwoArray(lonMatchTwoCounter, 14) = arrViLine(11)
matchTwoArray(lonMatchTwoCounter, 15) = arrViLine(12)
matchTwoArray(lonMatchTwoCounter, 16) = arrIraLine(5)
matchTwoArray(lonMatchTwoCounter, 17) = arrIraLine(6)
matchTwoArray(lonMatchTwoCounter, 18) = arrViLine(13)
matchTwoArray(lonMatchTwoCounter, 19) = arrViLine(14)
matchTwoArray(lonMatchTwoCounter, 20) = "IRA"
matchTwoArray(lonMatchTwoCounter, 21) = arrViLine(15)
lonMatchCounter = lonMatchCounter + 1
End If
Else 'arrViLine(2) must = arrNonIraLine(1)
boolVi = True
boolIra = False
boolNonIra = True
If lonMatchCounter <= 999999 Then
matchOneArray(lonMatchCounter, 1) = arrViLine(1)
matchOneArray(lonMatchCounter, 2) = arrViLine(2)
matchOneArray(lonMatchCounter, 3) = arrNonIraLine(2)
matchOneArray(lonMatchCounter, 4) = arrNonIraLine(3)
matchOneArray(lonMatchCounter, 5) = arrViLine(3)
matchOneArray(lonMatchCounter, 6) = arrViLine(4)
matchOneArray(lonMatchCounter, 7) = arrNonIraLine(5)
matchOneArray(lonMatchCounter, 8) = arrViLine(6)
matchOneArray(lonMatchCounter, 9) = arrViLine(5)
matchOneArray(lonMatchCounter, 10) = arrViLine(7)
matchOneArray(lonMatchCounter, 11) = arrViLine(8)
matchOneArray(lonMatchCounter, 12) = arrViLine(9)
matchOneArray(lonMatchCounter, 13) = arrViLine(10)
matchOneArray(lonMatchCounter, 14) = arrViLine(11)
matchOneArray(lonMatchCounter, 15) = arrViLine(12)
matchOneArray(lonMatchCounter, 16) = arrNonIraLine(4)
matchOneArray(lonMatchCounter, 17) = arrNonIraLine(6)
matchOneArray(lonMatchCounter, 18) = arrViLine(13)
matchOneArray(lonMatchCounter, 19) = arrViLine(14)
matchOneArray(lonMatchCounter, 20) = "IRA"
matchOneArray(lonMatchCounter, 21) = arrViLine(15)
lonMatchCounter = lonMatchCounter + 1
Else
lonMatchTwoCounter = lonMatchCounter - 999999
matchTwoArray(lonMatchTwoCounter, 1) = arrViLine(1)
matchTwoArray(lonMatchTwoCounter, 2) = arrViLine(2)
matchTwoArray(lonMatchTwoCounter, 3) = arrNonIraLine(2)
matchTwoArray(lonMatchTwoCounter, 4) = arrNonIraLine(3)
matchTwoArray(lonMatchTwoCounter, 5) = arrViLine(3)
matchTwoArray(lonMatchTwoCounter, 6) = arrViLine(4)
matchTwoArray(lonMatchTwoCounter, 7) = arrNonIraLine(5)
matchTwoArray(lonMatchTwoCounter, 8) = arrViLine(6)
matchTwoArray(lonMatchTwoCounter, 9) = arrViLine(5)
matchTwoArray(lonMatchTwoCounter, 10) = arrViLine(7)
matchTwoArray(lonMatchTwoCounter, 11) = arrViLine(8)
matchTwoArray(lonMatchTwoCounter, 12) = arrViLine(9)
matchTwoArray(lonMatchTwoCounter, 13) = arrViLine(10)
matchTwoArray(lonMatchTwoCounter, 14) = arrViLine(11)
matchTwoArray(lonMatchTwoCounter, 15) = arrViLine(12)
matchTwoArray(lonMatchTwoCounter, 16) = arrNonIraLine(4)
matchTwoArray(lonMatchTwoCounter, 17) = arrNonIraLine(6)
matchTwoArray(lonMatchTwoCounter, 18) = arrViLine(13)
matchTwoArray(lonMatchTwoCounter, 19) = arrViLine(14)
matchTwoArray(lonMatchTwoCounter, 20) = "Non-IRA"
matchTwoArray(lonMatchTwoCounter, 21) = arrViLine(15)
lonMatchCounter = lonMatchCounter + 1
End If
End If
You can also ignore the boolean variables, they are there to cue the macro as to whether or not the next line of a particular file should be read on the next loop.
EDIT: Not that it has much bearing on how fast I write the data to excel, consider the folowing lines to be an example of the format of the files I'm working with.
"Master" file:
Account Number|ID Number|Int Rate|Cum Int|Agreement|Type
12345|111111|.005|.01234|"C"|"IRA"
12346|111112|.005|.02345|"A"|"Non-IRA"
12347|111113|.004|.02345|"B"|"Non-IRA"
Match File One:
ID Number|Int Rate|Cum Int|Type
111111|.004|.01234|"IRA"
Match File Two:
ID Number|Int Rate|Cum Int|Type
111113|.004|.02345|"Non-IRA"
So that is just a little example of what I'm working with. Text files and CSV files that are listed in sequential order by the ID number. In the example above, the macro would match the first line of the master to match file one and record the data from all of the fields from both files to an array which will be output to an excel spreadsheet. The macro then reads in the next line of the master file and match file one, but carries over the line from file two to the next loop. The next line of the master would have no match and be recorded on a seperate sheet of the workbook. The last line of the master matches match file two and is recorded to the same array as the first match.
That's how the routine works, still, the real issue I'm having is the speed at which the data is written to the excel workbook. I'm currently working on carving the data into columns.
You don't need collections: just assign the data from a worksheet into a single variant and then assign the variant back to the new sheet.
To minimise memory etc try using the UsedRange on the worksheet. This example copies a column at a time: it takes 35 seconds to copy 1 million rows by 21 columns from 1 worksheet to another worksheet using Excel 2010 32-bit
Sub getting()
Dim var As Variant
Dim j As Long
Dim dTime As Double
dTime = Now
For j = 1 To 21
var = Worksheets("Sheet3").UsedRange.Resize(, 1).Offset(0, j - 1).Value2
Worksheets("Sheet1").Range("a1").Resize(UBound(var), UBound(var, 2)).Offset(0, j - 1) = var
Next j
MsgBox CStr(Now - dTime)
End Sub
I tried to test this will half a million rows going into an array, but got an out of memory error. You don't say how you're filling your collection/arrays, but I assume you're able to do it. I ended up with 400k x 21 array for demonstration purposes.
The part that's taking all the time is that you're writing to the sheet 21 cells at a time. Writing to the sheet is the most time intensive thing you can do in Excel VBA, so you need to minimize that operation as much as possible.
For this proof of concept, I read 400k x 21 pieces of data. The I write them out in 100k row increments to different sheets. For your purposes, you should make the biggest chunk array that your memory can handle.
Sub WriteDataToFiles()
Dim vaData As Variant
Dim vaChunk() As Variant
Dim lStep As Long
Dim i As Long, j As Long, k As Long
Dim wb As Workbook, sh As Worksheet
Dim lStart As Long
lStart = Timer
'Process in 100,000 row increments
lStep = 10 ^ 5
'Fill a big array with a bunch of data
FillDataArray vaData
'Show how big the array is
Debug.Print UBound(vaData, 1) & " x " & UBound(vaData, 2)
'Create a new workbook to write to
Set wb = Workbooks.Add
'loop through the big array in 100k increments
For i = LBound(vaData, 1) To UBound(vaData, 1) Step lStep
'dimension a smaller range to hold a subset of the big array
ReDim vaChunk(1 To lStep, 1 To 21) 'clean out array
'fill the smaller array with data from big array
For j = LBound(vaChunk) To UBound(vaChunk)
For k = 1 To 21
vaChunk(j, k) = vaData(i + j - 1, k)
Next k
Next j
'Add a new sheet
Set sh = wb.Worksheets.Add
'Write the small array to the sheet
sh.Range("A1").Resize(UBound(vaChunk, 1), UBound(vaChunk, 2)).Value = vaChunk
Next i
'See how long it takes
Debug.Print Timer - lStart
End Sub
From the Immediate Window:
400000 x 21
8.68359375
About 9 seconds on my sad PC to split 400k rows into four sheets. I put 100k on each sheet, but I could have put more. Even if you work in 100k row increments, you can still put them on the same sheet. Instead of "A1" in my code, you need to write your chunk to the next cell and keep track of where the next cell is. Then when the next cell is > 10^6 rows, you create a new sheet and start over.
In summary, get the data into the biggest two dimensional array that you can and write it to the worksheet at one time. The fewer the writes, the faster the code.
Your description of progressively slower writes makes me suspiscious that your are running into an O(n^2) problem when using the Collection's index.
So try this: rather than indexing over the collection as you do now:
For lonSheetOneCounter = 2 to 999999
outputWorkbook.Worksheets(1).Range(_
outputWorkbook.Worksheets(1).Cells(lonSheetOneCounter, 1).Address & ":" & _
outputWorkbook.Worksheets(1).Cells(lonSheetOneCounter, 21).Address).Value = _
outputCollection.item(lonSheetOneCounter - 1)
Next lonSheetOneCounter
Try enumerating it instead:
lonSheetOneCounter = 2
For each item In outputCollection
outputWorkbook.Worksheets(1).Range(_
outputWorkbook.Worksheets(1).Cells(lonSheetOneCounter, 1).Address & ":" & _
outputWorkbook.Worksheets(1).Cells(lonSheetOneCounter, 21).Address).Value = _
item
lonSheetOneCounter = lonSheetOneCounter + 1
Next
And you know, given that this is VBA and you're executing the loop body a million times, it wouldn't hurt to localize your references and use direct range specs instead of strings:
lonSheetOneCounter = 2
Dim ws As Worksheet
Set ws = outputWorkbook.Worksheets(1)
For each item In outputCollection
ws.Range( _
ws.Cells(lonSheetOneCounter, 1), ws.Cells(lonSheetOneCounter, 21)
).Value = item
lonSheetOneCounter = lonSheetOneCounter + 1
Next
First of all, I think that you are using the wrong set of tools. VBA cannot handle very well large amounts of data, and write value by value is very slow.
A best suited method is to use Recordsets to get the data from the files and dump it on the template
I am assuming that:
With that said, your routine can be divided in three steps:
Step 1: The schema.ini
file
This step is required because your files don't have the same delimiter. this step just requires that you create a file named schema.ini
in the same folder as your data and paste the code bellow:
[master.csv]
DecimalSymbol=.
Format=Delimited(|)
ColNameHeader=True
[ira.csv]
DecimalSymbol=.
Format=Delimited(,)
ColNameHeader=True
[non_ira.csv]
DecimalSymbol=.
Format=Delimited(,)
ColNameHeader=True
This file can be used to specify various atributes for your data. For more info see this link
Step 2: Get the data from the files using the ActiveX Data Objects library
First, you need to add a reference to the ActiveX Data Objects library. To do so, open the VBA Editor and then go to Tools
> References
and check Microsoft ActiveX Data Objects library
. This step is required to use SQL queries on your data.
Next, you have to write code to configure a connection to your data, like this:
Private Function CreateConnection(folderPath As String) As ADODB.Connection
Dim conStr As String
conStr = "Provider=Microsoft.Jet.OLEDB.4.0;" & _
"Data Source=" & Replace(folderPath, "\", "\\") & ";" & _
"Extended Properties=""text;HDR=Yes;IMEX=1;FMT=Delimited"";"
Set CreateConnection = New ADODB.Connection
CreateConnection.Open conStr
End Function
Then you can write a function to create a recordset based on a custom SQL query, like this:
Private Function GetData(cnn As ADODB.Connection, file As String) As ADODB.Recordset
Dim strSql As String
Const adOpenStatic = 3
Const adLockOptimistic = 3
Const adCmdText = &H1
'You'll need to change this variable to match your needs
strSql = "SELECT master.[Account Number], " & _
" master.[ID Number], " & _
" file.[Int Rate], " & _
" file.[Cum Int] " & _
"FROM [master.csv] master INNER JOIN [" & file & ".csv] file ON master.[ID Number] = file.[ID Number]"
Set GetData = New Recordset
GetData.Open strSql, cnn, adOpenStatic, adLockOptimistic, adCmdText
End Function
This function will return a recordset with the data that is common to master
and file
, using ID Number
as key
Step 3: Dump the data to the Workbook
To do so, you can write something like this:
Public Sub LoadData()
Dim cnn As ADODB.Connection
Dim rsIRA As ADODB.Recordset, rsNonIRA As ADODB.Recordset
Dim wbk As Workbook
Application.Calculation = xlCalculationManual
Application.ScreenUpdating = False
'In this example the files and this workbook are in the same folder
Set cnn = CreateConnection(ThisWorkbook.Path & "\")
Set rsIRA = GetData(cnn, "ira")
Set rsNonIRA = GetData(cnn, "non_ira")
Set wbk = Workbooks.Open("S:\Some Directory\Template.xlsx")
'Dumps the data from the recordset
wbk.Worksheets(1).Range("A2").CopyFromRecordset rsIRA
wbk.Worksheets(1).Range("A2").Offset(rsIRA.RecordCount, 0).CopyFromRecordset rsNonIRA
Application.ScreenUpdating = True
'Clean up
rsIRA.Close
rsNonIRA.Close
cnn.Close
Set rsIRA = Nothing
Set rsNonIRA = Nothing
Set cnn = Nothing
End Sub
I tested with the data sample that you provided, and it worked. You'll have to adapt the code for your needs I think that it will run faster, since the it deals only with the DB/Excel API, eliminating the VBA bottleneck
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.