VBA-Excel: large data sets cause the program to crash

First-time poster and new to programming in general. I have a project in which I have to build a financial model to mine data in Excel. I have succeeded in building the model in VBA. I ran tests on a 3,000-line dataset and they were successful. I will briefly explain what it does.

I track a given stock on a given day on multiple exchanges. I download the data (roughly 935,000 lines). The first step is to copy all the data for a given exchange (roughly 290,000 lines) onto a new sheet (this takes roughly 8 minutes). Then I create a new column to log the bid-ask spread (12 seconds). The next step is the one I'm having trouble with: I rank each line of data twice, with one column for bid size and one column for ask size. I created a function that uses Excel's Percentile function and ranks each line based on where the given bid and ask size lands. As of right now, the macro has been running for 35 minutes and has yet to finish. I can't attempt the other macros, since each macro depends on the previous one.

So my basic issue is that, since my data set is large, my model keeps crashing. The code seems to be fine when working with the test data, and it doesn't throw any errors when I run the program, but with the larger data set it just crashes. Does anyone have any suggestions? Is this normal with such large amounts of data?

Thanks in advance.
Sham

Here are the sub and function that are giving me trouble. The sub gathers the inputs required to run the function and then writes the result into the assigned cell. The code is supposed to repeat the process for three separate sheets. For now, I'd like it to work on one sheet, hence I commented out the loop.

Sub Bucketting()

Dim firstRow As Long
Dim lastRow As Long
Dim counter As Long
Dim bidRange As Range
Dim offerRange As Range
Dim bidScroll As Range
Dim offerScroll As Range
Dim Ex As String
Dim i As Integer

'For i = 1 To 1 Step 1 'Sheet Selection Process
 '   If i = 1 Then
  '      Ex = "Z"
   ' ElseIf i = 2 Then
    '    Ex = "P"
   ' Else
    '    Ex = "T"
   ' End If

Sheets("Z").Select 'Sheet selected

With ActiveSheet

    firstRow = .UsedRange.Cells(1).Row + 1
    lastRow = .UsedRange.Rows.Count

   Set bidRange = .Range("F2:F" & lastRow)
   Set offerRange = .Range("G2:G" & lastRow)

    For counter = lastRow To firstRow Step -1

        Set bidScroll = .Range("F" & counter)
        Set offerScroll = .Range("G" & counter)

        With .Cells(counter, "J")
        .Value = DECILE_RANK(bidRange, bidScroll)
        End With

        With .Cells(counter, "K")
        .Value = DECILE_RANK(offerRange, offerScroll)
        End With

    Next counter

End With

ActiveCell = "Bid Rank"

ActiveCell.Offset(0, 1) = "Offer Rank"

'Next i

End Sub

 Function DECILE_RANK(DataRange, RefCell)

    'Credit: BJRaid 
    'DECILE_RANK(The Range of data)
    'Declares the function that can be called in the spreadsheet cell  - enter '=DECILE_RANK(A5:A50,A5)

    'Using the percentile worksheet function calculate where the 10th, 20th etc percentile of the reference range are

    DEC1 = Application.WorksheetFunction.Percentile(DataRange, 0.1)
    DEC2 = Application.WorksheetFunction.Percentile(DataRange, 0.2)
    DEC3 = Application.WorksheetFunction.Percentile(DataRange, 0.3)
    DEC4 = Application.WorksheetFunction.Percentile(DataRange, 0.4)
    DEC5 = Application.WorksheetFunction.Percentile(DataRange, 0.5)
    DEC6 = Application.WorksheetFunction.Percentile(DataRange, 0.6)
    DEC7 = Application.WorksheetFunction.Percentile(DataRange, 0.7)
    DEC8 = Application.WorksheetFunction.Percentile(DataRange, 0.8)
    DEC9 = Application.WorksheetFunction.Percentile(DataRange, 0.9)

    ' Calculate the Decile rank that the reference cell value sits within

    If (RefCell <= DEC1) Then DECILE_RANK = 1
    If (RefCell > DEC1) And (RefCell <= DEC2) Then DECILE_RANK = 2
    If (RefCell > DEC2) And (RefCell <= DEC3) Then DECILE_RANK = 3
    If (RefCell > DEC3) And (RefCell <= DEC4) Then DECILE_RANK = 4
    If (RefCell > DEC4) And (RefCell <= DEC5) Then DECILE_RANK = 5
    If (RefCell > DEC5) And (RefCell <= DEC6) Then DECILE_RANK = 6
    If (RefCell > DEC6) And (RefCell <= DEC7) Then DECILE_RANK = 7
    If (RefCell > DEC7) And (RefCell <= DEC8) Then DECILE_RANK = 8
    If (RefCell > DEC8) And (RefCell <= DEC9) Then DECILE_RANK = 9
    If (RefCell > DEC9) Then DECILE_RANK = 10

End Function

935,000 lines is a lot for Excel. Like, really a lot. Short of saying "use a real database": if your application is literally putting a =Percentile(...) in each cell, I would recommend trying to use another tool for that, perhaps something within VBA itself. More generally, do the computation outside of a cell, then store the result value in the cell. There is a lot of overhead in maintaining formulas that are interdependent on 935k rows of data.

The problem is that you're looping through each row individually; the Excel way is to work with whole ranges at once whenever possible. As written, every call to DECILE_RANK recomputes all nine percentiles over the entire column, and it is called twice per row. I would load the ranges into arrays, then modify your DECILE_RANK code to work with the items in the array.

Note that variant arrays read in from ranges are 2-D.
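As a small illustration of the 2-D point, here is a minimal sketch. The sub name, the sheet name "Sheet1", and the range F2:F11 are placeholders chosen for the example, not part of the original code.

```vba
Sub ShowVariantArrayShape()
    ' Hypothetical demo: sheet name and range are placeholders.
    Dim data As Variant
    Dim i As Long

    ' Reading a multi-cell range into a Variant yields a 1-based 2-D array,
    ' even when the range is a single column.
    data = Worksheets("Sheet1").Range("F2:F11").Value

    Debug.Print LBound(data, 1), UBound(data, 1)   ' row bounds: 1 and 10
    Debug.Print LBound(data, 2), UBound(data, 2)   ' column bounds: 1 and 1

    ' Elements therefore always need two indexes: data(row, column).
    For i = LBound(data, 1) To UBound(data, 1)
        Debug.Print data(i, 1)
    Next i
End Sub
```

One caveat: a single-cell range returns a plain value rather than an array, so code like this should only be pointed at ranges of two or more cells.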

Here is the fully functioning code, including my custom VBA array slicer. Note that it was only tested on a small dataset:

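Sub Bucketting()

Dim lastRow As Long
Dim bidArray As Variant
Dim offerArray As Variant

With Worksheets("Sheet1") 'Sheet selected

  'Last populated row in column F (avoids reading a blank row past the data)
  lastRow = .Cells(.Rows.Count, "F").End(xlUp).Row

  'Read each column into a 2-D Variant array in a single operation
  bidArray = .Range("F2:F" & lastRow).Value
  offerArray = .Range("G2:G" & lastRow).Value

  'Rank in memory, then write each result column back in a single operation
  .Range("J2:J" & lastRow).Value = GetArraySlice2D(DECILE_RANK(bidArray), "column", 1, 1, 0)
  .Range("K2:K" & lastRow).Value = GetArraySlice2D(DECILE_RANK(offerArray), "column", 1, 1, 0)

  'Column headers (row 1 is assumed to be the header row)
  .Range("J1").Value = "Bid Rank"
  .Range("K1").Value = "Offer Rank"

End With

End Sub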

Function DECILE_RANK(DataRange As Variant) As Variant

' Credit:     BJRaid
' DECILE_RANK(2-D Variant array read from a single-column range)
' Returns the array with each value replaced by its decile rank (1 to 10).
' DataRange is passed ByRef (the VBA default), so the caller's array is overwritten.

Dim DEC(0 To 10) As Variant
Dim i As Long, j As Long 'Long, because Integer overflows beyond 32,767 rows

'Using the percentile worksheet function calculate where the 10th, 20th etc percentile of the reference range are
'(computed once here, instead of once per row)
DEC(0) = 0 'lower bound; assumes every size is greater than 0
For i = 1 To 9
  DEC(i) = Application.WorksheetFunction.Percentile(DataRange, 0.1 * i)
Next i
DEC(10) = Application.WorksheetFunction.Max(DataRange)

' Calculate the Decile rank that each value sits within
For i = 1 To UBound(DataRange, 1)
  For j = 1 To 10
    If ((DataRange(i, 1) > DEC(j - 1)) And (DataRange(i, 1) <= DEC(j))) Then
      DataRange(i, 1) = j
      Exit For
    End If
  Next j
Next i

DECILE_RANK = DataRange 'return the ranked array

End Function

Public Function GetArraySlice2D(Sarray As Variant, Stype As String, Sindex As Long, Sstart As Long, Sfinish As Long) As Variant

' this function returns a slice of an array, Stype is either row or column
' Sstart is beginning of slice, Sfinish is end of slice (Sfinish = 0 means entire
' row or column is taken), Sindex is the row or column to be sliced (NOTE:
' 1 is always the first row or first column)
' an Sindex value of 0 means that the array is one dimensional 3/20/09 Lance Roberts
' Indexes are Long so that slices longer than 32,767 items do not overflow

Dim vtemp() As Variant
Dim i As Long

On Error GoTo ErrHandler

Select Case Sindex
    Case 0
        If Sfinish - Sstart = UBound(Sarray) - LBound(Sarray) Then
            'whole array requested: copy it as-is
            vtemp = Sarray
        Else
            ReDim vtemp(1 To Sfinish - Sstart + 1)
            For i = 1 To Sfinish - Sstart + 1
                vtemp(i) = Sarray(i + Sstart - 1)
            Next i
        End If
    Case Else
        Select Case Stype
            Case "row"
                If Sfinish = 0 Or (Sstart = LBound(Sarray, 2) And Sfinish = UBound(Sarray, 2)) Then
                    'whole row requested
                    vtemp = Application.WorksheetFunction.Index(Sarray, Sindex, 0)
                Else
                    ReDim vtemp(1 To Sfinish - Sstart + 1)
                    For i = 1 To Sfinish - Sstart + 1
                        vtemp(i) = Sarray(Sindex, i + Sstart - 1)
                    Next i
                End If
            Case "column"
                If Sfinish = 0 Or (Sstart = LBound(Sarray, 1) And Sfinish = UBound(Sarray, 1)) Then
                    'whole column requested
                    vtemp = Application.WorksheetFunction.Index(Sarray, 0, Sindex)
                Else
                    ReDim vtemp(1 To Sfinish - Sstart + 1)
                    For i = 1 To Sfinish - Sstart + 1
                        vtemp(i) = Sarray(i + Sstart - 1, Sindex)
                    Next i
                End If
        End Select
End Select
GetArraySlice2D = vtemp
Exit Function

ErrHandler:
    Dim M As Integer
    M = MsgBox("Bad Array Input", vbOKOnly, "GetArraySlice2D")

End Function

I'm not sure if this will directly address your problem, but have you considered using Application.ScreenUpdating = False? Don't forget to set it back to True once your data has been processed.
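A minimal sketch of that suggestion follows. The wrapper name RunBucketingQuietly is made up for illustration, and it assumes the Bucketting macro from the question sits in the same project. The error handler is there so that screen updating is restored even if the macro fails partway through.

```vba
Sub RunBucketingQuietly()
    ' Hypothetical wrapper; "Bucketting" is the macro from the question.
    On Error GoTo CleanUp

    ' Stop Excel from repainting the window while cells are being written.
    Application.ScreenUpdating = False

    Bucketting

CleanUp:
    ' Reached on success and on error, so the screen is never left frozen.
    Application.ScreenUpdating = True
    If Err.Number <> 0 Then
        MsgBox "Bucketting failed: " & Err.Description
    End If
End Sub
```

This only removes the repainting cost. It will probably not rescue a macro that recomputes percentiles for every row, so it is best treated as a complement to the array-based approach rather than a replacement for it.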

