Matching values in string array

Problem: I am looking for a more efficient way of finding whether there is an exact matching value in a 1-D array -- essentially a Boolean true/false.

Am I overlooking something obvious? Or am I simply using the wrong data structure, by using an array when I should probably be using a Collection object or a Dictionary? In the latter I could check the .Contains or .Exists method, respectively.

In Excel I can check for a value in a vector array like:

If Not IsError(Application.Match(strSearch, varToSearch, False)) Then
' Do stuff
End If

This returns the index of an exact match, subject to the limitations of the Match function, which finds only the first matching value in this context. It is a commonly used method, and one that I have been using for a long time, too.

This is satisfactory enough for Excel -- but what about other applications?

In other applications, I can do basically the same thing, but it requires enabling a reference to the Excel object library, and then:

   If Not IsError(Excel.Application.match(...))

That seems silly, though, and is difficult to manage on distributed files because of permissions/trust center/etc.

I have tried to use the Filter() function:

 If Not Ubound(Filter(varToSearch, strSearch)) = -1 Then
    'do stuff
 End If

But the problem with this approach is that Filter returns an array of partial matches, rather than an array of exact matches. (I have no idea why it would be useful to return substring/partial matches.)
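To make the partial-match behaviour concrete, here is a minimal sketch. The helper name ExistsExact is hypothetical and is not from the original post. It uses a different technique from Filter: it joins the array with a delimiter and searches for the delimited value with InStr. It assumes a non-empty 1-D array of strings in which no element contains Chr(1).

```vba
' Hypothetical helper (not from the original post).
' Assumes arr is a non-empty 1-D array of strings and that no element contains Chr(1).
Function ExistsExact(arr As Variant, ByVal s As String) As Boolean
    Dim d As String
    d = Chr$(1)   ' delimiter assumed to be absent from the data
    ' Wrapping both the joined text and the search value in delimiters
    ' means only whole elements can match.
    ExistsExact = InStr(1, d & Join(arr, d) & d, d & s & d, vbBinaryCompare) > 0
End Function

Sub DemoFilterVsExact()
    Dim arr As Variant
    arr = Array("apple", "pineapple", "grape")

    ' Filter matches substrings: "apple" is found in "apple" and in "pineapple"
    Debug.Print UBound(Filter(arr, "apple")) + 1   ' 2 partial matches
    Debug.Print UBound(Filter(arr, "app")) >= 0    ' True, although "app" is not an element

    Debug.Print ExistsExact(arr, "apple")          ' True
    Debug.Print ExistsExact(arr, "app")            ' False
End Sub
```

This needs no reference to Excel, but it builds a new string on every call, so it is not necessarily faster than a plain loop.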

The other alternative is to literally iterate over each value in the array (this also is very commonly used I think) -- which seems even more needlessly cumbersome than calling on Excel's Match function.

For each v in vArray
   If v = strSearch Then
    ' do stuff
   End If
Next

If we're going to talk about performance then there's no substitute for running some tests. In my experience Application.Match() is up to ten times slower than calling a function which uses a loop.

Sub Tester()

    Dim i As Long, b, t
    Dim arr(1 To 100) As String

    For i = 1 To 100
        arr(i) = "Value_" & i
    Next i

    t = Timer
    For i = 1 To 100000
        b = Contains(arr, "Value_50")
    Next i
    Debug.Print "Contains", Timer - t

    t = Timer
    For i = 1 To 100000
        b = Application.Match("Value_50", arr, False) 'lookup value first, then the array to search
    Next i
    Debug.Print "Match", Timer - t

End Sub


Function Contains(arr, v) As Boolean
Dim rv As Boolean, lb As Long, ub As Long, i As Long
    lb = LBound(arr)
    ub = UBound(arr)
    For i = lb To ub
        If arr(i) = v Then
            rv = True
            Exit For
        End If
    Next i
    Contains = rv
End Function

Output (elapsed seconds, from Timer):

Contains       0.8710938 
Match          4.210938 

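I wrote this while looking for a good find-and-replace solution, but it should work for simple finding as well. Note that it searches the cells of the active worksheet rather than an array in memory.

To find the first instance of each string, you can try this code: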

Sub find_strings_1()

Dim ArrayCh() As Variant
Dim rng As Range
Dim i As Integer

 ArrayCh = Array("a", "b", "c")

With ActiveSheet.Cells
    For i = LBound(ArrayCh) To UBound(ArrayCh)
        Set rng = .Find(What:=ArrayCh(i), _
        LookAt:=xlPart, _
        SearchOrder:=xlByColumns, _
        MatchCase:=False)

        'Find returns Nothing when there is no match, so guard before using rng
        If Not rng Is Nothing Then Debug.Print rng.Address

    Next i
End With

End Sub

If you want to find all instances try the below.

Sub find_strings_2()

Dim ArrayCh() As Variant
Dim c As Range
Dim firstAddress As String
Dim i As Integer

 ArrayCh = Array("a", "b", "c") 'strings to lookup

With ActiveSheet.Cells
    For i = LBound(ArrayCh) To UBound(ArrayCh)
        Set c = .Find(What:=ArrayCh(i), LookAt:=xlPart, LookIn:=xlValues)

        If Not c Is Nothing Then
            firstAddress = c.Address 'used later to verify if looping over the same address
            Do
                '_____
                'your code, where you do something with "c"
                'which is a range variable,
                'so you can, for example, get its address:
                Debug.Print ArrayCh(i) & " " & c.Address 'example
                '_____
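                Set c = .FindNext(c)
                'FindNext can return Nothing (e.g. after the found values were changed),
                'and VBA's And does not short-circuit, so test for Nothing separately
                If c Is Nothing Then Exit Do
            Loop While c.Address <> firstAddress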
        End If
    Next i
End With

End Sub

Keep in mind that Find and FindNext return matching cells, not individual occurrences: if the search string appears several times within one cell, that cell is reported only once.
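If the number of occurrences inside a single cell matters, it has to be counted separately. A minimal sketch, assuming non-overlapping, case-insensitive matches are what you want (the helper name CountOccurrences is hypothetical):

```vba
' Hypothetical helper: counts non-overlapping occurrences of subStr in s,
' ignoring case (vbTextCompare).
Function CountOccurrences(ByVal s As String, ByVal subStr As String) As Long
    If Len(subStr) = 0 Then Exit Function   ' avoid division by zero; returns 0
    ' Removing every occurrence shortens the string by Len(subStr) per match
    CountOccurrences = (Len(s) - Len(Replace(s, subStr, "", Compare:=vbTextCompare))) \ Len(subStr)
End Function

Sub DemoCountOccurrences()
    Debug.Print CountOccurrences("banana", "an")   ' 2
    Debug.Print CountOccurrences("banana", "x")    ' 0
End Sub
```

Inside the Do loop of the second routine this could be called as CountOccurrences(CStr(c.Value), ArrayCh(i)).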

Still, if you need code that replaces the found values with something else, I'd start from the first solution, though you would have to change it a bit.

"A more efficient way (compared to Application.Match) of finding whether a string value exists in an array":

I believe there is no more efficient way than the one you are using, i.e., Application.Match.

Arrays allow efficient access to any element if we know the index of that element. If we want to do anything by element value (even checking whether an element exists), we have to scan all the elements of the array in the worst case. The worst case therefore needs n element comparisons, where n is the size of the array. So the maximum time needed to find out whether an element exists is linear in the size of the input, i.e., O(n). This applies to any language that uses conventional arrays.

The only case where we can be more efficient is when the array has special structure. For your example, if the elements of the array are sorted (e.g. alphabetically), then we do not need to scan the whole array: we compare with the middle element, and then continue with the left or right half of the array (binary search), which takes O(log n) comparisons. But without assuming any special structure, there is no hope.
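As a rough illustration of the sorted case, a binary search could look like the sketch below. The function name SortedContains is hypothetical, and the code assumes the array is already sorted in ascending order under the same binary (case-sensitive) comparison that StrComp uses here.

```vba
' Hypothetical helper: binary search over a 1-D String array.
' Assumes arr is sorted ascending under vbBinaryCompare.
Function SortedContains(arr() As String, ByVal v As String) As Boolean
    Dim lo As Long, hi As Long, m As Long, c As Long
    lo = LBound(arr)
    hi = UBound(arr)
    Do While lo <= hi
        m = lo + (hi - lo) \ 2                     ' middle index of the remaining range
        c = StrComp(arr(m), v, vbBinaryCompare)
        If c = 0 Then
            SortedContains = True
            Exit Function
        ElseIf c < 0 Then
            lo = m + 1                             ' middle element sorts before v: search right half
        Else
            hi = m - 1                             ' middle element sorts after v: search left half
        End If
    Loop
End Function

Sub DemoSortedContains()
    Dim arr(0 To 4) As String
    arr(0) = "alpha": arr(1) = "bravo": arr(2) = "charlie"
    arr(3) = "delta": arr(4) = "echo"
    Debug.Print SortedContains(arr, "delta")   ' True
    Debug.Print SortedContains(arr, "dog")     ' False
End Sub
```

Sorting the array first costs more than a single linear scan, so this only pays off when the array is already sorted or is searched many times.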

The Dictionary and the Collection, as you point out, offer constant-time key access to their elements (O(1)). What is perhaps not very well documented is that one can also access the Dictionary elements (Keys and Items) by index: the order in which elements are entered into the Dictionary is preserved. Their main disadvantage is that they use more memory, as two objects are stored for each element.
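A minimal sketch of the Dictionary approach follows. It assumes Windows, where Scripting.Dictionary is available through the Microsoft Scripting Runtime, and uses late binding so that no project reference is needed. The sample values are made up for illustration.

```vba
Sub DemoDictionaryLookup()
    Dim arr As Variant, dict As Object, keys As Variant, i As Long
    arr = Array("Value_1", "Value_2", "Value_3")

    ' Late binding; assumes the Scripting Runtime is installed (Windows)
    Set dict = CreateObject("Scripting.Dictionary")
    For i = LBound(arr) To UBound(arr)
        dict(arr(i)) = True      ' array value becomes the key; duplicates just overwrite
    Next i

    Debug.Print dict.Exists("Value_2")   ' True
    Debug.Print dict.Exists("Value_9")   ' False

    keys = dict.Keys                     ' keys come back in insertion order
    Debug.Print keys(0)                  ' Value_1
End Sub
```

Building the Dictionary is itself a pass over every element, so this is only worthwhile when the same set of values is searched repeatedly.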

To wrap up, although If Not IsError(Excel.Application.match(...)) looks silly, it is still the more efficient way (at least in theory). My knowledge of the permission issues is very limited. Depending on the host application, there are always some Find-type functions (C++ has find and find_if, for example).

I hope that helps!

Edit

I would like to add a couple of thoughts, after reading the amended version of the post and Tim's answer. The text above focuses on the theoretical time complexity of the various data structures and ignores implementation issues. I think the spirit of the question was rather: given a certain data structure (an array), what is the most efficient way, in practice, of checking existence?

To this end, Tim's answer is an eye-opener.

The conventional rule "if VBA can do it for you then don't write it again yourself" is not always true. Simple operations like looping and comparisons can be faster than "aggregate" VBA functions. Two interesting links are here and here.

