
Matching values in string array

Problem: Looking for a more efficient way of finding whether there is an exact matching value in a 1-D array -- essentially a Boolean true/false.

Am I overlooking something obvious? Or am I simply using the wrong data structure, by using an array when I probably should be using a collection object or a dictionary? In the latter I could check the .Contains or .Exists method, respectively.

In Excel I can check for a value in a vector array like:

If Not IsError(Application.Match(strSearch, varToSearch, False)) Then
' Do stuff
End If

This returns the index of an exact match, subject to the limitation that the Match function only finds the first matching value in this context. This is a commonly used method, and one that I have been using for a long time, too.

This is satisfactory enough for Excel -- but what about other applications?

In other applications, I can do basically the same thing, but it requires enabling a reference to the Excel object library, and then:

   If Not IsError(Excel.Application.match(...))

That seems silly, though, and is difficult to manage on distributed files because of permissions, Trust Center settings, etc.

I have tried to use the Filter() function:

 If Not Ubound(Filter(varToSearch, strSearch)) = -1 Then
    'do stuff
 End If

But the problem with this approach is that Filter returns an array of partial matches, rather than an array of exact matches. (I have no idea why it would be useful to return substring/partial matches.)
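One way to get an exact-match test without Excel is to join the array with a delimiter and search for the delimited term. This is only a sketch: ContainsExact is a hypothetical name, and it assumes a non-empty 1-D string array whose elements never contain the delimiter (vbNullChar here).

```vba
' Hypothetical helper: True only when strSearch equals a whole element.
' Assumes arr is a non-empty 1-D array of strings and that no element
' contains the delimiter character.
Function ContainsExact(arr As Variant, strSearch As String) As Boolean
    Dim delim As String, joined As String
    delim = vbNullChar
    ' Wrap every element in delimiters so substrings cannot match.
    joined = delim & Join(arr, delim) & delim
    ' vbBinaryCompare makes the comparison case-sensitive.
    ContainsExact = InStr(1, joined, delim & strSearch & delim, vbBinaryCompare) > 0
End Function
```

Join and InStr are built into VBA, so this needs no reference to the Excel object library. It still scans the whole joined string, so it is not asymptotically faster than a loop.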

The other alternative is to literally iterate over each value in the array (this is also very commonly used, I think) -- which seems even more needlessly cumbersome than calling Excel's Match function.

For each v in vArray
   If v = strSearch Then
    ' do stuff
   End If
Next

If we're going to talk about performance then there's no substitute for running some tests. In my experience Application.Match() is up to ten times slower than calling a function which uses a loop.

Sub Tester()

    Dim i As Long, b, t
    Dim arr(1 To 100) As String

    For i = 1 To 100
        arr(i) = "Value_" & i
    Next i

    t = Timer
    For i = 1 To 100000
        b = Contains(arr, "Value_50")
    Next i
    Debug.Print "Contains", Timer - t

    t = Timer
    For i = 1 To 100000
        ' Match expects (lookup_value, lookup_array, match_type)
        b = Application.Match("Value_50", arr, 0)
    Next i
    Debug.Print "Match", Timer - t

End Sub


Function Contains(arr, v) As Boolean
Dim rv As Boolean, lb As Long, ub As Long, i As Long
    lb = LBound(arr)
    ub = UBound(arr)
    For i = lb To ub
        If arr(i) = v Then
            rv = True
            Exit For
        End If
    Next i
    Contains = rv
End Function

Output:

Contains       0.8710938 
Match          4.210938 

I was once looking for the best way to replace values. The same approach should work for simple finding as well.

To find the first instance of a string you can try using this code:

Sub find_strings_1()

Dim ArrayCh() As Variant
Dim rng As Range
Dim i As Integer

 ArrayCh = Array("a", "b", "c")

With ActiveSheet.Cells
    For i = LBound(ArrayCh) To UBound(ArrayCh)
        Set rng = .Find(What:=ArrayCh(i), _
        LookAt:=xlPart, _
        SearchOrder:=xlByColumns, _
        MatchCase:=False)

        'Find returns Nothing when there is no match, so guard before using rng
        If Not rng Is Nothing Then Debug.Print rng.Address

    Next i
End With

End Sub

If you want to find all instances, try the code below.

Sub find_strings_2()

Dim ArrayCh() As Variant
Dim c As Range
Dim firstAddress As String
Dim i As Integer

 ArrayCh = Array("a", "b", "c") 'strings to lookup

With ActiveSheet.Cells
    For i = LBound(ArrayCh) To UBound(ArrayCh)
        Set c = .Find(What:=ArrayCh(i), LookAt:=xlPart, LookIn:=xlValues)

        If Not c Is Nothing Then
            firstAddress = c.Address 'used later to verify if looping over the same address
            Do
                '_____
                'your code, where you do something with "c"
                'which is a range variable,
                'so you can for example get its address:
                Debug.Print ArrayCh(i) & " " & c.Address 'example
                '_____
                Set c = .FindNext(c)
                'And does not short-circuit in VBA, so test for Nothing separately
                If c Is Nothing Then Exit Do
            Loop While c.Address <> firstAddress
        End If
    Next i
End With

End Sub

Keep in mind that if there are several instances of the searched string within one cell, only one result is returned for that cell, because FindNext moves from cell to cell rather than from occurrence to occurrence.
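If the number of occurrences inside a single cell matters, it can be counted separately once the cell has been found. The helper below is a sketch and not part of the code above; CountOccurrences is a hypothetical name, and it uses vbTextCompare on the assumption that the search should ignore case.

```vba
' Hypothetical helper: counts non-overlapping occurrences of sought in s.
Function CountOccurrences(s As String, sought As String) As Long
    If Len(sought) = 0 Then Exit Function   ' avoid division by zero
    ' Removing every occurrence shortens s by Len(sought) per occurrence.
    CountOccurrences = (Len(s) - Len(Replace(s, sought, "", , , vbTextCompare))) \ Len(sought)
End Function
```

Inside the Do loop it could be called as CountOccurrences(CStr(c.Value), CStr(ArrayCh(i))).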

Still, if you need code for replacing the found values with others, I'd use the first solution, but you'd have to change it a bit.

"A more efficient way (compared to Application.Match) of finding whether a string value exists in an array":

I believe there is no more efficient way than the one you are using, i.e., Application.Match.

Arrays allow efficient access to any element if we know the index of that element. If we want to do anything by element value (even checking whether an element exists), we have to scan all the elements of the array in the worst case. Therefore, the worst case needs n element comparisons, where n is the size of the array. So the maximum time we need to find whether an element exists is linear in the size of the input, i.e., O(n). This applies to any language that uses conventional arrays.

The only case where we can be more efficient is when the array has special structure. For your example, if the elements of the array are sorted (e.g. alphabetically), then we do not need to scan the whole array: we compare with the middle element, and then continue with the left or right part of the array (binary search). But without assuming any special structure, there is no hope.
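A minimal sketch of that binary search in VBA follows. SortedContains is a hypothetical name, and the code assumes a 1-D array already sorted in ascending order under the same binary comparison that StrComp uses here.

```vba
' Hypothetical helper: binary search over a sorted 1-D array of strings.
' Assumes arr is sorted ascending under vbBinaryCompare.
Function SortedContains(arr As Variant, v As String) As Boolean
    Dim lo As Long, hi As Long, m As Long, cmp As Long
    lo = LBound(arr)
    hi = UBound(arr)
    Do While lo <= hi
        m = lo + (hi - lo) \ 2              ' middle of the remaining range
        cmp = StrComp(arr(m), v, vbBinaryCompare)
        If cmp = 0 Then
            SortedContains = True           ' exact match found
            Exit Function
        ElseIf cmp < 0 Then
            lo = m + 1                      ' v can only be in the right part
        Else
            hi = m - 1                      ' v can only be in the left part
        End If
    Loop
End Function
```

Each pass halves the remaining range, so the number of comparisons grows with log n rather than n. Sorting an unsorted array first costs more than a single linear scan, so this only pays off when the same array is searched many times.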

The Dictionary/Collection, as you point out, offers constant-time key access to its elements (O(1)). What is perhaps not very well documented is that one can also have index access to the dictionary elements (Keys and Items): the order in which elements are entered into the Dictionary is preserved. Their main disadvantage is that they use more memory, as two objects are stored for each element.
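As a rough illustration of the dictionary approach, the sketch below loads a few illustrative values and tests membership with .Exists. It assumes the Scripting.Dictionary class from the Microsoft Scripting Runtime, which is available on Windows; late binding through CreateObject is used so that no reference has to be set.

```vba
' Sketch: build the lookup once, then test membership with .Exists.
' Assumes Windows, where Scripting.Dictionary is available.
Sub DictionaryLookupDemo()
    Dim dict As Object, v As Variant, k As Variant
    Set dict = CreateObject("Scripting.Dictionary")

    For Each v In Array("alpha", "beta", "gamma")   ' illustrative values
        dict(v) = True      ' assigning through a key adds it if missing
    Next v

    Debug.Print dict.Exists("beta")     ' True: exact key match
    Debug.Print dict.Exists("bet")      ' False: partial strings do not match

    k = dict.Keys                       ' zero-based array, in insertion order
    Debug.Print k(1)
End Sub
```

Building the dictionary is itself a pass over all the values, so the benefit only appears when many lookups are made against the same set.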

To wrap up, although If Not IsError(Excel.Application.match(...)) looks silly, it is still the more efficient way (at least in theory). On permission issues my knowledge is very limited. Depending on the host application, there are always some Find-type functions (C++ has find and find_if, for example).

I hope that helps!

Edit

I would like to add a couple of thoughts, after reading the amended version of the post and Tim's answer. The above text focuses on the theoretical time complexity of the various data structures and ignores implementation issues. I think the spirit of the question was rather: given a certain data structure (an array), what is the most efficient way in practice of checking existence?

To this end, Tim's answer is an eye-opener.

The conventional rule "if VBA can do it for you then don't write it again yourself" is not always true. Simple operations like looping and comparisons can be faster than "aggregate" VBA functions. Two interesting links are here and here.
