Speed of VB.net Regex?

Is the regex code in VB.net known to be slow?

I took over some code that cleans large amounts of text data. The code ran fairly slowly, so I was looking for ways to speed it up. I found a couple of functions that get run a lot, which I thought might be part of the problem.

Here's the original code for cleaning a phone number:

        Dim strArray() As Char = strPhoneNum.ToCharArray
        Dim strNewPhone As String = ""
        Dim i As Integer

        For i = 0 To strArray.Length - 1
            If strArray.Length = 11 And strArray(0) = "1" And i = 0 Then
                Continue For
            End If

            If IsNumeric(strArray(i)) Then
                strNewPhone = strNewPhone & strArray(i)
            End If
        Next

        If Len(strNewPhone) = 7 Or Len(strNewPhone) = 10 Then
            Return strNewPhone
        End If

I rewrote the code using regex to eliminate the array and the loop.

        Dim strNewPhone As String = ""
        strNewPhone = Regex.Replace(strPhoneNum, "\D", "")
        If strNewPhone = "" OrElse strNewPhone.Substring(0, 1) <> "1" Then
            Return strNewPhone
        Else
            strNewPhone = Mid(strNewPhone, 2)
        End If

        If Len(strNewPhone) = 7 Or Len(strNewPhone) = 10 Then
            Return strNewPhone
        End If

After running a couple of tests, the new code is significantly slower than the old. Is regex in VB.net slow, did I add something else that is causing the problem, or is the original code just fine the way it was?

I conducted some tests with the Visual Studio Profiler and did not get the same results you did. There was a logical error in your Regex function that caused the length check to be skipped if the number didn't begin with 1. I corrected that in my tests.
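To illustrate the logical error, here is a minimal sketch. `CleanPhoneAsPosted` is a hypothetical wrapper name around the regex code from the question, with a final `Return ""` added so it compiles as a function; the sample input is made up.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module LogicErrorDemo
    ' Hypothetical wrapper around the regex version posted in the question.
    Function CleanPhoneAsPosted(strPhoneNum As String) As String
        Dim strNewPhone As String = ""
        strNewPhone = Regex.Replace(strPhoneNum, "\D", "")
        If strNewPhone = "" OrElse strNewPhone.Substring(0, 1) <> "1" Then
            Return strNewPhone ' returns here, before the length check below
        Else
            strNewPhone = Mid(strNewPhone, 2)
        End If

        If Len(strNewPhone) = 7 Or Len(strNewPhone) = 10 Then
            Return strNewPhone
        End If

        Return "" ' added for the sketch; the posted fragment had no final Return
    End Function

    Sub Main()
        ' Five digits, not starting with 1: returned even though the
        ' result is neither 7 nor 10 digits long.
        Console.WriteLine(CleanPhoneAsPosted("555-12")) ' prints 55512
    End Sub
End Module
```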

  1. I realized in my tests that whichever function ran first or last would suffer a penalty. So I executed each function independently and had a priming function run beforehand.
  2. Depending on the test, I executed the function either 10000 or 100000 times with phone-number-like patterns of varying length. Each method got the same numbers.
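The priming-then-timing approach described above could be sketched roughly as follows. This is not the harness used for the results below (those came from the Visual Studio Profiler); `TimeIt` and `DigitsOnly` are hypothetical names, and the sample input and iteration count are assumptions.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Text.RegularExpressions

Module TimingSketch
    ' Hypothetical function under test: strips every non-digit character.
    Function DigitsOnly(input As String) As String
        Return Regex.Replace(input, "\D", "")
    End Function

    ' Calls the candidate once as an unmeasured warm-up (JIT, regex cache),
    ' then times the requested number of calls.
    Function TimeIt(candidate As Func(Of String, String),
                    input As String,
                    iterations As Integer) As TimeSpan
        candidate.Invoke(input) ' priming call, not measured
        Dim sw As Stopwatch = Stopwatch.StartNew()
        For i As Integer = 1 To iterations
            candidate.Invoke(input)
        Next
        sw.Stop()
        Return sw.Elapsed
    End Function

    Sub Main()
        Dim elapsed As TimeSpan = TimeIt(AddressOf DigitsOnly, "1 (555) 123-4567", 100000)
        Console.WriteLine("DigitsOnly: {0} ms", elapsed.TotalMilliseconds)
    End Sub
End Module
```

Timing each candidate in its own run, as described in item 1, keeps one function's start-up cost from being charged to another.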

Results

In general, my method was always slightly faster.

  1. I did a cheap timer test; the Original function took about twice as long.
  2. Profiler showed the Original Method used about 60% more memory than our methods.
  3. Profiler showed the Original Method took eight times as long to run.
  4. Profiler showed the Original Method took about 40% more processor cycles.

My Conclusion

In all tests the Original method was much slower. Had it come out better in even one test, I might be able to explain our discrepancy. If you test those methods in total isolation, I think you will see similar results.

My best guess is that something else was affecting your results, and that your assessment that the Original method was faster is incorrect.

Your Revised Function

Function GetPhoneNumberRegex(strPhoneNum As String) As String
    Dim strNewPhone As String = ""
    strNewPhone = Regex.Replace(strPhoneNum, "\D", "")
    ' AndAlso short-circuits, so Substring is never called on an empty string
    If strNewPhone <> "" AndAlso strNewPhone.Substring(0, 1) = "1" Then
        strNewPhone = Mid(strNewPhone, 2)
    End If

    If Len(strNewPhone) = 7 Or Len(strNewPhone) = 10 Then
        Return strNewPhone
    End If

    Return ""
End Function

My Function

Function GetPhoneNumberMine(strPhoneNum As String) As String
    Dim strNewPhone As String = Regex.Replace(strPhoneNum, "\D", "")
    ' AndAlso short-circuits, so strNewPhone(0) is never read on an empty string
    If strNewPhone.Length >= 7 AndAlso strNewPhone(0) = "1"c Then
        strNewPhone = strNewPhone.Remove(0, 1)
    End If

    Return If(strNewPhone.Length = 7 OrElse strNewPhone.Length = 10, strNewPhone, "")
End Function

Repeating work like this will slow you down if the code hits this condition often.

 If Len(strNewPhone) = 7 Or Len(strNewPhone) = 10 Then
     Return strNewPhone
 End If

Instead, do this...

     Dim value = Len(strNewPhone)
     If value = 7 OrElse value = 10 Then
         Return strNewPhone
     End If

You should still measure the individual pieces (conditions/statements) to determine which of them is slowing you down, but only if it really matters.

I don't know if you're seeing a real problem, but the code you show might very well be slower, because the regex is being freshly compiled each time you use it.

See if this is any better:

Dim rx As New Regex("\D") ' do this once, use it each time

Reference at MSDN
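As a rough sketch of the "create it once" idea applied to the phone-number cleaner: the module, field, and function names here are hypothetical, and `RegexOptions.Compiled` is an optional assumption rather than something the answer requires. The static `Regex` methods also keep a cache of recently used patterns, so the gain may be small; it is worth measuring before adopting this.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module SharedRegexSketch
    ' Built once and reused by every call; RegexOptions.Compiled is optional.
    Private ReadOnly NonDigits As New Regex("\D", RegexOptions.Compiled)

    ' Hypothetical variant of the cleaner that uses the shared instance.
    Function GetPhoneNumberShared(strPhoneNum As String) As String
        Dim digits As String = NonDigits.Replace(strPhoneNum, "")
        If digits.Length >= 7 AndAlso digits(0) = "1"c Then
            digits = digits.Remove(0, 1) ' drop the leading 1
        End If
        Return If(digits.Length = 7 OrElse digits.Length = 10, digits, "")
    End Function

    Sub Main()
        Console.WriteLine(GetPhoneNumberShared("1 (555) 123-4567")) ' prints 5551234567
    End Sub
End Module
```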
