Speed of VB.net Regex?
Is the regex code in VB.net known to be slow?
I took over some code that cleans large amounts of text data. The code ran fairly slowly, so I was looking for ways to speed it up. I found a couple of functions that get run a lot and that I thought might be part of the problem.
Here's the original code for cleaning a phone number:
Dim strArray() As Char = strPhoneNum.ToCharArray
Dim strNewPhone As String = ""
Dim i As Integer
For i = 0 To strArray.Length - 1
If strArray.Length = 11 And strArray(0) = "1" And i = 0 Then
Continue For
End If
If IsNumeric(strArray(i)) Then
strNewPhone = strNewPhone & strArray(i)
End If
Next
If Len(strNewPhone) = 7 Or Len(strNewPhone) = 10 Then
Return strNewPhone
End If
I rewrote the code using regex to eliminate the array and the loop.
Dim strNewPhone As String = ""
strNewPhone = Regex.Replace(strPhoneNum, "\D", "")
If strNewPhone = "" OrElse strNewPhone.Substring(0, 1) <> "1" Then
Return strNewPhone
Else
strNewPhone = Mid(strNewPhone, 2)
End If
If Len(strNewPhone) = 7 Or Len(strNewPhone) = 10 Then
Return strNewPhone
End If
After running a couple of tests, the new code is significantly slower than the old. Is regex in VB.net slow, did I add something else that is causing the issue, or was the original code just fine the way it was?
I conducted some tests with the Visual Studio Profiler and I did not get the same results you did. There was a logical error in your Regex function that caused the length check to be skipped if the number didn't begin with 1. I corrected that in my tests.
Results
In general my method was always slightly faster.
My Conclusion
In all tests the Original method was much slower. Had it come out better in even one test, I would be able to explain our discrepancy. If you test those methods in total isolation, I think you will come up with something similar.
My best guess is that something else was affecting your results and that your assessment that the Original method was better is false.
Your Revised Function
Function GetPhoneNumberRegex(strPhoneNum As String)
Dim strNewPhone As String = ""
strNewPhone = Regex.Replace(strPhoneNum, "\D", "")
If strNewPhone <> "" AndAlso strNewPhone.Substring(0, 1) = "1" Then ' AndAlso short-circuits, so Substring is never called on an empty string
strNewPhone = Mid(strNewPhone, 2)
End If
If Len(strNewPhone) = 7 Or Len(strNewPhone) = 10 Then
Return strNewPhone
End If
Return ""
End Function
My Function
Function GetPhoneNumberMine(strPhoneNum As String)
Dim strNewPhone As String = Regex.Replace(strPhoneNum, "\D", "")
If (strNewPhone.Length >= 7 AndAlso strNewPhone(0) = "1") Then ' AndAlso avoids indexing into an empty string
strNewPhone = strNewPhone.Remove(0, 1)
End If
Return If(strNewPhone.Length = 7 OrElse strNewPhone.Length = 10, strNewPhone, "")
End Function
Repeating work like this (two Len calls, and Or always evaluates both sides) will slow you down if the code hits this condition often.
If Len(strNewPhone) = 7 Or Len(strNewPhone) = 10 Then
Return strNewPhone
End If
Instead, do this:
Dim value = Len(strNewPhone)
If value = 7 OrElse value = 10 Then
Return strNewPhone
End If
You should still measure the individual pieces (conditions and statements) to determine which of them is slowing you down, but only if it really matters.
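One way to measure a single piece in isolation is a small Stopwatch harness. The sketch below is only an illustration: the module name PhoneTiming, the helper TimeIt, the candidate StripNonDigits, the sample input, and the iteration count are all assumptions and are not part of the code discussed above.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Text.RegularExpressions

Module PhoneTiming
    ' Hypothetical candidate under test: strips every non-digit character.
    Function StripNonDigits(input As String) As String
        Return Regex.Replace(input, "\D", "")
    End Function

    ' Runs the candidate `iterations` times and returns the elapsed milliseconds.
    Function TimeIt(candidate As Func(Of String, String),
                    input As String,
                    iterations As Integer) As Long
        ' Warm-up call so JIT compilation and regex setup are not part of the timing.
        candidate.Invoke(input)

        Dim sw As Stopwatch = Stopwatch.StartNew()
        For i As Integer = 1 To iterations
            candidate.Invoke(input)
        Next
        sw.Stop()
        Return sw.ElapsedMilliseconds
    End Function

    Sub Main()
        ' Sample input and iteration count are arbitrary choices for illustration.
        Dim ms As Long = TimeIt(AddressOf StripNonDigits, "1 (555) 123-4567", 100000)
        Console.WriteLine("StripNonDigits: {0} ms", ms)
    End Sub
End Module
```

Passing each version of the function to the same harness with the same input keeps the comparison fair. A profiler gives a finer breakdown, but a loop like this is often enough to tell whether a change matters at all.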
I don't know if you're seeing a real problem, but the code you show might very well be slower, because the regex may be freshly compiled each time you use it.
See if this is any better:
Dim rx As New Regex("\D") ' do this once, use it each time
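Put together, the "create it once" idea might look like the sketch below. The module name PhoneCleaner, the function CleanPhone, the use of RegexOptions.Compiled, and the rule of dropping a leading 1 only when 11 digits remain are assumptions made for illustration; they are not taken from the functions posted above.

```vbnet
Imports System.Text.RegularExpressions

Module PhoneCleaner
    ' Built once when the module is first used, then reused on every call.
    ' RegexOptions.Compiled is optional: it trades startup cost for faster matching.
    Private ReadOnly NonDigit As New Regex("\D", RegexOptions.Compiled)

    ' Returns the 7- or 10-digit phone number, or "" if the input does not reduce to one.
    Function CleanPhone(strPhoneNum As String) As String
        If strPhoneNum Is Nothing Then Return ""

        Dim digits As String = NonDigit.Replace(strPhoneNum, "")

        ' Assumption: a leading "1" is treated as a country code only when 11 digits remain.
        If digits.Length = 11 AndAlso digits(0) = "1"c Then
            digits = digits.Substring(1)
        End If

        Return If(digits.Length = 7 OrElse digits.Length = 10, digits, "")
    End Function
End Module
```

The static Regex methods keep a small cache of recently used patterns, so the gain from a shared instance may be modest. It is worth timing both versions before settling on one.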