
Using LINQ, how do you filter a list of strings for values that match the pattern "q1, q2" etc.?

This should be a simple one, but I'm looking for the best answer, with some regard for performance as well as elegance.

I have a list of strings, and some of the values are of the form q1, q2, q3, etc.

I'd like to select these.

What would be the best way of doing this?

The best answer is to use either a Regex or Jay Bazuzi's int.TryParse suggestion.

As a quick example, try this in LINQPad:

void Main()
{
    int n = 100;

    string[] a = {"q2", "q3", "b"};
    a = a.Concat(Enumerable.Repeat("qasd", n)).ToArray(); /* Filler data: n copies of "qasd" */

    /* Regex Method */
    System.Text.RegularExpressions.Regex r = new System.Text.RegularExpressions.Regex("^q[0-9]+$");
    List<string> regMethod = a.Where(c => r.IsMatch(c)).ToList().Dump("Regex result");

    /* IsInteger Method (exception-based) */
    List<string> intMethod = a.Where(c => c.StartsWith("q") && IsInteger(c.Substring(1))).ToList().Dump("IsInteger result");

    /* int.TryParse Method suggested by Jay Bazuzi (out _ discards the parsed value) */
    List<string> parseMethod = a.Where(c => c.StartsWith("q") && int.TryParse(c.Substring(1), out _)).ToList().Dump("TryParse result");
}

// Returns true if theValue converts to an Int32. It relies on catching an exception,
// which is expensive when many of the inputs are not integers.
public static bool IsInteger(string theValue)
{
    try
    {
        Convert.ToInt32(theValue);
        return true;
    }
    catch
    {
        return false;
    }
}

Comment out all but one of the methods at a time and compare the performance for different values of n.

My results (on my Core 2 Duo laptop) suggest:

n = 100. Regex takes about 0.003 seconds, IsInteger takes about 0.01 seconds
n = 1,000. Regex takes about 0.004 seconds, IsInteger takes about 0.07 seconds
n = 10,000. Regex takes about 0.007 seconds, IsInteger takes about 0.67 seconds
n = 100,000. Regex takes about 0.046 seconds, IsInteger takes about 6.9 seconds
n = 1,000,000. Regex takes about 0.42 seconds, IsInteger takes about 1 minute and 6 seconds

The TryParse method performs about the same as the Regex: slightly faster when done inline as in the code sample, and about the same when done in a separate method (i.e., replacing the body of IsInteger).

NB: The cost of creating the strings is not subtracted from these timings (insert a DateTime difference if you like), but that cost is the same for every method.
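Instead of a date diff, System.Diagnostics.Stopwatch can time each filter on its own after the data has been built. This is a rough sketch, assuming a .NET 6 or later console project with top-level statements rather than LINQPad; the TimeFilter helper and n = 100,000 are illustrative choices, and a single unwarmed run gives only a ballpark figure, not a proper benchmark.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

// Build the input first so that its cost is not part of any timing below.
int n = 100_000;
string[] data = new[] { "q2", "q3", "b" }
    .Concat(Enumerable.Repeat("qasd", n))
    .ToArray();

var regex = new Regex("^q[0-9]+$");

// Hypothetical helper: times one filter over the data, prints the result,
// and returns how many strings the filter kept.
int TimeFilter(string label, Func<string, bool> predicate)
{
    Stopwatch sw = Stopwatch.StartNew();
    List<string> result = data.Where(predicate).ToList();
    sw.Stop();
    Console.WriteLine($"{label}: {result.Count} matches in {sw.ElapsedMilliseconds} ms");
    return result.Count;
}

int regexCount = TimeFilter("Regex", c => regex.IsMatch(c));
int parseCount = TimeFilter("TryParse", c => c.StartsWith("q") && int.TryParse(c.Substring(1), out _));
```

Both calls keep the same two strings ("q2" and "q3"); only the elapsed times should differ.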

These numbers are much closer if most of the strings do not begin with 'q' (so IsInteger is never called), but the Regex is as good or better even then.

For example, with a filler string of "asdasd" rather than "qasd":

n = 100. Regex takes about 0.003 seconds, IsInteger takes about 0.003 seconds
n = 1,000. Regex takes about 0.004 seconds, IsInteger takes about 0.004 seconds
n = 10,000. Regex takes about 0.005 seconds, IsInteger takes about 0.005 seconds
n = 100,000. Regex takes about 0.023 seconds, IsInteger takes about 0.025 seconds
n = 1,000,000. Regex takes about 0.21 seconds, IsInteger takes about 0.22 seconds

Again, the TryParse method performs the same as the Regex.

Conclusion: use either the Regex or TryParse. Both are much faster than the exception-based IsInteger in the worst case, and just as fast otherwise.
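The Regex and TryParse filters are not exactly equivalent, which may matter more than the timings. A small sketch, assuming the same ^q[0-9]+$ pattern, the default int.TryParse overload (current culture, NumberStyles.Integer), and a .NET 6 or later console project; the sample strings are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

var regex = new Regex("^q[0-9]+$");

// Regex filter: 'q' followed by one or more ASCII digits, of any length.
bool ByRegex(string s) => regex.IsMatch(s);

// TryParse filter with the default overload: it accepts leading/trailing whitespace
// and a leading sign (NumberStyles.Integer), but rejects values outside the Int32 range.
bool ByTryParse(string s) => s.StartsWith("q") && int.TryParse(s.Substring(1), out _);

string[] samples = { "q1", "q+7", "q 5", "q99999999999" };
foreach (string s in samples)
    Console.WriteLine($"{s,-14} regex={ByRegex(s),-6} tryParse={ByTryParse(s)}");
```

With these samples, "q 5" passes TryParse but not the Regex, while "q99999999999" passes the Regex but overflows Int32, so TryParse rejects it.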

However, are there better or quicker ways of selecting these values out of a collection of strings? Perhaps a generic filter that is somehow compiled to run faster?
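On the "compiled" idea, the closest built-in option in .NET is RegexOptions.Compiled, combined with constructing the Regex once instead of on every call (on .NET 7 and later, the [GeneratedRegex] source generator is another option). A minimal sketch under those assumptions, for a .NET 6 or later console project; FilterQNumbers and the sample data are hypothetical, and whether Compiled actually beats the interpreted Regex or TryParse for strings this short is something to measure rather than assume.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Built once and reused. RegexOptions.Compiled generates IL for the pattern, which costs
// more up front but can match faster when the same Regex instance is used many times.
var qNumber = new Regex("^q[0-9]+$", RegexOptions.Compiled);

// Hypothetical helper: keeps the strings that look like q1, q2, q3, ...
// (A lambda is used because Where(qNumber.IsMatch) would be ambiguous between overloads.)
string[] FilterQNumbers(string[] items) =>
    items.Where(s => qNumber.IsMatch(s)).ToArray();

string[] data = { "q2", "q3", "b", "qasd", "q10" };
Console.WriteLine(string.Join(", ", FilterQNumbers(data)));
```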

It seems like you're trying to micro-optimize, which means you'll spend a lot of effort to make your program run at about the same speed. Focus on clarity of code first, and then optimize what is actually slow.

Assuming you have profiled and found this to be your application's bottleneck in real-world scenarios:

Using exceptions in non-exceptional cases is inelegant (and often a performance drag). See, for example, http://www.developerfusion.com/code/4650/validating-an-integer/

Depending on the constraints of your situation, you're probably better off changing your IsInteger() to do one of the following:

bool int.TryParse(string s, out int result);

(See http://msdn.microsoft.com/en-us/library/system.int32.tryparse.aspx)
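Rewriting IsInteger around int.TryParse removes the try/catch entirely, because failure is reported through the return value. A sketch, assuming only unsigned runs of ASCII digits should count and a .NET 6 or later console project: the overload taking NumberStyles.None and CultureInfo.InvariantCulture rejects the signs and surrounding whitespace that the default overload would accept. The sample array is illustrative.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Replacement for the exception-based IsInteger: no try/catch needed.
// NumberStyles.None allows digits only (no sign, no whitespace).
bool IsInteger(string value) =>
    int.TryParse(value, NumberStyles.None, CultureInfo.InvariantCulture, out _);

string[] a = { "q2", "q3", "b", "qasd", "q-1" };
string[] matches = a.Where(c => c.StartsWith("q") && IsInteger(c.Substring(1))).ToArray();
Console.WriteLine(string.Join(", ", matches));
```

Here "q-1" is rejected because NumberStyles.None does not allow a sign; drop that argument pair if signed values should also match.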

or:

Microsoft.VisualBasic.Information.IsNumeric(object expression)

(See http://www.hanselman.com/blog/ExploringIsNumericForC.aspx)

or a character check such as:

x >= '0' && x <= '9'

And then:

.Where(c => c.Length > 1 && c[0] == 'q' && c.Skip(1).All(IsInteger))
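A complete version of the character-based approach, as a sketch: it assumes "q followed by one or more ASCII digits" is the rule you want (so, unlike TryParse, values beyond the Int32 range still match) and a .NET 6 or later console project. IsAsciiDigit and IsQNumber are hypothetical names. It checks every character after the 'q', guards against strings shorter than two characters, and uses an inclusive upper bound so that '9' itself counts as a digit.

```csharp
using System;
using System.Linq;

// Character-level digit test; char.IsDigit would also accept non-ASCII Unicode digits.
bool IsAsciiDigit(char x) => x >= '0' && x <= '9';

// True for "q" followed by one or more ASCII digits; never allocates a substring or throws.
bool IsQNumber(string s)
{
    if (s.Length < 2 || s[0] != 'q') return false;
    for (int i = 1; i < s.Length; i++)
        if (!IsAsciiDigit(s[i])) return false;
    return true;
}

string[] a = { "q2", "q9", "q10", "b", "q", "", "qasd", "q1x" };
string[] matches = a.Where(IsQNumber).ToArray();
Console.WriteLine(string.Join(", ", matches));
```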

The bottleneck in IsInteger is probably the try-catch: throwing and catching an exception for every non-numeric string is expensive.

I replaced IsInteger with TryParse and got the following results (with n = 1,000,000):

Regex method: 540 ms

TryParse method: 537 ms

I used the following code for the second method:

Func<string, bool> lambda = c =>
{
    int temp;
    return c.StartsWith("q") && int.TryParse(c.Substring(1), out temp);
};
List<string> intMethod = a.Where(lambda).ToList();

Moral of the story is...

Although I usually prefer to use a Regex, in this case, where the string manipulation is simple, the TryParse solution is perfectly acceptable. Performance-wise, it doesn't really matter which of the two you use, but don't use exception handling to check whether a string value is an int!

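Notice: Technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.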
