
Using LINQ how do you filter a list of strings for values that match the pattern “q1, q2” etc.?

This should be a simple one but I'm looking for the best answer with at least some regard for performance as well as elegance.

I have a list of strings and some of the values are of the form q1, q2, q3 etc.

I'd like to select these.

What would be the best method of doing this?

The best answer is to use either Regex or Jay Bazuzi's int.TryParse suggestion.

As a quick example, try this in LINQPad:

void Main()
{
    int n = 100;

    string[] a = {"q2", "q3", "b"};
    a = a.Concat(Enumerable.Repeat(0,n).Select(i => "qasd")).ToArray(); /* Filler data: n copies of a non-matching string that starts with 'q' */

    /* Regex Method */
    System.Text.RegularExpressions.Regex r = new System.Text.RegularExpressions.Regex("^q[0-9]+$");
    List<string> regMethod = a.Where(c => r.IsMatch(c)).ToList().Dump("Result");

    /* IsInteger Method */
    List<string> intMethod = a.Where(c => c.StartsWith("q") && IsInteger(c.Substring(1))).ToList().Dump("Result");

    /* int.TryParse Method suggested by Jay Bazuzi */
    int e = 0;
    List<string> parseMethod = a.Where(c => c.StartsWith("q") && int.TryParse(c.Substring(1), out e)).ToList().Dump("Result");
}

public static bool IsInteger(string theValue)
{
   try
   {
       Convert.ToInt32(theValue);
       return true;
   } 
   catch 
   {
       return false;
   }
}

Try commenting out all but one of the methods at a time and compare the performance for different values of n.

My experience (on my Core 2 Duo Laptop) seems to suggest:

n = 100. Regex takes about 0.003 seconds, IsInteger takes about 0.01 seconds
n = 1,000. Regex takes about 0.004 seconds, IsInteger takes about 0.07 seconds
n = 10,000. Regex takes about 0.007 seconds, IsInteger takes about 0.67 seconds
n = 100,000. Regex takes about 0.046 seconds, IsInteger takes about 6.9 seconds
n = 1,000,000. Regex takes about 0.42 seconds, IsInteger takes about 1 minute and 6 seconds

ParseMethod has the same performance as Regex (slightly faster if done inline as in the code sample, about the same if done in a separate method, i.e. replacing the IsInteger method body).

NB: The cost of creating the strings is not accounted for (insert a date diff if you like), but that cost is the same for every method.

These numbers are much closer if the majority of the keys do not begin with 'q' (IsInteger is then never called), but the Regex is as good or better even when this is the case.

I.e. (for a filler string of "asdasd" rather than "qasd"):

n = 100. Regex takes about 0.003 seconds, IsInteger takes about 0.003 seconds
n = 1,000. Regex takes about 0.004 seconds, IsInteger takes about 0.004 seconds
n = 10,000. Regex takes about 0.005 seconds, IsInteger takes about 0.005 seconds
n = 100,000. Regex takes about 0.023 seconds, IsInteger takes about 0.025 seconds
n = 1,000,000. Regex takes about 0.21 seconds, IsInteger takes about 0.22 seconds

Again ParseMethod has the same performance as Regex.

Conclusion: Use either Regex or TryParse. Both are much faster than the exception-based IsInteger in the worst case and just as fast otherwise.
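
If you want to reproduce timings like these yourself (the "date diff" mentioned in the NB), a Stopwatch is probably the simplest way. What follows is only a sketch: FilterTiming, BuildData and Time are names I made up for illustration, not part of the code above, and the numbers you get will depend on your machine.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

public static class FilterTiming
{
    // Hypothetical helper: the three sample values plus n copies of a filler string.
    public static string[] BuildData(int n, string filler)
    {
        return new[] { "q2", "q3", "b" }.Concat(Enumerable.Repeat(filler, n)).ToArray();
    }

    // Hypothetical helper: runs one filter over the data, returns elapsed time and match count.
    public static KeyValuePair<TimeSpan, int> Time(IEnumerable<string> source, Func<string, bool> predicate)
    {
        Stopwatch sw = Stopwatch.StartNew();
        int count = source.Where(predicate).Count(); // Count() forces the lazy query to run
        sw.Stop();
        return new KeyValuePair<TimeSpan, int>(sw.Elapsed, count);
    }

    // Call this from Main() in LINQPad or a console app.
    public static void Run(int n)
    {
        string[] data = BuildData(n, "qasd");
        Regex r = new Regex("^q[0-9]+$");

        var regexResult = Time(data, c => r.IsMatch(c));
        Console.WriteLine("Regex:    {0} ms, {1} matches", regexResult.Key.TotalMilliseconds, regexResult.Value);

        int e;
        var parseResult = Time(data, c => c.StartsWith("q") && int.TryParse(c.Substring(1), out e));
        Console.WriteLine("TryParse: {0} ms, {1} matches", parseResult.Key.TotalMilliseconds, parseResult.Value);
    }
}
```

The data is built before the Stopwatch starts, so the cost of creating the strings stays out of the measurement. The first run of each filter also includes JIT and Regex construction overhead, so it is worth running each one a couple of times before trusting the figures.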

However, are there better/quicker ways of selecting the int values out of a collection of strings? Perhaps a generic filter that is somehow compiled faster?

Seems like you're trying to microoptimize, which means you'll spend a lot of effort to make your program run the same speed. Focus on clarity of code first, and then optimize what is actually slow.

Assuming you have profiled and found this to be your application's bottleneck in real-world scenarios:

It is inelegant (and often a performance drag) to use exceptions in non-exceptional cases. See http://www.developerfusion.com/code/4650/validating-an-integer/ , for example.

Depending on the constraints of your situation, you're probably better off changing your IsInteger() to do one of:

bool int.TryParse(string s, out int result);

(See http://msdn.microsoft.com/en-us/library/system.int32.tryparse.aspx )
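
As a sketch of that first option, here is the same IsInteger helper with the try/catch swapped for int.TryParse. The class name IntegerCheck is just something I picked for the example:

```csharp
using System;

public static class IntegerCheck
{
    // Same idea as the question's IsInteger, but no exception is thrown for bad input.
    public static bool IsInteger(string theValue)
    {
        int ignored; // the parsed value is not needed, only the success flag
        return int.TryParse(theValue, out ignored);
    }
}
```

One thing to keep in mind: this overload of int.TryParse accepts a leading sign and surrounding whitespace, and it rejects values that do not fit in an Int32. So "q-1" passes the StartsWith/TryParse filter but not the regex ^q[0-9]+$, and "q99999999999" passes the regex but not TryParse. Whether that matters depends on your data.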

or:

Microsoft.VisualBasic.Information.IsNumeric(object expression)

(See http://www.hanselman.com/blog/ExploringIsNumericForC.aspx )

or

x >= '0' && x <= '9'

And then:

.Where(c => c[0] == 'q' && IsInteger(c[1]))
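
Note that the one-liner above only looks at the second character and assumes the string has at least two characters. If you need the whole "q followed by digits" pattern, a rough sketch of the character-by-character version could look like this (QPattern and IsQNumber are made-up names for the example):

```csharp
using System;

public static class QPattern
{
    // True only for "q" followed by one or more ASCII digits, e.g. "q1" or "q42".
    public static bool IsQNumber(string s)
    {
        if (s == null || s.Length < 2 || s[0] != 'q')
        {
            return false;
        }

        for (int i = 1; i < s.Length; i++)
        {
            char x = s[i];
            if (x < '0' || x > '9')
            {
                return false; // first non-digit ends the check
            }
        }

        return true;
    }
}
```

Usage is the same shape as before: .Where(c => QPattern.IsQNumber(c)). Unlike TryParse, this does not reject very long digit runs, since it never converts to an int.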

The bottleneck you have with IsInteger is probably because of the try-catch.

I tried to replace IsInteger with TryParse, and I get the following results (with n=1,000,000):

Regex method: 540 ms

TryParse method: 537 ms

I used the following code for the second method:

Func<string, bool> lambda = (string c) =>
{
    int temp;
    return c.StartsWith("q") && int.TryParse(c.Substring(1), out temp);
};
List<string> intMethod = a.Where(lambda).ToList();

Moral of the story is...

Although I usually prefer to use Regex, in a case like this where the string manipulation is simple, the TryParse solution is perfectly acceptable. Performance-wise, it doesn't really matter which of the two you use, but don't use exception handling to check whether a string value is an int!
