Why does my Linq Where clause produce more results instead of fewer?

I just had the weirdest debugging experience in a very long time. It's a bit embarrassing to admit, but it led me to believe that my Linq query produces MORE results when I add an additional Where clause.

I know that shouldn't be possible, so I've reduced the offending function and its unit test to this:

[Test]
public void LoadUserBySearchString()
{
    //Setup
    var AllUsers = new List<User>
                       {
                           new User
                               {
                                   FirstName = "Luke",
                                   LastName = "Skywalker",
                                   Email = "luke@jedinet.org"
                               },
                           new User
                               {
                                   FirstName = "Leia",
                                   LastName = "Skywalker",
                                   Email = "faeryprincess@winxmail.com"
                               }
                       };


    //Execution
    List<User> SearchResults = LoadUserBySearchString("princess", AllUsers.AsQueryable());
    List<User> SearchResults2 = LoadUserBySearchString("princess Skywalker", AllUsers.AsQueryable());

    //Assertion
    Assert.AreEqual(1, SearchResults.Count); //test passed!
    Assert.AreEqual(1, SearchResults2.Count); //test failed! got 2 instead of 1 User???
}


//search CustID, fname, lname, email for substring(s)
public List<User> LoadUserBySearchString(string SearchString, IQueryable<User> AllUsers)
{
    IQueryable<User> Result = AllUsers;
    //split into substrings and apply each substring as an additional search criterion
    foreach (string SubString in Regex.Split(SearchString, " "))
    {            
        int SubStringAsInteger = -1;
        if (SubString.IsInteger())
        {
            SubStringAsInteger = Convert.ToInt32(SubString);
        }

        if (SubString != null && SubString.Length > 0)
        {
            Result = Result.Where(c => (c.FirstName.Contains(SubString)
                                        || c.LastName.Contains(SubString)
                                        || c.Email.Contains(SubString)
                                        || (c.ID == SubStringAsInteger)
                                       ));
        }
    }
    return Result.ToList();
}

I have debugged the LoadUserBySearchString function and verified that the second call actually produces a Linq query with two Where clauses instead of one. So it seems that the additional Where clause increases the number of results.

What's even more weird, the LoadUserBySearchString function works great when I test it by hand (with real users from the database). It only shows this weird behavior when running the unit test.

I guess I just need some sleep (or even an extended vacation). If anyone could help me shed some light on this, I could stop questioning my sanity and get back to work.

Thanks,

Adrian

Edit (to clarify several responses I got so far): I know it looks like it is the OR clause, but unfortunately it is not that simple. LoadUserBySearchString splits the search string into several strings and attaches a Where clause for each of them. "Skywalker" matches both Luke and Leia, but "princess" only matches Leia.
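To make the intended semantics concrete: when each search word is a fixed string rather than a captured variable, chaining one Where per word acts as a logical AND, so "princess Skywalker" should leave only Leia. This is a minimal sketch, not the question's code; `User` is a hypothetical record standing in for the model class (which isn't shown), `ChainedWhereDemo`, `SampleUsers`, `Matches` and `MatchBoth` are illustrative names, and the ID comparison is left out.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's User model (the real class is not shown).
public record User(string FirstName, string LastName, string Email);

public static class ChainedWhereDemo
{
    public static List<User> SampleUsers() => new List<User>
    {
        new User("Luke", "Skywalker", "luke@jedinet.org"),
        new User("Leia", "Skywalker", "faeryprincess@winxmail.com"),
    };

    // True if the word occurs in any of the searched fields (case-sensitive).
    public static bool Matches(User u, string word) =>
        u.FirstName.Contains(word) || u.LastName.Contains(word) || u.Email.Contains(word);

    // The second Where only sees the items that survived the first one,
    // so the two predicates are effectively combined with a logical AND.
    public static List<User> MatchBoth(IEnumerable<User> users) =>
        users.Where(u => Matches(u, "princess"))
             .Where(u => Matches(u, "Skywalker"))
             .ToList();
}
```

Calling `ChainedWhereDemo.MatchBoth(ChainedWhereDemo.SampleUsers())` yields a single user, Leia, which is the result the unit test expects.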

This is the Linq query for the search string "princess":

+       Result  {System.Collections.Generic.List`1[TestProject.Models.User].Where(c => (((c.FirstName.Contains(value(TestProject.Controllers.SearchController+<>c__DisplayClass1).SubString) || c.LastName.Contains(value(TestProject.Controllers.SearchController+<>c__DisplayClass1).SubString)) || c.Email.Contains(value(TestProject.Controllers.SearchController+<>c__DisplayClass1).SubString)) || (c.ID = value(TestProject.Controllers.SearchController+<>c__DisplayClass3).SubStringAsInteger)))}  System.Linq.IQueryable<TestProject.Models.User> {System.Linq.EnumerableQuery<TestProject.Models.User>}

And this is the Linq query for the search string "princess Skywalker":

+       Result  {System.Collections.Generic.List`1[TestProject.Models.User].Where(c => (((c.FirstName.Contains(value(TestProject.Controllers.SearchController+<>c__DisplayClass1).SubString) || c.LastName.Contains(value(TestProject.Controllers.SearchController+<>c__DisplayClass1).SubString)) || c.Email.Contains(value(TestProject.Controllers.SearchController+<>c__DisplayClass1).SubString)) || (c.ID = value(TestProject.Controllers.SearchController+<>c__DisplayClass3).SubStringAsInteger))).Where(c => (((c.FirstName.Contains(value(TestProject.Controllers.SearchController+<>c__DisplayClass1).SubString) || c.LastName.Contains(value(TestProject.Controllers.SearchController+<>c__DisplayClass1).SubString)) || c.Email.Contains(value(TestProject.Controllers.SearchController+<>c__DisplayClass1).SubString)) || (c.ID = value(TestProject.Controllers.SearchController+<>c__DisplayClass3).SubStringAsInteger)))}    System.Linq.IQueryable<TestProject.Models.User> {System.Linq.EnumerableQuery<TestProject.Models.User>}

Same as above, just with one additional where clause.

This is a nice little gotcha.

What is happening is that, because of closures and deferred execution, you're not actually filtering on "princess". Instead, you're building a filter that reads the SubString variable itself (the lambda captures the variable, not its current value), and that filter doesn't run until later.

You then change this variable and build another filter, which uses the same variable.
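A minimal sketch of those two ingredients, closure capture and deferred execution, on a plain list of strings. The names `DeferredCaptureDemo` and `Run` are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredCaptureDemo
{
    public static List<string> Run()
    {
        var words = new List<string> { "princess", "Skywalker", "jedinet" };

        string word = "princess";
        // Where only stores the lambda; nothing is filtered yet.
        // The lambda captures the variable 'word', not its current value.
        IEnumerable<string> query = words.Where(w => w == word);

        word = "Skywalker";

        // The filter actually runs here and reads the current value of 'word'.
        return query.ToList();
    }
}
```

`Run()` returns a list containing only "Skywalker": the lambda read `word` when `ToList()` enumerated the query, not when `Where` was called.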

Basically, this is what you will execute, in short form:

Where(...contains(SubString)).Where(...contains(SubString))

So you're actually filtering only on the last word, "Skywalker", which matches both users: by the time these filters are applied (when ToList() is called), the shared SubString variable holds only one value, the last one.
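The effect can be reproduced on any C# version by making the sharing explicit. This hedged sketch copies each word into one `subString` variable declared outside the loop, which is roughly how C# 4 and earlier treated the foreach iteration variable. `User`, `SharedVariableBug` and `SampleUsers` are hypothetical names, and the ID comparison (which relies on the question's `IsInteger()` extension) is omitted.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's User model (ID omitted).
public record User(string FirstName, string LastName, string Email);

public static class SharedVariableBug
{
    public static IQueryable<User> SampleUsers() => new List<User>
    {
        new User("Luke", "Skywalker", "luke@jedinet.org"),
        new User("Leia", "Skywalker", "faeryprincess@winxmail.com"),
    }.AsQueryable();

    public static List<User> Search(string searchString, IQueryable<User> allUsers)
    {
        IQueryable<User> result = allUsers;
        // One variable shared by all iterations (mimics the C# 4 foreach variable).
        string subString;
        foreach (string part in searchString.Split(' '))
        {
            subString = part;
            if (subString.Length > 0)
            {
                // Every filter captures the same 'subString' variable.
                result = result.Where(c => c.FirstName.Contains(subString)
                                        || c.LastName.Contains(subString)
                                        || c.Email.Contains(subString));
            }
        }
        // The filters run only now, so all of them read the last word.
        return result.ToList();
    }
}
```

`Search("princess", SampleUsers())` returns one user, while `Search("princess Skywalker", SampleUsers())` returns both, because both filters end up reading "Skywalker".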

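In C# 4 and earlier, the foreach iteration variable is a single variable shared by all iterations of the loop (C# 5 changed this and gives each iteration a fresh variable, so the original code works as intended there). If you copy SubString into a new local declared inside the loop body and capture that copy instead, each filter gets its own variable and it'll work: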

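if (SubString != null && SubString.Length > 0)
{
    // Copy the loop values into locals declared inside the loop body,
    // so each lambda captures its own variable instead of the shared one.
    string captured = SubString;
    int capturedId = SubStringAsInteger;
    Result = Result.Where(c => (c.FirstName.Contains(captured)
                                || c.LastName.Contains(captured)
                                || c.Email.Contains(captured)
                                || (c.ID == capturedId)
                               ));
}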
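Here is the fix in a complete, self-contained form. To show that the copy alone solves the problem, this sketch deliberately keeps a `subString` variable declared outside the loop (as a C# 4 foreach variable effectively was) and copies it into `captured`, declared inside the loop body, before building each filter. `User`, `FixedSearch` and `SampleUsers` are hypothetical names, and the ID comparison is omitted.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's User model (ID omitted).
public record User(string FirstName, string LastName, string Email);

public static class FixedSearch
{
    public static IQueryable<User> SampleUsers() => new List<User>
    {
        new User("Luke", "Skywalker", "luke@jedinet.org"),
        new User("Leia", "Skywalker", "faeryprincess@winxmail.com"),
    }.AsQueryable();

    public static List<User> Search(string searchString, IQueryable<User> allUsers)
    {
        IQueryable<User> result = allUsers;
        // Shared across iterations on purpose, to mimic the C# 4 foreach variable.
        string subString;
        foreach (string part in searchString.Split(' '))
        {
            subString = part;
            if (subString.Length > 0)
            {
                // A fresh variable per iteration: each lambda captures its own copy.
                string captured = subString;
                result = result.Where(c => c.FirstName.Contains(captured)
                                        || c.LastName.Contains(captured)
                                        || c.Email.Contains(captured));
            }
        }
        return result.ToList();
    }
}
```

With the copy in place, `Search("princess Skywalker", SampleUsers())` returns only Leia, and `Search("Skywalker", SampleUsers())` still returns both users.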

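Your chained Where calls are meant to select records that match every word in the search string, but you get more results because of deferred execution. The query is not actually performed until you call .ToList(), and by then every filter sees the last value of SubString. If you move the .ToList() inside the loop (for example, Result = Result.Where(...).ToList().AsQueryable();), each filter runs while SubString still holds the current word, and you'll get the behaviour you want.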
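A sketch of that alternative, under the assumption of a hypothetical `User` record and illustrative names (`MaterializeEachStep`, `SampleUsers`); the ID comparison is omitted. It materializes the results with `.ToList().AsQueryable()` on every iteration, so each filter is evaluated while `subString` still holds the current word, even though `subString` is declared outside the loop here. One trade-off: once the results are in a `List`, later `Where` calls run in memory through LINQ to Objects rather than being translated by the original query provider.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's User model (ID omitted).
public record User(string FirstName, string LastName, string Email);

public static class MaterializeEachStep
{
    public static IQueryable<User> SampleUsers() => new List<User>
    {
        new User("Luke", "Skywalker", "luke@jedinet.org"),
        new User("Leia", "Skywalker", "faeryprincess@winxmail.com"),
    }.AsQueryable();

    public static List<User> Search(string searchString, IQueryable<User> allUsers)
    {
        IQueryable<User> result = allUsers;
        // Shared across iterations, yet harmless here because each filter runs immediately.
        string subString;
        foreach (string part in searchString.Split(' '))
        {
            subString = part;
            if (subString.Length > 0)
            {
                // ToList() evaluates this filter now, while subString holds this word;
                // AsQueryable() just wraps the list so 'result' stays an IQueryable<User>.
                result = result.Where(c => c.FirstName.Contains(subString)
                                        || c.LastName.Contains(subString)
                                        || c.Email.Contains(subString))
                               .ToList()
                               .AsQueryable();
            }
        }
        return result.ToList();
    }
}
```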
