Exclude results from Linq query excluding everything when exclude list is empty

I have the following code:

    public IList<Tweet> Match(IEnumerable<Tweet> tweetStream, IList<string> match, IList<string> exclude)
    {
        var tweets = from f in tweetStream
                     from m in match
                     where f.Text.ToLowerInvariant().Contains(m) 
                     select f;

        var final = from f in tweets
                    from e in exclude
                    where !f.Text.ToLowerInvariant().Contains(e.ToLowerInvariant())
                    select f;

        return final.Distinct().ToList<Tweet>();
    }

I've been building up the tests, which until now didn't cover the final result set, and the matching has been working happily. Now that I've added the exclude step, all items are removed if the IList<string> exclude is empty.

So this test passes as it should:

    [TestMethod]
    public void Should_exclude_items_from_exclude_list()
    {
        IEnumerable<Tweet> twitterStream = new List<Tweet>
                                               {
                                                   new Tweet("I have a Mazda car"),
                                                   new Tweet("I have a ford"),
                                                   new Tweet("Mazda Rules"),
                                                   new Tweet("My Ford car is great"),
                                                   new Tweet("My renault is brill"),
                                                   new Tweet("Mazda cars are great")
                                               };
        IList<string> matches = new List<string>{"mazda","car"};
        IList<string> exclude = new List<string>{"ford"};

        Matcher target = new Matcher();
        IList<Tweet> actual = target.Match(twitterStream, matches, exclude);

        Assert.AreEqual(3, actual.Count);            
    }

but this test now fails:

    [TestMethod]
    public void Should_match_items_either_mazda_or_car_but_no_duplicates()
    {
        IEnumerable<Tweet> twitterStream = new List<Tweet>
                                               {
                                                   new Tweet("I have a Mazda car"),
                                                   new Tweet("I have a ford"),
                                                   new Tweet("Mazda Rules"),
                                                   new Tweet("My Ford car is great"),
                                                   new Tweet("My renault is brill"),
                                                   new Tweet("Mazda cars are great")
                                               };
        IList<string> matches = new List<string>{"mazda","car"};
        IList<string> exclude = new List<string>();

        Matcher target = new Matcher();
        IList<Tweet> actual = target.Match(twitterStream, matches, exclude);

        Assert.AreEqual(4, actual.Count);
    }

I know I'm missing something really simple, but after staring at the code for an hour it's not coming to me.

Well, I know why it's failing: it's this clause:

from e in exclude

That's going to be an empty collection, so there are no entries to even hit the where clause.
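To make that concrete, here is a minimal sketch that reproduces the behaviour with plain strings instead of tweets. The class and method names (`EmptyFromDemo`, `CrossJoinFilter`) are made up for illustration and are not part of the original code.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, used only to illustrate the behaviour described above.
public static class EmptyFromDemo
{
    // Same shape as the original "final" query: every text is paired with
    // every exclude term, and the where clause runs once per (text, term) pair.
    public static List<string> CrossJoinFilter(IEnumerable<string> texts, IList<string> exclude)
    {
        var query = from t in texts
                    from e in exclude   // zero terms => zero pairs => zero results
                    where !t.ToLowerInvariant().Contains(e.ToLowerInvariant())
                    select t;

        return query.ToList();
    }
}
```

With an empty `exclude` list this returns an empty list no matter what `texts` contains. The pairing has a second consequence once there is more than one exclude term: with `exclude = { "ford", "mazda" }`, the text "I have a ford" is still returned, because the pair with "mazda" passes the where clause.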

Here's an alternative approach:

var final = from f in tweets
            let lower = f.Text.ToLowerInvariant()
            where !exclude.Any(e => lower.Contains(e.ToLowerInvariant()))
            select f;

Although I considered msarchet's approach as well, the nice thing about this one is that it only ends up evaluating tweetStream once - so even if that reads from the network or does something else painful, you don't need to worry. Where possible (and convenient) I try to avoid evaluating LINQ streams more than once.

Of course, you can make the whole thing one query very easily:

var tweets = from f in tweetStream
             let lower = f.Text.ToLowerInvariant()
             where match.Any(m => lower.Contains(m.ToLowerInvariant()))
             where !exclude.Any(e => lower.Contains(e.ToLowerInvariant()))
             select f;

I'd consider that even cleaner, to be honest :)
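For completeness, here is a sketch of how the whole method might look with the single query dropped in. The `Tweet` class below is a hypothetical stand-in, since the question doesn't show the real one; it assumes only a constructor taking the text and a `Text` property.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's Tweet type, which is not shown.
public class Tweet
{
    public Tweet(string text) { Text = text; }
    public string Text { get; }
}

public class Matcher
{
    public IList<Tweet> Match(IEnumerable<Tweet> tweetStream, IList<string> match, IList<string> exclude)
    {
        var tweets = from f in tweetStream
                     let lower = f.Text.ToLowerInvariant()
                     // Keep the tweet if it contains at least one match term...
                     where match.Any(m => lower.Contains(m.ToLowerInvariant()))
                     // ...and none of the exclude terms. Any() on an empty list
                     // is false, so an empty exclude list removes nothing.
                     where !exclude.Any(e => lower.Contains(e.ToLowerInvariant()))
                     select f;

        // Each input tweet is yielded at most once by the query above, so
        // Distinct() only matters if the input stream itself has duplicates.
        return tweets.Distinct().ToList();
    }
}
```

Run against the six tweets from the question, this should return 4 items with an empty exclude list and 3 items with "ford" excluded, which is what the two tests expect.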

So what is happening is this:

var final = from f in tweets
            from e in exclude
            where !f.Text.ToLowerInvariant().Contains(e.ToLowerInvariant())
            select f;

Since the second from is empty, if I am correct the rest of the statement is not evaluated, so your select never happens.

Try doing it like this instead:

var excludeTheseTweets = from f in tweets
                        from e in exclude
                        where f.Text.ToLowerInvariant().Contains(e.ToLowerInvariant())
                        select f;

return tweets.Except(excludeTheseTweets).Distinct().ToList<Tweet>();

So that will get a list of tweets to exclude (so if there is nothing to exclude it won't get anything), and then it will remove those items from the original list.
