What is the preferred (performant and readable) way of chaining IEnumerable<T> extension methods?

Question

If I'm trying to filter results at multiple levels of an IEnumerable<T> object graph, is there a preferred way of chaining extension methods to do this?

I'm open to any extension method and lambda usage, but I'd prefer not to use LINQ query syntax, to remain consistent with the rest of the codebase.

Is it better to push the filtering to the selector of the SelectMany() method or just to chain another Where() method? Or is there a better solution?

How would I go about identifying the best option? In this test case, everything is directly available in memory. Obviously both samples below are currently producing the same correct results; I'm just looking for a reason one or the other (or another option) would be preferred.

using System.Collections.Generic;
using System.Linq;

public class Test
{
    // I want the first chapter of a book that's exactly 42 pages, written by
    // an author whose name is Adams, from a library in London.
    public Chapter TestingIEnumerableTExtensionMethods()
    {
        List<Library> libraries = GetLibraries();

        // Option 1: flatten each level, then filter it with a chained Where().
        Chapter chapter = libraries
            .Where(lib => lib.City == "London")
            .SelectMany(lib => lib.Books)
            .Where(b => b.Author == "Adams")
            .SelectMany(b => b.Chapters)
            .First(c => c.NumberOfPages == 42);

        // Option 2: push each filter into the SelectMany() selector.
        Chapter chapter2 = libraries
            .Where(lib => lib.City == "London")
            .SelectMany(lib => lib.Books.Where(b => b.Author == "Adams"))
            .SelectMany(b => b.Chapters.Where(c => c.NumberOfPages == 42))
            .First();

        // Both queries find the same chapter, so either can be returned.
        return chapter;
    }
}

And here's the sample object graph:

public class Library
{
    public string Name { get; set; }
    public string City { get; set; }
    public List<Book> Books { get; set; }
}

public class Book
{
    public string Name { get; set; }
    public string Author { get; set; }
    public List<Chapter> Chapters { get; set; }
}

public class Chapter
{
    public string Name { get; set; }
    public int NumberOfPages { get; set; }
}

Answer 1

Which is best likely depends on the LINQ provider you're using. LINQ to SQL will behave differently from in-memory filtering. The order of the clauses can affect performance, depending on the data: in a naive implementation, filtering out more records earlier in the sequence means less work for the later methods.
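To make the point about clause order concrete, here is a minimal LINQ to Objects sketch that counts how often a predicate runs when the outer filter comes first versus last. The class name, method names, and data are made up for illustration; they are not from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FilterOrderDemo
{
    // Hypothetical data: three named groups of four numbers each.
    private static readonly KeyValuePair<string, int[]>[] Groups =
    {
        new KeyValuePair<string, int[]>("a", new[] { 1, 2, 3, 4 }),
        new KeyValuePair<string, int[]>("b", new[] { 5, 6, 7, 8 }),
        new KeyValuePair<string, int[]>("c", new[] { 9, 10, 11, 12 }),
    };

    // Filters the outer level first, so only group "a" is flattened
    // and only its four numbers reach the counted predicate.
    public static int CountWhenFilteringEarly()
    {
        int calls = 0;
        Groups.Where(g => g.Key == "a")
              .SelectMany(g => g.Value)
              .Where(n => { calls++; return n % 2 == 0; })
              .ToList();
        return calls;
    }

    // Flattens every group first and applies the group filter last,
    // so all twelve numbers reach the counted predicate.
    public static int CountWhenFilteringLate()
    {
        int calls = 0;
        Groups.SelectMany(g => g.Value.Select(n => new { Group = g.Key, Number = n }))
              .Where(x => { calls++; return x.Number % 2 == 0; })
              .Where(x => x.Group == "a")
              .ToList();
        return calls;
    }
}
```

With this data the early filter runs the counted predicate 4 times and the late filter runs it 12 times, although both produce the same numbers. Other providers, such as LINQ to SQL, translate the whole expression and may not behave this way at all.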

For your two examples, I would guess that the performance difference is negligible, and I would favor the first since it makes each clause easier to modify independently of the others.

As for determining the best option, it's the same as anything else: measure.

Answer 2

I'm guessing the first expression will be slightly, but insignificantly, faster. To really determine whether one is faster, you will need to time them with a profiler or a Stopwatch.
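A rough sketch of the Stopwatch approach might look like the following. The data generator, the class name ChainTiming, and the iteration count are assumptions for illustration, and the simplified Library/Book/Chapter types are restated only so the sketch compiles on its own. Treat it as a starting point, not a rigorous benchmark.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public class Chapter { public string Name { get; set; } public int NumberOfPages { get; set; } }
public class Book { public string Author { get; set; } public List<Chapter> Chapters { get; set; } }
public class Library { public string City { get; set; } public List<Book> Books { get; set; } }

public static class ChainTiming
{
    // Hypothetical test data: every library is in London, every book is by
    // Adams, and only the very last chapter has 42 pages, so each query has
    // to walk the whole graph. All three counts must be at least 1.
    public static List<Library> BuildLibraries(int libraryCount, int booksPerLibrary, int chaptersPerBook)
    {
        var libraries = Enumerable.Range(0, libraryCount)
            .Select(l => new Library
            {
                City = "London",
                Books = Enumerable.Range(0, booksPerLibrary)
                    .Select(b => new Book
                    {
                        Author = "Adams",
                        Chapters = Enumerable.Range(0, chaptersPerBook)
                            .Select(c => new Chapter { Name = "Chapter " + c, NumberOfPages = 10 })
                            .ToList()
                    })
                    .ToList()
            })
            .ToList();
        libraries.Last().Books.Last().Chapters.Last().NumberOfPages = 42;
        return libraries;
    }

    // First sample from the question: chained Where() calls.
    public static Chapter ChainedWhere(List<Library> libraries)
    {
        return libraries
            .Where(lib => lib.City == "London")
            .SelectMany(lib => lib.Books)
            .Where(b => b.Author == "Adams")
            .SelectMany(b => b.Chapters)
            .First(c => c.NumberOfPages == 42);
    }

    // Second sample from the question: filters inside the SelectMany() selectors.
    public static Chapter NestedWhere(List<Library> libraries)
    {
        return libraries
            .Where(lib => lib.City == "London")
            .SelectMany(lib => lib.Books.Where(b => b.Author == "Adams"))
            .SelectMany(b => b.Chapters.Where(c => c.NumberOfPages == 42))
            .First();
    }

    // Runs the query once as a warm-up (JIT), then times the given number of runs.
    public static TimeSpan Measure(Func<Chapter> query, int iterations)
    {
        query();
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            query();
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

You would call it as ChainTiming.Measure(() => ChainTiming.ChainedWhere(libraries), 100) and again with NestedWhere, then compare the two elapsed times. Run it several times in a release build, since single measurements are noisy.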

Readability doesn't seem to be strongly affected either way. I prefer the first approach, as it has fewer levels of nesting, but it comes down to personal preference.

Answer 3

It depends on how the underlying LINQ provider works. For LINQ to Objects, both versions would require about the same amount of work in this case. But this is the most straightforward example, so beyond that it's hard to say.
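One way to check the "same amount of work" claim for LINQ to Objects is to count predicate evaluations in both shapes. This is only a sketch under simplified assumptions: the jagged array stands in for books and their chapter page counts, and the names are invented for the example.

```csharp
using System;
using System.Linq;

public static class LazyEvaluationDemo
{
    // Hypothetical data: each inner array holds one book's chapter page counts.
    private static readonly int[][] Books =
    {
        new[] { 10, 20 },
        new[] { 30, 42, 50 },
        new[] { 42, 60 },
    };

    // Flatten first, then let First(predicate) do the filtering.
    public static int CallsWithChainedFirst()
    {
        int calls = 0;
        Books.SelectMany(b => b)
             .First(p => { calls++; return p == 42; });
        return calls;
    }

    // Filter inside the SelectMany() selector, then take First().
    public static int CallsWithNestedWhere()
    {
        int calls = 0;
        Books.SelectMany(b => b.Where(p => { calls++; return p == 42; }))
             .First();
        return calls;
    }
}
```

Both methods return 4 here: execution is deferred in each form, and First() stops pulling items as soon as the first 42 is found, so the chapters after it are never tested.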

Answer 4

This might give you a different angle, though it's more a matter of style...
I sometimes find myself doing something like this...

return libraries.Filter(
        l => l.City == "London",
        l => l.Books,
        b => b.Author == "Adams",
        b => b.Chapters,
        c => c.NumberOfPages == 42
        );

...where you can guess what the extension is, something like...

public static IEnumerable<TC> Filter<TL, TB, TC>(this IEnumerable<TL> list,
    Func<TL, bool> whereLs,
    Func<TL, IEnumerable<TB>> selectBs,
    Func<TB, bool> whereBs,
    Func<TB, IEnumerable<TC>> selectCs,
    Func<TC, bool> whereCs
    )
{
    return list
        .Where(whereLs)
        .SelectMany(selectBs)
        .Where(whereBs)
        .SelectMany(selectCs)
        .Where(whereCs);
}

...or....

...    
{
    return list
        .Where(whereLs)
        .SelectMany(l => selectBs(l).Where(whereBs))
        .SelectMany(b => selectCs(b).Where(whereCs));
}

The combinations and options are many, depending on what you have and how you like your code: abstract it some more, or 'capture' and 'parametrize' it, e.g. PerCityAuthorPages(_city, _author, _numPages), etc.
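A parametrized wrapper along those lines could be sketched as below. The body of PerCityAuthorPages and the class name LibraryQueries are guesses at what such a method might look like, and the Filter extension and simplified object graph are restated only so the sketch compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Chapter { public string Name { get; set; } public int NumberOfPages { get; set; } }
public class Book { public string Author { get; set; } public List<Chapter> Chapters { get; set; } }
public class Library { public string City { get; set; } public List<Book> Books { get; set; } }

public static class LibraryQueries
{
    // Same shape as the Filter extension shown earlier in this answer.
    public static IEnumerable<TC> Filter<TL, TB, TC>(this IEnumerable<TL> list,
        Func<TL, bool> whereLs,
        Func<TL, IEnumerable<TB>> selectBs,
        Func<TB, bool> whereBs,
        Func<TB, IEnumerable<TC>> selectCs,
        Func<TC, bool> whereCs)
    {
        return list
            .Where(whereLs)
            .SelectMany(selectBs)
            .Where(whereBs)
            .SelectMany(selectCs)
            .Where(whereCs);
    }

    // Hypothetical wrapper that captures the three values as parameters,
    // so callers never see the individual Where/SelectMany steps.
    public static IEnumerable<Chapter> PerCityAuthorPages(
        this IEnumerable<Library> libraries, string city, string author, int numPages)
    {
        return libraries.Filter(
            l => l.City == city,
            l => l.Books,
            b => b.Author == author,
            b => b.Chapters,
            c => c.NumberOfPages == numPages);
    }
}
```

The call site then shrinks to libraries.PerCityAuthorPages("London", "Adams", 42).First(), and the choice between chained and nested filtering stays hidden inside Filter.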

...basically, I dislike having all the Where and Select calls spelled out; to me that isn't very readable either. With the short form it's quite clear which part is the where and which is the select, and it takes far fewer characters.

Also, you can defer the decision about the Where/Select combination until later (do one or the other based on your needs or the provider).

And @Telastyn is quite right: LINQ providers (if you look at some implementation code, with all the expression reducing, etc.) vary quite a bit from provider to provider in how they end up mapping a query to, e.g., SQL, though I think this should map the same way in most of them.

4 answers

solution1
3 2012-04-09 20:07:50

solution2
2 2012-04-09 20:06:39

solution3
1 2012-04-09 20:06:23

solution4
0 2012-04-09 23:23:28
