
What is the Big O of LINQ .Where()?

I am comparing ways to filter items out of a list. I am unsure whether to do it directly with a loop, which would be O(n), or to use .Where(). I made a simple example to test .Where() on a small data set. There are n=100 items, and when I step through the line in BigO() with the debugger, the predicate runs exactly 100 times, which makes me think that .Where() is also O(n). What I couldn't figure out is where the data is stored during the operation, and whether that adds any complexity.

Am I missing something, or is .Where() O(n)?

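using System.Collections.Generic;
using System.Linq;

public class ListerFactory
{
    public class Lister
    {
        // Public so the enclosing class can read and write it.
        public bool includeItem { get; set; }
    }

    List<Lister> someList { get; set; }

    public ListerFactory()
    {
        someList = new List<Lister>();
        BuildLister();
    }

    public void BuildLister()
    {
        for (int i = 0; i < 100; i++)
        {
            var inc = new Lister();
            // i % 2 is an int, so compare it to get a bool (true for odd i).
            inc.includeItem = i % 2 != 0;
            someList.Add(inc);
        }
        BigO();
    }

    public void BigO()
    {
        // Keep only the items flagged for inclusion.
        someList = someList.Where(thisList => thisList.includeItem).ToList();
    }
}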

Where() is O(1); it doesn't actually do any work.

Looping through the collection returned by Where() is O(n).

The O(n) that you're seeing is the result of ToList(), which is O(n).
If you pass a Where() query to an O(n^2) algorithm, you will see the callback execute n^2 times (assuming the algorithm doesn't cache anywhere).

This is called deferred execution.
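A minimal sketch of that behavior, assuming LINQ to Objects over a List<int>. The class name DeferredDemo and the PredicateCalls counter are hypothetical and exist only for illustration; the counter shows when the callback actually runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Number of times the Where() predicate has run so far.
    public static int PredicateCalls;

    // Returns the counter after: building the query, the first ToList(),
    // and a second enumeration of the same query.
    public static int[] Run(int n)
    {
        PredicateCalls = 0;
        List<int> source = Enumerable.Range(0, n).ToList();

        // Building the query does not touch any element yet.
        IEnumerable<int> query = source.Where(x =>
        {
            PredicateCalls++;
            return x % 2 == 0;
        });
        int afterWhere = PredicateCalls;

        // ToList() enumerates the query once: one predicate call per element.
        List<int> evens = query.ToList();
        int afterToList = PredicateCalls;

        // Enumerating the query again re-runs the predicate for every element.
        int sum = 0;
        foreach (int x in query)
        {
            sum += x;
        }
        int afterSecondPass = PredicateCalls;

        return new[] { afterWhere, afterToList, afterSecondPass };
    }

    public static void Main()
    {
        int[] counts = Run(100);
        Console.WriteLine(string.Join(", ", counts)); // 0, 100, 200
    }
}
```

Iterating the cached list (evens) instead of the query would leave the counter at 100, which is the caching case mentioned above.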

This is true of most, if not all, LINQ providers; it wouldn't make sense for a LINQ provider to eagerly execute all calls.


In the case of LINQ to objects, this assumes that the source collection's enumerator is O(n).
If you're using some strange collection which iterates in worse than O(n) (in other words, if its MoveNext() is worse than O(1)), Where() will be bounded by that.

To be more precise, the time complexity of enumerating a Where() query is the same as the time complexity of the original enumeration.

Similarly, I'm assuming that the callback is O(1).
If it isn't, you'll need to multiply the complexity of the callback by the complexity of the original enumeration.

Depends on the source of the collection of course.

I disagree with @SLaks that the algorithm is O(1), because enumerating a Where() query keeps searching for the next candidate that matches the condition. In that sense it is worst case O(n), with n the amount of work needed to yield the entire collection that feeds the Where() clause.

However, he has a point that it depends on the algorithm that yields the collection (for instance, if it is a list that has already been built, yielding the list is O(n), with n the number of items in the collection). Furthermore, the algorithm that checks for a match is not necessarily O(1). If the yield algorithm is O(n) and the match algorithm is O(m), the time complexity is O(n*m).

Take for instance a collection of integers:

int[] test = new int[] {1,2,3,4,5,6,7,8,9,10,7,5,0,1,5,6};

If you want to return all the integers that occur at least twice, you could do this with a Where() clause:

test.Where(x => test.Count(y => x == y) >= 2);

This algorithm is O(n^2): the Where() predicate runs n times, and each call to Count() scans all n items.
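As a rough check of that quadratic cost, the sketch below wraps the same query and counts the inner comparisons. The names DuplicateDemo, FindRepeated and Comparisons are made up for this example; the query and the data are the ones from above.

```csharp
using System;
using System.Linq;

public static class DuplicateDemo
{
    // Counts how many times the inner Count() predicate runs.
    public static int Comparisons;

    // Returns every element whose value occurs at least twice in the array.
    public static int[] FindRepeated(int[] values)
    {
        Comparisons = 0;
        return values
            .Where(x => values.Count(y => { Comparisons++; return x == y; }) >= 2)
            .ToArray();
    }

    public static void Main()
    {
        int[] test = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 7, 5, 0, 1, 5, 6 };
        int[] repeated = FindRepeated(test);
        Console.WriteLine(string.Join(",", repeated)); // 1,5,6,7,7,5,1,5,6
        Console.WriteLine(Comparisons);                 // 256 (16 * 16)
    }
}
```

With 16 items the inner predicate runs 16 * 16 = 256 times, matching the O(n^2) estimate.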

Secondly, the collection itself can also be built lazily, with an iterator:

public IEnumerable<int> GenerateCollection () {
    //some very complex calculation, here replaced by a simple for loop
    for(int i = 0; i < 150; i++) {
        yield return i;
    }
}
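To illustrate what a lazy source means for Where(), here is a small sketch that instruments the iterator above with a counter. LazySourceDemo, FirstEvens and Generated are hypothetical names added for the example; the point is that items are only produced as the consumer asks for them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazySourceDemo
{
    // Number of items the iterator has produced so far.
    public static int Generated;

    public static IEnumerable<int> GenerateCollection()
    {
        for (int i = 0; i < 150; i++)
        {
            Generated++;
            yield return i;
        }
    }

    // Collects the first `wanted` even numbers, then stops enumerating.
    public static List<int> FirstEvens(int wanted)
    {
        Generated = 0;
        var found = new List<int>();
        if (wanted <= 0) return found;

        foreach (int x in GenerateCollection().Where(v => v % 2 == 0))
        {
            found.Add(x);
            if (found.Count == wanted) break; // stop pulling from the source
        }
        return found;
    }

    public static void Main()
    {
        List<int> evens = FirstEvens(3);
        Console.WriteLine(string.Join(",", evens)); // 0,2,4
        Console.WriteLine(Generated);               // 5
    }
}
```

Only five of the 150 items are generated here, because the loop stops early. Enumerating the whole query would generate all 150, so the full cost still depends on the source.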

Your algorithm, however, first generates the list, so the time complexity is O(n).

Notice, however, that if you iterate the entire collection after the Where(), the time complexity is still O(n*m) and not O(n*n*m). That's because once a candidate has been matched, it will not be reconsidered.
