Skip first and last in IEnumerable, deferring execution

I have a huge, neatly formatted JSON file that starts with the characters "[\r\n" and ends with "]". I have this piece of code:

foreach (var line in File.ReadLines(@"d:\wikipedia\wikipedia.json").Skip(1))
{
  if (line[0] == ']') break;
  // Do stuff
}

I'm wondering which is best performance-wise: which version produces the most efficient machine code, in terms of clock cycles and memory consumed, if I compare the code above with one where "break" is replaced by "continue"? Or would both pieces of code compile to the same MSIL and machine code? If you know the answer, please explain exactly how you reached your conclusion. I'd really like to know.

EDIT: Before you close this as nonsensical, consider that the following code is equivalent to the code above, and that the C# compiler optimizes when the code path is flat and does not fork in many ways. Would all of the following examples generate the same amount of work for the CPU?

IEnumerable<char> text = new[] {'[', 'a', 'b', 'c', ']'};
foreach (var c in text.Skip(1))
{
    if (c == ']') break;
    // Do stuff
}
foreach (var c in text.Skip(1))
{
    if (c == ']') continue;
    // Do stuff
}
foreach (var c in text.Skip(1))
{
    if (c != ']')
    {
        // Do stuff                    
    }
}
foreach (var c in text.Skip(1))
{
    if (c != ']')
    {
        // Do stuff                    
    }
    else
    {
        break;
    }
}

EDIT2: Here's another way of putting it: what's the prettiest way to skip the first and last item in an IEnumerable while still deferring execution until // Do stuff?

Q: Different MSIL for break or continue in loop?

Yes, that's because it works like this:

foreach (var item in foo)
{
    // more code...

    if (...) { continue; } // jump to #1
    if (...) { break; } // jump to #2

    // more code...

    // #1 -- just before the '}'
}

// #2 -- after the exit of the loop.
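The two jump targets also change how much of the sequence is read, which matters more than the branch itself. The sketch below is only an illustration: the class BreakVsContinueDemo and the helper Counted are hypothetical names, not part of the question. It counts how many items are pulled from the source when extra items follow the "]".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BreakVsContinueDemo
{
    // Hypothetical helper: passes items through and reports each pull.
    public static IEnumerable<char> Counted(IEnumerable<char> source, Action onPull)
    {
        foreach (var c in source)
        {
            onPull();
            yield return c;
        }
    }

    public static int PulledWithBreak(char[] data)
    {
        int pulled = 0;
        foreach (var c in Counted(data, () => pulled++).Skip(1))
        {
            if (c == ']') break; // leaves the loop; later items are never read
        }
        return pulled;
    }

    public static int PulledWithContinue(char[] data)
    {
        int pulled = 0;
        foreach (var c in Counted(data, () => pulled++).Skip(1))
        {
            if (c == ']') continue; // skips this item but keeps enumerating
        }
        return pulled;
    }

    public static void Main()
    {
        var data = new[] { '[', 'a', ']', 'x', 'y' };
        Console.WriteLine(PulledWithBreak(data));    // 3
        Console.WriteLine(PulledWithContinue(data)); // 5
    }
}
```

When "]" really is the last item, as in the file from the question, both variants read the same number of items.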

Q: What will give you the most performance?

Branches are branches for the compiler. Whether you write a goto, a continue or a break, it will eventually be compiled as a branch (opcode br), and it will be analyzed as such. In other words: it doesn't make a difference.

What does make a difference is having predictable patterns of both data and code flow in the code. Branching breaks code flow, so if you want performance, you should avoid irregular branches.

In other words, prefer:

for (int i=0; i<10 && someCondition; ++i)

to:

for (int i=0; i<10; ++i) 
{
    // some code
    if (someCondition) { ... } 
    // some code
}

As always with performance, the best thing to do is to run benchmarks. There's no substitute.
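A rough way to do that for the break and continue variants is sketched below with System.Diagnostics.Stopwatch. The class LoopTiming and its methods are made-up names for illustration, and the iteration count is arbitrary. Treat the numbers as indicative only; a dedicated harness such as BenchmarkDotNet handles warm-up and statistics far more carefully.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class LoopTiming
{
    public static int CountWithBreak(char[] text)
    {
        int count = 0;
        foreach (var c in text.Skip(1))
        {
            if (c == ']') break;
            count++;
        }
        return count;
    }

    public static int CountWithContinue(char[] text)
    {
        int count = 0;
        foreach (var c in text.Skip(1))
        {
            if (c == ']') continue;
            count++;
        }
        return count;
    }

    // Runs body repeatedly and returns the total elapsed time.
    public static TimeSpan Time(Func<int> body, int iterations)
    {
        body(); // warm-up call so JIT compilation is not part of the measurement
        long sink = 0;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) sink += body();
        sw.Stop();
        GC.KeepAlive(sink); // keep the accumulated result observable
        return sw.Elapsed;
    }

    public static void Main()
    {
        var text = new[] { '[', 'a', 'b', 'c', ']' };
        Console.WriteLine($"break:    {Time(() => CountWithBreak(text), 1_000_000)}");
        Console.WriteLine($"continue: {Time(() => CountWithContinue(text), 1_000_000)}");
    }
}
```

Run it in a Release build, outside the debugger, and repeat the measurement a few times before drawing conclusions.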

Q: What will give you the most performance? (#2)

You're doing a lot with IEnumerables. If you want raw performance and have the option, it's best to use an array or a string. There's no better alternative in terms of raw performance for sequential access of elements.

If an array isn't an option (for example because it doesn't match the access pattern), use the data structure that best suits the access pattern. Learn about the characteristics of hash tables (Dictionary) and red-black trees (SortedDictionary), and how List works. Knowing how things really work is what you need. If unsure, test, test and test again.

Q: What will give you the most performance? (#3)

I'd also try JSON libraries if your intent is to parse that. These people probably already invented the wheel for you - if not, it'll give you a baseline "to beat".
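As one possible baseline, the sketch below streams the elements of a root-level JSON array with System.Text.Json instead of splitting lines by hand. It assumes .NET 6 or later, where JsonSerializer.DeserializeAsyncEnumerable is available. The class name JsonStreamingDemo and the tiny in-memory sample are made up for illustration; for the real file you would pass a FileStream instead.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

public static class JsonStreamingDemo
{
    // Reads a root-level JSON array element by element and counts the elements.
    public static async Task<int> CountElementsAsync(Stream utf8Json)
    {
        int count = 0;
        await foreach (JsonElement element in
            JsonSerializer.DeserializeAsyncEnumerable<JsonElement>(utf8Json))
        {
            // Do stuff with element here.
            count++;
        }
        return count;
    }

    public static async Task Main()
    {
        // Assumed sample data shaped like the file in the question.
        var json = "[\r\n{\"id\":1},\r\n{\"id\":2}\r\n]";
        using var stream = new MemoryStream(Encoding.UTF8.GetBytes(json));
        Console.WriteLine(await CountElementsAsync(stream)); // 2
    }
}
```

This avoids the special handling of the "[" and "]" lines altogether, because the library treats them as the array delimiters.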

Q: [...] what's the prettiest way to skip the first and last item [...]

If the underlying data structure is a string, List or array, I'd simply do this:

for (int i=1; i<str.Length-1; ++i)
{ ... }

To be frank, other data structures don't really make sense here IMO. That said, people sometimes like to put LINQ code everywhere, so...

Using an enumerator

You can easily make a method that returns all but the first and last element. In my book, enumerators should always be accessed through constructs like foreach, to ensure that Dispose is called correctly.

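// Requires: using System.Collections.Generic; using System.Linq;
public static IEnumerable<T> GetAllButFirstAndLast<T>(IEnumerable<T> myEnum)
{
    T previous = default(T); // holds each item back by one step
    bool first = true;
    foreach (T item in myEnum.Skip(1)) // Skip(1) drops the first element
    {
        // Yield the held-back item only once a newer one has arrived,
        // so the final element is never returned.
        if (first) { first = false; } else { yield return previous; }
        previous = item;
    }
}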

Note that this has little to do with "getting the best performance out of your code". One look at the IL tells you all you need to know.
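If your target framework has Enumerable.SkipLast (.NET Core 2.0 and later, .NET Standard 2.1), the same result can be sketched with built-in LINQ operators, both of which are deferred. The class SkipEndsDemo and the method Inner below are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SkipEndsDemo
{
    // Drops the first and the last element; nothing runs until enumeration starts.
    public static IEnumerable<T> Inner<T>(IEnumerable<T> source)
    {
        return source.Skip(1).SkipLast(1);
    }

    public static void Main()
    {
        IEnumerable<char> text = new[] { '[', 'a', 'b', 'c', ']' };
        var inner = Inner(text);                 // only builds the query
        Console.WriteLine(string.Concat(inner)); // abc
    }
}
```

SkipLast has to hold one item back internally, much like the hand-written method does, so this is about readability and not about speed.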
