Most efficient way to parse a delimited string in C#

This has been asked a few different ways but I am debating on "my way" vs "your way" with another developer. Language is C#.

I want to parse a pipe-delimited string where the first 2 characters of each chunk are my tag.

The rules (not my rules, but rules I have been given and must follow): I can't change the format of the string. This function may be called many times, so efficiency is key. I need to keep it simple. The input string and the tag I am looking for may change at runtime.

Example input string: AOVALUE1|ABVALUE2|ACVALUE3|ADVALUE4
Example tag I may need value for: AB

My way: I split the string into an array on the delimiter and loop through the array each time the function is called. I then look at the first 2 characters of each element and return the value minus those 2 characters.
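For reference, the split-and-loop approach described above might look like the following minimal sketch. The names SplitLookup and FindBySplit are hypothetical, and returning an empty string when the tag is missing is an assumption, since the question does not say what should happen in that case.

```csharp
using System;

public static class SplitLookup
{
    // Hypothetical helper: returns the value for the given tag,
    // or an empty string when no field starts with that tag (assumed behavior).
    public static string FindBySplit(string input, string tag)
    {
        // Split allocates a new array plus one string per field on every call.
        string[] fields = input.Split('|');

        foreach (string field in fields)
        {
            // Ordinal comparison avoids culture-sensitive matching rules.
            if (field.StartsWith(tag, StringComparison.Ordinal))
                return field.Substring(tag.Length);
        }

        return string.Empty;
    }
}
```

Calling SplitLookup.FindBySplit with the example input and the tag "AB" yields "VALUE2".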

The "other guy's" way is to use a combination of IndexOf and Substring to find the start and end of the field I am looking for, then use Substring again to pull out the value minus the first 2 characters. So he would call IndexOf("|AB") and then find the next pipe in the string. Those two positions are the start and end, and he would Substring that range out.

I would think that IndexOf and Substring each walk the string character by character on every call, so this would be less efficient than splitting it into large chunks and reading each chunk minus its first 2 characters. Or is there another way that is better than what either of us has proposed?

The other guy's approach is going to be more efficient in time, given that the input string needs to be re-evaluated on each call. If the input string is long, it also won't require the extra memory that splitting the string would.

If I'm trying to code a really tight loop I prefer to directly use array/string operators rather than LINQ to avoid that additional overhead:

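// Requires: using System;
static string inputString = "AOVALUE1|ABVALUE2|ACVALUE3|ADVALUE4";

// Returns the value that follows the given tag, or an empty string if the tag is not found.
static string FindString(string tag)
{
    int startIndex;
    if (inputString.StartsWith(tag, StringComparison.Ordinal))
    {
        // The tag is on the first field, which has no leading pipe.
        startIndex = tag.Length;
    }
    else
    {
        // Search for "|tag" so that only the start of a field can match.
        startIndex = inputString.IndexOf("|" + tag, StringComparison.Ordinal);
        if (startIndex == -1)
            return string.Empty;

        // Skip past the pipe and the tag to the start of the value.
        startIndex += tag.Length + 1;
    }

    // The value runs to the next pipe, or to the end of the string for the last field.
    int endIndex = inputString.IndexOf('|', startIndex);
    if (endIndex == -1)
        endIndex = inputString.Length;

    return inputString.Substring(startIndex, endIndex - startIndex);
}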
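The lookup above builds a search string from the tag on each call. If that temporary string matters, one possible variant walks the input field by field and compares the tag in place. This is only a sketch: the names TagScanner and FindValue are hypothetical, passing the input as a parameter is an assumption, and whether it is measurably faster would need to be benchmarked on real data.

```csharp
using System;

public static class TagScanner
{
    // Hypothetical helper: scans pipe-delimited fields left to right and returns
    // the value of the first field that starts with the tag, or an empty string.
    public static string FindValue(string input, string tag)
    {
        int fieldStart = 0;
        while (fieldStart <= input.Length)
        {
            // Find the end of the current field (next pipe or end of string).
            int fieldEnd = input.IndexOf('|', fieldStart);
            if (fieldEnd == -1)
                fieldEnd = input.Length;

            // Compare the tag against the start of the field without creating a substring.
            int fieldLength = fieldEnd - fieldStart;
            if (fieldLength >= tag.Length &&
                string.CompareOrdinal(input, fieldStart, tag, 0, tag.Length) == 0)
            {
                int valueStart = fieldStart + tag.Length;
                return input.Substring(valueStart, fieldEnd - valueStart);
            }

            // Move to the first character after the pipe.
            fieldStart = fieldEnd + 1;
        }

        return string.Empty;
    }
}
```

The only allocation on a successful lookup is the returned value itself. The tag is assumed not to contain the pipe character.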

I've done a lot of parsing in C# and I would probably take the approach suggested by the "other guys" just because it would be a bit lighter on resources used and likely to be a little faster as well.

That said, as long as the data isn't too big, there's nothing wrong with the first approach and it will be much easier to program.

Something like this may work OK (it needs using System.Linq):

string myString = "AOVALUE1|ABVALUE2|ACVALUE3|ADVALUE4";
string selector = "AB";

// Substring strips only the leading tag; Replace would also remove the tag text anywhere inside the value.
var results = myString.Split('|').Where(x => x.StartsWith(selector)).Select(x => x.Substring(selector.Length));

Returns: a sequence of the matches, in this case just one, "VALUE2".

If you are just looking for the first or only match, this will work:

string result = myString.Split('|').Where(x => x.StartsWith(selector)).Select(x => x.Substring(selector.Length)).FirstOrDefault();
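  • Substring does not scan the string; it copies a range of characters from a position you already know.
  • IndexOf does scan the string, character by character, until it finds a match.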
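A small sketch of the distinction in the two points above, using the example input from the question. The names ScanVersusCopy, LocateTag and CopyRange are hypothetical wrappers added only for illustration.

```csharp
using System;

public static class ScanVersusCopy
{
    // IndexOf has to examine characters from the start of the string
    // until it finds the first occurrence of the search text.
    public static int LocateTag(string input, string pipeAndTag)
    {
        return input.IndexOf(pipeAndTag, StringComparison.Ordinal);
    }

    // Substring does no searching: it copies `length` characters
    // starting at an offset the caller already knows.
    public static string CopyRange(string input, int start, int length)
    {
        return input.Substring(start, length);
    }
}
```

For "AOVALUE1|ABVALUE2|ACVALUE3|ADVALUE4", LocateTag with "|AB" returns 8, and CopyRange with start 11 and length 6 returns "VALUE2".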

My preference would be the Split method, primarily for coding efficiency:

// Substring(2) drops the 2-character tag from each field.
string[] inputArr = input.Split('|').Select(s => s.Substring(2)).ToArray();

is pretty concise. How many LoC does the substring/indexof method take?
