
How to split a string efficiently in C#

I have a string like this:

 -82.9494547,36.2913021,0
 -83.0784938,36.2347521,0
 -82.9537782,36.079235,0

I need to have output like this:

 -82.9494547 36.2913021, -83.0784938 36.2347521, -82.9537782 36.079235

I have tried the following code to achieve the desired output:

string[] coordinatesVal = coordinateTxt.Trim().Split(new string[] { ",0" }, StringSplitOptions.None);

        for (int i = 0; i < coordinatesVal.Length - 1; i++)
        {
            coordinatesVal[i] = coordinatesVal[i].Trim();
            coordinatesVal[i] = coordinatesVal[i].Replace(',', ' ');

            numbers.Append(coordinatesVal[i]);

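            // The loop stops at Length - 2, so that index is the last pair:
            // add a separator after every pair except that one.
            if (i != coordinatesVal.Length - 2)
            {
                numbers.Append(", ");
            }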

        } 

But this does not seem like a professional solution to me. Can anyone please suggest a more efficient way of doing this?

Your code is okay. You could drop the temporary assignments and chain the method calls:

var numbers = new StringBuilder();
string[] coordinatesVal = coordinateTxt
    .Trim()
    .Split(new string[] { ",0" }, StringSplitOptions.None);
for (int i = 0; i < coordinatesVal.Length - 1; i++) {
    numbers
        .Append(coordinatesVal[i].Trim().Replace(',', ' '))
        .Append(", ");
}
numbers.Length -= 2;

Note that the last statement assumes that at least one coordinate pair is available. If the input can be empty, you would have to enclose the loop and this last statement in if (coordinatesVal.Length > 1) { ... } . (Split with StringSplitOptions.None always returns at least one element, so a check against 0 would never fail.) This is still more efficient than having an if inside the loop.
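As a minimal sketch of a guarded version, the snippet below wraps the code above in a hypothetical helper (the names CoordinateConverter and ConvertCoordinates are assumptions made for illustration, not part of the original answer). Instead of testing the array length, it checks the StringBuilder length before trimming the trailing separator, which also covers the empty-input case:

```csharp
using System;
using System.Text;

static class CoordinateConverter
{
    // Hypothetical helper; the name is illustrative only.
    public static string ConvertCoordinates(string coordinateTxt)
    {
        string[] coordinatesVal = coordinateTxt
            .Trim()
            .Split(new string[] { ",0" }, StringSplitOptions.None);

        var numbers = new StringBuilder(coordinateTxt.Length);

        // The element after the final ",0" is empty, hence Length - 1.
        for (int i = 0; i < coordinatesVal.Length - 1; i++)
        {
            numbers
                .Append(coordinatesVal[i].Trim().Replace(',', ' '))
                .Append(", ");
        }

        // Only strip the trailing ", " if something was appended.
        if (numbers.Length >= 2)
        {
            numbers.Length -= 2;
        }
        return numbers.ToString();
    }
}
```

Like the original, this assumes that every line ends in ",0" and that ",0" does not occur anywhere else in the input.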

You ask about efficiency, but you don't specify whether you mean code efficiency (execution speed) or programmer efficiency (how much time you have to spend on it). One key part of professional programming is to judge which one of these is more important in any given situation.

The other answers do a good job of covering programmer efficiency, so I'm taking a stab at code efficiency. I'm doing this at home for fun, but for professional work I would need a good reason before putting in the effort to even spend time comparing the speeds of the methods given in the other answers, let alone try to improve on them.

Having said that, waiting around for the program to finish doing the conversion of millions of coordinate pairs would give me such a reason.

One of the speed pitfalls of C# string handling is the way String.Replace() and String.Trim() return a whole new copy of the string. This involves allocating memory, copying the characters, and eventually cleaning up the garbage generated. Do that a few million times, and it starts to add up. With that in mind, I attempted to avoid as many allocations and copies as possible.
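To make the allocation point concrete, here is a small sketch (the class and method names are made up for this example) that counts how many intermediate strings the chained Trim() and Replace() calls produce for a single line. It relies on object.ReferenceEquals, which reports whether two references point at the same string instance:

```csharp
using System;

static class AllocationDemo
{
    // Counts how many of the two calls returned a new string instance.
    public static int CountNewStrings(string line)
    {
        string trimmed = line.Trim();                 // new string if whitespace was removed
        string replaced = trimmed.Replace(',', ' ');  // new string if a comma was found

        int count = 0;
        if (!object.ReferenceEquals(line, trimmed)) count++;
        if (!object.ReferenceEquals(trimmed, replaced)) count++;
        return count;
    }
}
```

For a line such as " -82.9494547,36.2913021" both calls change the text, so each one produces a separate temporary string that later has to be garbage collected.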

    enum CurrentField
    {
        FirstNum,
        SecondNum,
        UnwantedZero
    };

    static string ConvertStateMachine(string input)
    {
        // Pre-allocate enough space in the string builder.
        var numbers = new StringBuilder(input.Length);

        var state = CurrentField.FirstNum;
        int i = 0;
        while (i < input.Length)
        {
            char c = input[i++];

            switch (state)
            {
                // Copying the first number to the output, next will be another number
                case CurrentField.FirstNum:
                    if (c == ',')
                    {
                        // Separate the two numbers by space instead of comma, then move on
                        numbers.Append(' ');
                        state = CurrentField.SecondNum;
                    }
                    else if (!(c == ' ' || c == '\n'))
                    {
                        // Ignore whitespace, output anything else
                        numbers.Append(c);
                    }
                    break;

                // Copying the second number to the output, next will be the ,0\n that we don't need
                case CurrentField.SecondNum:
                    if (c == ',')
                    {
                        numbers.Append(", ");
                        state = CurrentField.UnwantedZero;
                    }
                    else if (!(c == ' ' || c == '\n'))
                    {
                        // Ignore whitespace, output anything else
                        numbers.Append(c);
                    }
                    break;
                case CurrentField.UnwantedZero:
                    // Output nothing, just track when the line is finished and we start all over again.
                    if (c == '\n')
                    {
                        state = CurrentField.FirstNum;
                    }
                    break;
            }
        }
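        // Drop the trailing ", " that was appended after the last pair.
        if (numbers.Length >= 2 && numbers[numbers.Length - 2] == ',' && numbers[numbers.Length - 1] == ' ')
        {
            numbers.Length -= 2;
        }
        return numbers.ToString();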
    }

This uses a state machine to treat incoming characters differently depending on whether they are part of the first number, second number, or the rest of the line, and output characters accordingly. Each character is only copied once into the output, then I believe once more when the output is converted to a string at the end. This second conversion could probably be avoided by using a char[] for the output.

The bottleneck in this code seems to be the number of calls to StringBuilder.Append() . If more speed were required, I would first attempt to keep track of how many characters were to be copied directly into the output, then use .Append(string value, int startIndex, int count) to send an entire number across in one call.
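The idea of appending whole runs can be sketched roughly as follows. This is an untimed illustration, not the benchmarked code: the names SpanConverter and ConvertWithRuns are hypothetical, and it assumes well-formed input where whitespace appears only at line boundaries. It scans for the end of each number and hands the whole run to StringBuilder.Append(string, int, int) in one call:

```csharp
using System.Text;

static class SpanConverter
{
    // Hypothetical variant of the state machine; appends each number as one run.
    public static string ConvertWithRuns(string input)
    {
        var numbers = new StringBuilder(input.Length);
        int i = 0;
        while (i < input.Length)
        {
            // Skip whitespace and line breaks before the first number.
            while (i < input.Length && (input[i] == ' ' || input[i] == '\r' || input[i] == '\n'))
            {
                i++;
            }
            if (i == input.Length)
            {
                break;
            }

            // First number: everything up to the next comma.
            int start = i;
            while (i < input.Length && input[i] != ',')
            {
                i++;
            }
            if (numbers.Length > 0)
            {
                numbers.Append(", "); // separator between pairs
            }
            numbers.Append(input, start, i - start).Append(' ');
            if (i < input.Length)
            {
                i++; // step over the comma
            }

            // Second number: everything up to the next comma or line end.
            start = i;
            while (i < input.Length && input[i] != ',' && input[i] != '\r' && input[i] != '\n')
            {
                i++;
            }
            numbers.Append(input, start, i - start);

            // Ignore the rest of the line (the unwanted ",0").
            while (i < input.Length && input[i] != '\n')
            {
                i++;
            }
        }
        return numbers.ToString();
    }
}
```

Because the separator is written before each pair except the first, this variant does not leave a trailing ", " to clean up. Whether it is actually faster would have to be measured.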

I put a few example solutions into a test harness, and ran them on a string containing 300,000 coordinate-pair lines, averaged over 50 runs. The results on my PC were:

String Split, Replace each line (see Olivier's answer, though I pre-allocated the space in the StringBuilder):
    6542 ms / 13493147 ticks, 130.84 ms / 269862.9 ticks per conversion
Replace & Trim entire string (see Heriberto's second version):
    3352 ms / 6914604 ticks, 67.04 ms / 138292.1 ticks per conversion
    - Note: Original test was done with 900000 coord pairs, but this entire-string version suffered an out of memory exception so I had to rein it in a bit.
Split and Join (see Łukasz's answer):
    8780 ms / 18110672 ticks, 175.6 ms / 362213.4 ticks per conversion
Character state machine (see above):
    1685 ms / 3475506 ticks, 33.7 ms / 69510.12 ticks per conversion

So, the question of which version is most efficient comes down to: what are your requirements?
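The test harness itself is not shown above. A minimal sketch of how such a timing could be set up with System.Diagnostics.Stopwatch might look like this (the Harness class and its method names are assumptions for illustration, and the generated input simply repeats one line):

```csharp
using System;
using System.Diagnostics;
using System.Text;

static class Harness
{
    // Builds a test input with the given number of coordinate-pair lines.
    public static string BuildInput(int lines)
    {
        const string line = " -82.9494547,36.2913021,0\n";
        var sb = new StringBuilder(lines * line.Length);
        for (int i = 0; i < lines; i++)
        {
            sb.Append(line);
        }
        return sb.ToString();
    }

    // Runs the conversion 'runs' times and returns the average milliseconds per run.
    public static double AverageMilliseconds(Func<string, string> convert, string input, int runs)
    {
        convert(input); // warm-up call so JIT compilation is not timed

        var watch = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++)
        {
            convert(input);
        }
        watch.Stop();
        return watch.Elapsed.TotalMilliseconds / runs;
    }
}
```

A call such as Harness.AverageMilliseconds(ConvertStateMachine, Harness.BuildInput(300000), 50) would then time one of the candidates. Absolute numbers will differ from machine to machine.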

Your solution is fine. Maybe you could write it a bit more elegantly, like this:

string[] coordinatesVal = coordinateTxt.Trim().Split(new string[] { ",0" }, 
StringSplitOptions.RemoveEmptyEntries);
string result = string.Empty;
foreach (string line in coordinatesVal)
{
    string[] numbers = line.Trim().Split(',');
    result += numbers[0] + " " + numbers[1] + ", ";
}
result = result.Remove(result.Length - 2, 2);

Note the StringSplitOptions.RemoveEmptyEntries parameter of the Split method, which means you don't have to deal with empty entries inside the foreach block.
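As a small illustration of the difference between the two options, the sketch below counts the parts Split returns for the same text (SplitOptionsDemo and CountParts are hypothetical names used only for this example):

```csharp
using System;

static class SplitOptionsDemo
{
    // Returns how many parts Split produces for the ",0" separator.
    public static int CountParts(string text, StringSplitOptions options)
    {
        return text.Split(new string[] { ",0" }, options).Length;
    }
}
```

For the text "1,2,0\n3,4,0", StringSplitOptions.None yields three parts (the last one is the empty string after the final ",0"), while StringSplitOptions.RemoveEmptyEntries yields two. Entries that contain only whitespace are not removed by this option, which is why the input is trimmed before splitting.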

Or you can use an extremely short one-liner. It is harder to debug, but in simple cases it does the job.

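// Requires "using System.Linq;" for Select.
string result =
  string.Join(", ",
    coordinateTxt.Trim().Split(new string[] { ",0" }, StringSplitOptions.RemoveEmptyEntries).
      // Trim each entry so the line breaks between pairs do not end up in the output.
      Select(i => i.Trim().Replace(",", " ")));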

Here's another way, without defining your own loops and replace methods or using LINQ.

 string coordinateTxt = @" -82.9494547,36.2913021,0
 -83.0784938,36.2347521,0
 -82.9537782,36.079235,0";

            string[] coordinatesVal = coordinateTxt.Replace(",", "*").Trim().Split(new string[] { "*0", Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries);
            string result = string.Join(",", coordinatesVal).Replace("*", " ");
            Console.WriteLine(result);

or even

            string coordinateTxt = @" -82.9494540,36.2913021,0
-83.0784938,36.2347521,0
-82.9537782,36.079235,0";

            string result = coordinateTxt.Replace(Environment.NewLine, "").Replace(",", " ").Replace(" 0", ", ").Trim(new char[]{ ',',' ' });
            Console.WriteLine(result);

