How to split a string in an efficient way in C#

I have a string like this:

 -82.9494547,36.2913021,0
 -83.0784938,36.2347521,0
 -82.9537782,36.079235,0

I need to have output like this:

 -82.9494547 36.2913021, -83.0784938 36.2347521, -82.9537782 36.079235

I have tried the following code to achieve the desired output:

var numbers = new StringBuilder();
string[] coordinatesVal = coordinateTxt.Trim().Split(new string[] { ",0" }, StringSplitOptions.None);

// The last element is the empty remainder after the final ",0", so it is skipped.
for (int i = 0; i < coordinatesVal.Length - 1; i++)
{
    coordinatesVal[i] = coordinatesVal[i].Trim();
    coordinatesVal[i] = coordinatesVal[i].Replace(',', ' ');

    numbers.Append(coordinatesVal[i]);

    // Add a separator after every pair except the last one.
    if (i != coordinatesVal.Length - 2)
    {
        numbers.Append(", ");
    }
}

But this does not seem like a professional solution to me. Can anyone suggest a more efficient way of doing this?

Your code is okay. You could drop the temporary results and chain the method calls:

var numbers = new StringBuilder();
string[] coordinatesVal = coordinateTxt
    .Trim()
    .Split(new string[] { ",0" }, StringSplitOptions.None);
for (int i = 0; i < coordinatesVal.Length - 1; i++) {
    numbers
        .Append(coordinatesVal[i].Trim().Replace(',', ' '))
        .Append(", ");
}
numbers.Length -= 2;

Note that the last statement assumes that at least one coordinate pair was appended. If the coordinates can be empty, you would have to enclose the loop and this last statement in if (coordinatesVal.Length > 1) { ... }, because Split returns a one-element array even when the input contains no ,0. This is still more efficient than having an if inside the loop.
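To make the empty-input case concrete, here is a minimal sketch that wraps the loop in a method and guards the final trim on the builder's length. The names `CoordinateFormatter` and `JoinPairs` are made up for this illustration and are not part of the answer above. The guard uses `numbers.Length` because `Split` returns a one-element array even for an empty string.

```csharp
using System;
using System.Text;

static class CoordinateFormatter
{
    // Hypothetical helper: same split-and-replace logic as above,
    // but safe to call with empty or whitespace-only input.
    public static string JoinPairs(string coordinateTxt)
    {
        string[] parts = coordinateTxt
            .Trim()
            .Split(new string[] { ",0" }, StringSplitOptions.None);

        var numbers = new StringBuilder(coordinateTxt.Length);

        // The last element is whatever follows the final ",0" (normally empty), so skip it.
        for (int i = 0; i < parts.Length - 1; i++)
        {
            numbers
                .Append(parts[i].Trim().Replace(',', ' '))
                .Append(", ");
        }

        // Only remove the trailing ", " if at least one pair was appended.
        if (numbers.Length > 0)
        {
            numbers.Length -= 2;
        }
        return numbers.ToString();
    }
}
```

Like the original, this sketch assumes that `,0` only ever appears as the unwanted third field; a second value that itself starts with `0` (for example `0.5`) would be split in the wrong place.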

You ask about efficiency, but you don't specify whether you mean code efficiency (execution speed) or programmer efficiency (how much time you have to spend on it). One key part of professional programming is judging which of these is more important in any given situation.

The other answers do a good job of covering programmer efficiency, so I'm taking a stab at code efficiency. I'm doing this at home for fun, but for professional work I would need a good reason before spending time even comparing the speeds of the methods given in the other answers, let alone trying to improve on them.

Having said that, waiting around for the program to finish converting millions of coordinate pairs would give me such a reason.

One of the speed pitfalls of C# string handling is the way String.Replace() and String.Trim() return a whole new copy of the string. This involves allocating memory, copying the characters, and eventually cleaning up the garbage generated. Do that a few million times, and it starts to add up. With that in mind, I attempted to avoid as many allocations and copies as possible.

    enum CurrentField
    {
        FirstNum,
        SecondNum,
        UnwantedZero
    };

    static string ConvertStateMachine(string input)
    {
        // Pre-allocate enough space in the string builder.
        var numbers = new StringBuilder(input.Length);

        var state = CurrentField.FirstNum;
        int i = 0;
        while (i < input.Length)
        {
            char c = input[i++];

            switch (state)
            {
                // Copying the first number to the output, next will be another number
                case CurrentField.FirstNum:
                    if (c == ',')
                    {
                        // Separate the two numbers by space instead of comma, then move on
                        numbers.Append(' ');
                        state = CurrentField.SecondNum;
                    }
                    else if (!(c == ' ' || c == '\n'))
                    {
                        // Ignore whitespace, output anything else
                        numbers.Append(c);
                    }
                    break;

                // Copying the second number to the output, next will be the ,0\n that we don't need
                case CurrentField.SecondNum:
                    if (c == ',')
                    {
                        numbers.Append(", ");
                        state = CurrentField.UnwantedZero;
                    }
                    else if (!(c == ' ' || c == '\n'))
                    {
                        // Ignore whitespace, output anything else
                        numbers.Append(c);
                    }
                    break;
                case CurrentField.UnwantedZero:
                    // Output nothing, just track when the line is finished and we start all over again.
                    if (c == '\n')
                    {
                        state = CurrentField.FirstNum;
                    }
                    break;
            }
        }
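        // Drop the trailing ", " that was appended after the last pair.
        if (numbers.Length >= 2 && numbers[numbers.Length - 2] == ',')
        {
            numbers.Length -= 2;
        }
        return numbers.ToString();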
    }

This uses a state machine to treat incoming characters differently depending on whether they are part of the first number, the second number, or the rest of the line, and outputs characters accordingly. Each character is only copied once into the output, then I believe once more when the output is converted to a string at the end. This second copy could probably be avoided by using a char[] for the output.

The bottleneck in this code seems to be the number of calls to StringBuilder.Append(). If more speed were required, I would first attempt to keep track of how many characters were to be copied directly into the output, then use .Append(string value, int startIndex, int count) to send an entire number across in one call.
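A rough sketch of that idea follows. It is not the code that was benchmarked, and I have not timed it. The names `CoordinateConverter` and `ConvertBySpans` are made up for this illustration. It assumes every line has the form `first,second,rest` with no whitespace around the commas, and it stops at the first line that does not contain two commas.

```csharp
using System;
using System.Text;

static class CoordinateConverter
{
    // Hypothetical variant of the state machine: instead of appending one character
    // at a time, find where each number starts and ends, then append it in one call.
    public static string ConvertBySpans(string input)
    {
        var numbers = new StringBuilder(input.Length);
        int pos = 0;
        while (pos < input.Length)
        {
            // Skip spaces and line breaks before the first number.
            while (pos < input.Length && char.IsWhiteSpace(input[pos]))
            {
                pos++;
            }
            if (pos >= input.Length) break;

            int firstComma = input.IndexOf(',', pos);
            if (firstComma < 0) break;   // malformed tail: stop
            int secondComma = input.IndexOf(',', firstComma + 1);
            if (secondComma < 0) break;  // malformed tail: stop

            // Separator goes before every pair except the first, so nothing to trim later.
            if (numbers.Length > 0) numbers.Append(", ");

            numbers.Append(input, pos, firstComma - pos);                        // first number
            numbers.Append(' ');
            numbers.Append(input, firstComma + 1, secondComma - firstComma - 1); // second number

            // Skip the unwanted third field up to the end of the line.
            int lineEnd = input.IndexOf('\n', secondComma + 1);
            pos = lineEnd < 0 ? input.Length : lineEnd + 1;
        }
        return numbers.ToString();
    }
}
```

For the three sample lines from the question, this produces the same output as the desired result, with three `Append` calls per pair plus one for each separator, rather than one call per character.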

I put a few example solutions into a test harness and ran them on a string containing 300,000 coordinate-pair lines, averaged over 50 runs. The results on my PC were:

String Split, Replace each line (see Olivier's answer, though I pre-allocated the space in the StringBuilder):
    6542 ms / 13493147 ticks, 130.84 ms / 269862.9 ticks per conversion
Replace & Trim entire string (see Heriberto's second version):
    3352 ms / 6914604 ticks, 67.04 ms / 138292.1 ticks per conversion
    - Note: Original test was done with 900000 coord pairs, but this entire-string version suffered an out of memory exception so I had to rein it in a bit.
Split and Join (see Łukasz's answer):
    8780 ms / 18110672 ticks, 175.6 ms / 362213.4 ticks per conversion
Character state machine (see above):
    1685 ms / 3475506 ticks, 33.7 ms / 69510.12 ticks per conversion
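
The harness itself is not shown, so the following is only a guess at what a minimal one could look like, using `System.Diagnostics.Stopwatch`. The names `ConversionBenchmark`, `BuildInput`, and `AverageMilliseconds` are made up for this sketch, and the generated input simply repeats one arbitrary line.

```csharp
using System;
using System.Diagnostics;
using System.Text;

static class ConversionBenchmark
{
    // Builds a test input of `lines` coordinate lines (the values are arbitrary).
    public static string BuildInput(int lines)
    {
        var sb = new StringBuilder(lines * 26);
        for (int i = 0; i < lines; i++)
        {
            sb.Append(" -82.9494547,36.2913021,0\n");
        }
        return sb.ToString();
    }

    // Runs `convert` on `input` `runs` times and returns the average milliseconds per run.
    public static double AverageMilliseconds(Func<string, string> convert, string input, int runs)
    {
        convert(input); // warm-up call so JIT compilation is not included in the timing

        var watch = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++)
        {
            convert(input);
        }
        watch.Stop();
        return watch.Elapsed.TotalMilliseconds / runs;
    }
}
```

Each candidate can then be passed in as a delegate, for example `ConversionBenchmark.AverageMilliseconds(ConvertStateMachine, ConversionBenchmark.BuildInput(300000), 50)`. Timings from a loop like this vary with the machine, the runtime, and garbage collection, so they are only useful for rough comparisons.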

So, the question of which version is most efficient comes down to: what are your requirements?

Your solution is fine. Maybe you could write it a bit more elegantly, like this:

string[] coordinatesVal = coordinateTxt.Trim().Split(
    new string[] { ",0" }, StringSplitOptions.RemoveEmptyEntries);
string result = string.Empty;
foreach (string line in coordinatesVal)
{
    string[] numbers = line.Trim().Split(',');
    result += numbers[0] + " " + numbers[1] + ", ";
}
// Drop the trailing ", " (guarded so that empty input does not throw).
if (result.Length >= 2)
{
    result = result.Remove(result.Length - 2, 2);
}

Note the StringSplitOptions.RemoveEmptyEntries parameter of the Split method, which means you don't have to deal with empty entries inside the foreach block.

Or you can write an extremely short one-liner. It is harder to debug, but it does the job in simple cases.

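// Requires "using System.Linq;" for Select.
string result =
  string.Join(", ",
    coordinateTxt.Trim().Split(new string[] { ",0" }, StringSplitOptions.RemoveEmptyEntries).
      // Trim each entry so the line breaks between pairs do not end up in the output.
      Select(i => i.Trim().Replace(",", " ")));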

Here's another way that avoids writing your own loops and replace methods, and does not use LINQ.

 string coordinateTxt = @" -82.9494547,36.2913021,0
 -83.0784938,36.2347521,0
 -82.9537782,36.079235,0";

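            // Split on both line-ending styles: a verbatim string literal keeps whatever
            // line endings the source file uses, which may differ from Environment.NewLine.
            string[] coordinatesVal = coordinateTxt.Replace(",", "*").Trim().Split(new string[] { "*0", "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
            string result = string.Join(",", coordinatesVal).Replace("*", " ");
            Console.WriteLine(result);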

or even

            string coordinateTxt = @" -82.9494540,36.2913021,0
-83.0784938,36.2347521,0
-82.9537782,36.079235,0";

            string result = coordinateTxt.Replace("\r", "").Replace("\n", "").Replace(",", " ").Replace(" 0", ", ").Trim(new char[] { ',', ' ' });
            Console.WriteLine(result);
