C# Speed up for Strings?

struct mydata
{
    public int id;
    public string data;
}

class Program
{
    static void Main(string[] args)
    {
        List<mydata> myc = new List<mydata>();

        Stopwatch stopwatch = new Stopwatch();

        stopwatch.Start();

        for (int i = 0; i < 1000000; i++)
        {
            mydata d = new mydata();

            d.id = i;
            d.data = string.Format("DataValue {0}",i);

            myc.Add(d);
        }

        stopwatch.Stop();
        Console.WriteLine("End: {0}", stopwatch.ElapsedMilliseconds);
    }
}

Why is the code above so slow?
On an older laptop the timings are: the C# code above takes 1500 ms, while similar code in Delphi takes 450 ms.

I then changed the code to use a KeyValuePair (see below):

Stopwatch stopwatch = new Stopwatch();

        stopwatch.Start();

        var list = new List<KeyValuePair<int , string>>();

        for (int i = 0; i < 1000000; i++)
        {
            list.Add(new KeyValuePair<int,string>(i, "DataValue" + i));
        }

        stopwatch.Stop();
        Console.WriteLine("End: {0}", stopwatch.ElapsedMilliseconds);
        Console.ReadLine();

This improved the time to 1150 ms.

If I remove the '+ i', the time is under 300 ms.

If I try to replace it with a StringBuilder, the timing is similar.

        StringBuilder sb = new StringBuilder();
        Stopwatch stopWatch = new Stopwatch();
        stopWatch.Start();

        var list = new List<KeyValuePair<int, string>>();

        for (int i = 0; i < 1000000; i++)
        {
            sb.Append("DataValue");
            sb.Append(i);
            list.Add(new KeyValuePair<int, string>(i, sb.ToString()));
            sb.Clear();
        }

        stopWatch.Stop();
        Console.WriteLine("End: {0}", stopWatch.ElapsedMilliseconds);
        Console.ReadLine();

This is slightly better. If you remove the sb.Append(i), it's very fast.

It would appear that any time you have to append an int to a string or StringBuilder, it's very slow.

Can I speed this up in any way?

EDIT

The code below is the quickest I can get after applying the suggestions:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;
using System.Threading;

namespace ConsoleApplication1
{
struct mydata
{
    public int id;
    public string data;
}

class Program
{
    static void Main(string[] args)
    {
        List<mydata> myc = new List<mydata>();

        Stopwatch stopwatch = new Stopwatch();

        stopwatch.Start();

        for (int i = 0; i < 1000000; i++)
        {
           mydata d = new mydata();
           d.id = i;
           d.data = "DataValue " + i.ToString();
           myc.Add(d);
        }

        stopwatch.Stop();
        Console.WriteLine("End: {0}", stopwatch.ElapsedMilliseconds);
        Console.ReadLine();
    }

}

}

If I replace the line:

  d.data = "DataValue " + i.ToString();

with:

  d.data = "DataValue ";

then on my home machine the time goes from 660 ms to 31 ms.

Yes, it's 630 ms slower with the '+ i.ToString()'.

But it's still 2x faster than boxing, string.Format, etc.

Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();

var list = new List<KeyValuePair<int, string>>();

for (int i = 0; i < 1000000; i++)
{
    list.Add(new KeyValuePair<int, string>(i, "DataValue" + i.ToString()));
}

stopwatch.Stop();
Console.WriteLine("End: {0}", stopwatch.ElapsedMilliseconds);
Console.ReadLine();

This takes 612 ms (pre-initialising the list with new List<KeyValuePair<int, string>>(1000000) makes no difference in speed).

The problem with your first two examples is that the integer must first be boxed and then converted to a string. The boxing causes the code to be slower.

For example, in this line:

d.data = string.Format("DataValue {0}", i);

the second parameter to string.Format is of type object, which causes boxing of i. See the intermediate language code for confirmation of this:

...
box int32
call string [mscorlib]System.String::Format(string, object)
...

Similarly this code:

d.data = "DataValue " + i;

is equivalent to this:

d.data = String.Concat("DataValue ", i);

This uses the overload of String.Concat with parameters of type object, so again this involves a boxing operation. This can be seen in the generated intermediate language code:

...
box int32
call string [mscorlib]System.String::Concat(object, object)
...

For better performance, this approach avoids the boxing:

d.data = "DataValue " + i.ToString();

Now the intermediate language code doesn't include the box instruction, and it uses the overload of String.Concat that takes two strings:

...
call instance string [mscorlib]System.Int32::ToString()
call string [mscorlib]System.String::Concat(string, string)
...
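
To make the comparison concrete, the three ways of building the string can be timed side by side. This is only a sketch: the class name FormatComparison and its helper methods are made up for illustration, the timings will vary by machine and runtime, and newer C# compilers may emit a ToString() call for "DataValue " + i instead of boxing, in which case the second and third variants perform alike.

```csharp
using System;
using System.Diagnostics;

static class FormatComparison
{
    const int Iterations = 1000000;

    // Passes i as object, so the int is boxed before formatting.
    public static string ViaFormat(int i)
    {
        return string.Format("DataValue {0}", i);
    }

    // Concatenation with a raw int; older compilers box i here.
    public static string ViaConcat(int i)
    {
        return "DataValue " + i;
    }

    // Explicit ToString(): concatenates two strings, no boxing.
    public static string ViaToString(int i)
    {
        return "DataValue " + i.ToString();
    }

    // Runs one approach Iterations times and returns elapsed milliseconds.
    public static long Time(Func<int, string> build)
    {
        build(0); // warm-up call so JIT compilation is not measured

        var sw = Stopwatch.StartNew();
        int totalLength = 0;
        for (int i = 0; i < Iterations; i++)
        {
            totalLength += build(i).Length; // consume the result
        }
        sw.Stop();

        GC.KeepAlive(totalLength);
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        Console.WriteLine("string.Format:       {0} ms", Time(ViaFormat));
        Console.WriteLine("\"...\" + i:           {0} ms", Time(ViaConcat));
        Console.WriteLine("\"...\" + i.ToString(): {0} ms", Time(ViaToString));
    }
}
```

All three methods return the same string for a given i, so any difference in the printed times comes from how the string is built rather than from what is stored.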

On my machine:

... String.Format("DataValue {0}", i ) // ~1650ms
... String.Format("DataValue {0}", "") // ~1250ms
... new MyData {Id = i, Data = "DataValue {0}" + i} // ~1200ms

As Mark said, there's a boxing operation involved.

For this specific case, where the DataValue string is derived from the id, you could create a get-only property or override the ToString() method so that the operation is performed only when you need it.

public override string ToString()
{
    return "DataValue " + Id;
}
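
A fuller sketch of that idea might look like the following. The names LazyData and LazyDataDemo are invented for illustration, and the sketch assumes the string is read far less often than items are created.

```csharp
using System;
using System.Collections.Generic;

struct LazyData
{
    public int Id;

    // Built on demand: no string is allocated until a caller reads this.
    public string Data
    {
        get { return "DataValue " + Id.ToString(); }
    }

    public override string ToString()
    {
        return Data;
    }
}

static class LazyDataDemo
{
    // Fills the list with ids only; no per-item string work happens here.
    public static List<LazyData> Build(int count)
    {
        var list = new List<LazyData>(count);
        for (int i = 0; i < count; i++)
        {
            list.Add(new LazyData { Id = i });
        }
        return list;
    }

    static void Main()
    {
        List<LazyData> items = Build(1000000);
        Console.WriteLine(items[42].Data); // prints "DataValue 42"
    }
}
```

The cost has not disappeared. It has moved to each read of Data, so this only pays off when most items are never formatted.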

There are a lot of things wrong with the above which will be affecting your results. First, none of the comparisons you've done are equal. In both you have a list and use Add; what you add to the list won't affect the time, and changing the declaration of the List to var won't affect the time either.

I'm not convinced by the boxing argument put up by Mark. Boxing can be a problem, but I'm pretty certain that in the first case there is an implicit call to .ToString. This has its own overhead, and would be needed even if the int is boxed.

Format is quite an expensive operation. The second version uses a string concatenation, which is probably cheaper than a .Format.

The third is just expensive all the way. Using a string builder like that is not efficient. Internally a StringBuilder is just a list. When you do a .ToString on it you essentially do one big concat operation.

The reason some of the operations might suddenly run really quickly if you take out a critical line is that the compiler can optimise out bits of code. If it seems to be doing the same thing over and over, it might not do it (a gross oversimplification).

Right, so here's my suggestion:

The first version is probably the nearest to being "right" in my mind. What you could do is defer some of the processing. Take the object mydata and give it a string property and an int property. Then, only when you need to read the string, produce the output via a concat. Save the result if you're going to repeat the print operation a lot. It won't necessarily be quicker in the way you expect.
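
One way to sketch the "build on first read, then save it" suggestion is shown below. The names CachedData and CachedDataDemo are hypothetical. A class is used here rather than the struct from the question, on the assumption that the cache should survive reads through a List indexer, which returns a copy of a struct.

```csharp
using System;

sealed class CachedData
{
    private string _text; // stays null until the first read

    public int Id { get; private set; }

    public CachedData(int id)
    {
        Id = id;
    }

    // Concatenates on first access and reuses the saved string afterwards.
    public string Text
    {
        get
        {
            if (_text == null)
            {
                _text = "DataValue " + Id.ToString();
            }
            return _text;
        }
    }
}

static class CachedDataDemo
{
    static void Main()
    {
        var d = new CachedData(7);
        Console.WriteLine(d.Text); // prints "DataValue 7"

        // Both reads return the same saved instance, so this prints "True".
        Console.WriteLine(object.ReferenceEquals(d.Text, d.Text));
    }
}
```

The trade-off is one extra reference field per item plus a heap object per item, so whether this beats eager concatenation depends on how many items are actually read.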

Another major performance killer in this code is the List. Internally it stores the items in an array. When you call Add, it checks whether the new item can fit into the array (EnsureCapacity). When it needs more room, it creates a new array of double the size and then copies the items from the old array into the new one. You can see all this going on if you check out List.Add in Reflector.

So in your code with 1,000,000 items, it needs to reallocate and copy the array roughly 20 times (the capacity starts at 4 and doubles until it exceeds 1,000,000), and the arrays are bigger each time.

If you change your code to

var list = new List<KeyValuePair<int, string>>(1000000);

you should see a dramatic increase in speed. Let us know how you fare!
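
To check how often the backing array is actually replaced, the list's Capacity property can be watched while adding items. This is a rough sketch with a made-up class name (ListGrowthDemo). It counts capacity changes rather than measuring time, and since the question reports no measurable difference from pre-sizing, the real-world effect is worth measuring separately.

```csharp
using System;
using System.Collections.Generic;

static class ListGrowthDemo
{
    // Adds count items and returns how many times Capacity changed,
    // i.e. how many times the internal array was replaced.
    public static int CountResizes(List<int> list, int count)
    {
        int resizes = 0;
        int lastCapacity = list.Capacity;

        for (int i = 0; i < count; i++)
        {
            list.Add(i);
            if (list.Capacity != lastCapacity)
            {
                resizes++;
                lastCapacity = list.Capacity;
            }
        }
        return resizes;
    }

    static void Main()
    {
        int grown = CountResizes(new List<int>(), 1000000);
        int presized = CountResizes(new List<int>(1000000), 1000000);

        Console.WriteLine("default capacity: {0} resizes", grown);
        Console.WriteLine("pre-sized:        {0} resizes", presized);
    }
}
```

Because each resize doubles the array, the total number of element copies stays proportional to the final size, which may explain why pre-sizing helps less than the resize count alone suggests.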

Regards,

GJ
