

What takes too long on this code?

Trying to solve another SO question, I came up with the following algorithm, which I thought was quite optimized. However, while running BenchmarkDotNet on all solutions, I was very surprised that my code ran in a whopping 387 ms on average, compared to the ~20-30 ms that some of the other answers achieved.

using System;
using System.Runtime.CompilerServices;

[MethodImpl(MethodImplOptions.AggressiveInlining)]
int CalcMe(string input) // I used Marc Gravell's input generation method
{
  var operands = input.Split(' ');
  var j = 1; // operators index

  var result = int.Parse(operands[0]); // output

  // i = numbers index
  for (int i = 2; i < operands.Length; i += 2)
  {
    switch (operands[j])
    {
      case "+":
        result += int.Parse(operands[i]);
        break;
      case "-":
        result -= int.Parse(operands[i]);
        break;
      case "*":
        result *= int.Parse(operands[i]);
        break;
      case "/":
        try
        {
          result /= int.Parse(operands[i]);
          break;
        }
        catch (DivideByZeroException)
        {
          break; // division by 0.
        }

      default:
        throw new Exception("Unknown Operator");
    }

    j += 2; // next operator
  }

  return result;
}

Just by moving the String.Split() call out to the caller, Main(), I lowered the execution time to 110 ms, but that still does not solve the mystery, since all the other answers handle the input directly.

I am just trying to understand this, to perhaps change my way of thinking about optimizations. I couldn't see any keyword that only my solution uses: switch, for and int.Parse() appear in pretty much every other solution.

EDIT 1: Test input generation. The input generation is copied from Marc's answer on the original question, as below:

using System;
using System.Text;

static string GenerateInput()
{
  Random rand = new Random(12345);
  StringBuilder input = new StringBuilder();
  string operators = "+-*/";
  var lastOperator = '+';
  for (int i = 0; i < 1000000; i++)
  {
    var @operator = operators[rand.Next(0, 4)];
    input.Append(rand.Next(lastOperator == '/' ? 1 : 0, 100) + " " + @operator + " ");
    lastOperator = @operator;
  }
  input.Append(rand.Next(0, 100));
  return input.ToString();
}

[MethodImpl(MethodImplOptions.AggressiveInlining)]

This won't achieve much here. Inlining is used when you want to tell the (JIT) compiler to just copy and paste a method's code into each place it is called from, to avoid unnecessary method invocations. CalcMe is called only once per run, so inlining it could save at most one call. And the JIT is pretty damn smart about knowing when to inline on its own in most cases.
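For contrast, here is a sketch of the kind of call site where the hint can plausibly matter: a tiny helper called once per character inside a loop. The names `InliningDemo`, `IsDigit` and `CountDigits` are made up for illustration, and the JIT would very likely inline a method this small even without the attribute.

```csharp
using System.Runtime.CompilerServices;

static class InliningDemo
{
    // Tiny helper invoked once per character: a call site where avoiding
    // the call overhead can add up over millions of iterations.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    static bool IsDigit(char c) => c >= '0' && c <= '9';

    // Counts the ASCII digits in s; every iteration calls IsDigit.
    public static int CountDigits(string s)
    {
        int count = 0;
        foreach (char c in s)
        {
            if (IsDigit(c)) count++;
        }
        return count;
    }
}
```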

var operands = input.Split(' ');

This has to go through the whole string, search for the separators, allocate a new string for every piece and fill an array with them, which can take a long time. For the test input above that is 2,000,001 small strings plus the array holding them.
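A minimal sketch of avoiding `Split` while keeping the same token-by-token structure. It assumes .NET Core 2.1 or later (for the `ReadOnlySpan<char>` overload of `int.Parse`) and tokens separated by exactly one space, as `GenerateInput` produces; `SpanCalc` and `NextToken` are hypothetical names. Each token is a slice of the original string, so no substrings or token array are allocated, although parsing still goes through `int.Parse`.

```csharp
using System;
using System.Globalization;

static class SpanCalc
{
    // Evaluates "n op n op n ..." strictly left to right, like CalcMe,
    // but walks the input with spans instead of calling String.Split.
    public static int Calc(string input)
    {
        ReadOnlySpan<char> rest = input.AsSpan();
        int result = ParseInt(NextToken(ref rest));
        while (!rest.IsEmpty)
        {
            char op = NextToken(ref rest)[0];
            int n = ParseInt(NextToken(ref rest));
            switch (op)
            {
                case '+': result += n; break;
                case '-': result -= n; break;
                case '*': result *= n; break;
                case '/': result /= n; break;
                default: throw new FormatException("Unknown operator: " + op);
            }
        }
        return result;
    }

    // Span overload of int.Parse: parses the slice without copying it.
    static int ParseInt(ReadOnlySpan<char> s) =>
        int.Parse(s, NumberStyles.Integer, CultureInfo.InvariantCulture);

    // Returns the text up to the next space and advances 'rest' past it.
    static ReadOnlySpan<char> NextToken(ref ReadOnlySpan<char> rest)
    {
        int space = rest.IndexOf(' ');
        if (space < 0)
        {
            // Last token: hand back everything that is left.
            ReadOnlySpan<char> last = rest;
            rest = ReadOnlySpan<char>.Empty;
            return last;
        }
        ReadOnlySpan<char> token = rest.Slice(0, space);
        rest = rest.Slice(space + 1);
        return token;
    }
}
```

For example, `SpanCalc.Calc("1 + 2 * 3")` evaluates left to right, giving 9 rather than 7.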

switch (operands[j])

Switching on strings can also have an impact, since the generated code has to compare the string against the cases using string equality. If you're looking at performance, you'd want to switch on simple types (char, for example).
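A sketch of that change applied to the question's loop, assuming (as in the question's faster variant) that the caller has already split the input; `CharSwitchCalc` is a hypothetical name. Only the switch differs from CalcMe: it dispatches on the operator's first character instead of on the whole string. Division by zero is not caught here, since `GenerateInput` never emits a 0 right after `/`.

```csharp
using System;

static class CharSwitchCalc
{
    // operands alternates number, operator, number, ... as produced by
    // input.Split(' '). Evaluates strictly left to right, like CalcMe.
    public static int Calc(string[] operands)
    {
        int result = int.Parse(operands[0]);
        for (int i = 2; i < operands.Length; i += 2)
        {
            int n = int.Parse(operands[i]);
            // Switching on a char compares primitive values,
            // so no string equality checks are needed.
            switch (operands[i - 1][0])
            {
                case '+': result += n; break;
                case '-': result -= n; break;
                case '*': result *= n; break;
                case '/': result /= n; break;
                default: throw new FormatException("Unknown operator: " + operands[i - 1]);
            }
        }
        return result;
    }
}
```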

int.Parse

This actually does quite a lot of work per call: it goes through the general-purpose number parser (number styles, the culture-specific NumberFormatInfo, a stackalloc'd buffer) and even deals with unsafe code. You can see the code for parsing here:

https://referencesource.microsoft.com/#mscorlib/system/number.cs,698

Or, in case the link goes down:

[System.Security.SecuritySafeCritical]  // auto-generated
internal unsafe static Int32 ParseInt32(String s, NumberStyles style, NumberFormatInfo info) {

    Byte * numberBufferBytes = stackalloc Byte[NumberBuffer.NumberBufferBytes];
    NumberBuffer number = new NumberBuffer(numberBufferBytes);
    Int32 i = 0;

    StringToNumber(s, style, ref number, info, false);

    if ((style & NumberStyles.AllowHexSpecifier) != 0) {
        if (!HexNumberToInt32(ref number, ref i)) { 
            throw new OverflowException(Environment.GetResourceString("Overflow_Int32"));
        }
    }
    else {
        if (!NumberToInt32(ref number, ref i)) {
            throw new OverflowException(Environment.GetResourceString("Overflow_Int32"));
        }
    }
    return i;           
}

[System.Security.SecuritySafeCritical]  // auto-generated
private unsafe static void StringToNumber(String str, NumberStyles options, ref NumberBuffer number, NumberFormatInfo info, Boolean parseDecimal) {

    if (str == null) {
        throw new ArgumentNullException("String");
    }
    Contract.EndContractBlock();
    Contract.Assert(info != null, "");
    fixed (char* stringPointer = str) {
        char * p = stringPointer;
        if (!ParseNumber(ref p, options, ref number, null, info , parseDecimal) 
                || (p - stringPointer < str.Length && !TrailingZeros(str, (int)(p - stringPointer)))) {
            throw new FormatException(Environment.GetResourceString("Format_InvalidString"));
        }
    }
}
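If the tokens are known to be plain non-negative decimal numbers, as in this benchmark, most of that machinery can be skipped. A minimal sketch under that assumption: `DigitParser.ParseDigits` is a hypothetical helper that does no overflow checking and rejects anything other than ASCII digits.

```csharp
using System;

static class DigitParser
{
    // Parses a non-empty string of ASCII digits into an int.
    // No sign, whitespace, culture or overflow handling: a deliberate
    // trade-off for input known to contain small, well-formed numbers.
    public static int ParseDigits(string s)
    {
        if (string.IsNullOrEmpty(s))
            throw new FormatException("Empty number");
        int value = 0;
        foreach (char c in s)
        {
            if (c < '0' || c > '9')
                throw new FormatException("Not a digit: " + c);
            value = value * 10 + (c - '0');
        }
        return value;
    }
}
```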

I think comparing strings is much more complicated than comparing chars.

Below is the key difference:

switch (operands[j])
{
    case "+":
        ...

switch (cOperator)
{
    case '+':
       ...

Interesting problem! I was interested in implementing this myself, checking what I could come up with, and seeing how it compares to other implementations. I did it in F#, but since both F# and C# are strongly typed CLR languages, and the insights gained below are (arguably) independent of C#, I hope you'll agree that the following is not too far off-topic.

First I needed a few functions for creating a suitable expression string (adapted from your posting), measuring time, and running a bunch of functions against the generated string:

module Testbed =
    let private mkTestCase (n : int) =
        let next (r : System.Random) i = r.Next (0, i)
        let r = System.Random ()
        let s = System.Text.StringBuilder n
        let ops = "+-*/"
        (s.Append (next r 100), {1 .. n})
        ||> Seq.fold (fun s _ ->
            let nx = next r 100
            let op = ops.[next r (if nx = 0 then 3 else 4)]
            s.Append (" " + string op + " " + string nx))
        |> string

    let private stopwatch n f =
        let mutable r = Unchecked.defaultof<_>
        let sw = System.Diagnostics.Stopwatch ()
        sw.Start ()
        for i = 1 to n do r <- f ()
        sw.Stop ()
        (r, sw.ElapsedMilliseconds / int64 n)

    let runtests tests =
        let s, t = stopwatch 100 (fun () -> mkTestCase 1000000)
        stdout.Write ("MKTESTCASE\nTime: {0}ms\n", t)
        tests |> List.iter (fun (name : string, f) ->
            let r, t = stopwatch 100 (fun () -> f s)
            let w = "{0} ({1} chars)\nResult: {2}\nTime: {3}ms\n"
            stdout.Write (w, name, s.Length, r, t))

For a string of 1 million operations (around 4.9 million chars), the mkTestCase function ran in 317 ms on my laptop.
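For readers following along in C#, a rough equivalent of the F# `stopwatch` helper might look like this (`Bench.Time` is a hypothetical name). Like the F# version, it returns the last result and the average whole milliseconds per run; a harness such as BenchmarkDotNet handles warm-up and statistics far more carefully.

```csharp
using System;
using System.Diagnostics;

static class Bench
{
    // Runs f n times; returns the last result and the average elapsed
    // whole milliseconds per run (integer division, as in the F# helper).
    public static (T Result, long AvgMs) Time<T>(int n, Func<T> f)
    {
        if (n <= 0) throw new ArgumentOutOfRangeException(nameof(n));
        T result = default(T);
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < n; i++)
            result = f();
        sw.Stop();
        return (result, sw.ElapsedMilliseconds / n);
    }
}
```

Usage would be along the lines of `var (r, ms) = Bench.Time(100, () => CalcMe(input));`.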

Next I translated your function to F#:

module MethodsToTest =
    let calc_MBD1 (s : string) =
        let inline runop f a b =
            match f with
            | "+" -> a + b
            | "-" -> a - b
            | "*" -> a * b
            | "/" -> a / b
            | _ -> failwith "illegal op"
        let rec loop (ops : string []) r i j =
            if i >= ops.Length then r else
                let n = int ops.[i]
                loop ops (runop ops.[j] r n) (i + 2) (j + 2)
        let ops = s.Split ' '
        loop ops (int ops.[0]) 2 1

This ran in 488 ms on my laptop.

Next I wanted to check whether string matching is really that much slower than character matching:

    let calc_MBD2 (s : string) =
        let inline runop f a b =
            match f with
            | '+' -> a + b
            | '-' -> a - b
            | '*' -> a * b
            | '/' -> a / b
            | _ -> failwith "illegal op"
        let rec loop (ops : string []) r i j =
            if i >= ops.Length then r else
                let n = int ops.[i]
                loop ops (runop ops.[j].[0] r n) (i + 2) (j + 2)
        let ops = s.Split ' '
        loop ops (int ops.[0]) 2 1

Common wisdom would say that character matching should be significantly faster, since it involves only a primitive comparison instead of a string comparison, but the above ran in 482 ms on my laptop, so the difference between comparing primitive characters and comparing strings of length 1 is almost negligible.

Lastly, I checked whether hand-rolling the number parsing would provide a significant saving:

    let calc_MBD3 (s : string) =
        let inline getnum (c : char) = int c - 48
        let parse (s : string) =
            let rec ploop r i =
                if i >= s.Length then r else
                    let c = s.[i]
                    let n = if c >= '0' && c <= '9'
                            then 10 * r + getnum c else r
                    ploop n (i + 1)
            ploop 0 0
        let inline runop f a b =
            match f with
            | '+' -> a + b
            | '-' -> a - b
            | '*' -> a * b
            | '/' -> a / b
            | _ -> failwith "illegal op"
        let rec loop (ops : string []) r i j =
            if i >= ops.Length then r else
                let n = parse ops.[i]
                loop ops (runop ops.[j].[0] r n) (i + 2) (j + 2)
        let ops = s.Split ' '
        loop ops (parse ops.[0]) 2 1

This ran in 361 ms on my laptop, so the saving is significant, but the function is still an order of magnitude slower than my own implementation (see below), which leads to the conclusion that the initial string splitting takes the bulk of the time.

Just for comparison, I also translated the OP's function from the posting you referenced to F#:

    let calc_OP (s : string) =
        let operate r op x =
            match op with
            | '+' -> r + x
            | '-' -> r - x
            | '*' -> r * x
            | '/' -> r / x
            | _ -> failwith "illegal op"
        let rec loop c n r =
            if n = -1 then
                operate r s.[c + 1] (int (s.Substring (c + 3)))
            else
                operate r s.[c + 1] (int (s.Substring (c + 3, n - (c + 2))))
                |> loop n (s.IndexOf (' ', n + 4))
        let c = s.IndexOf ' '
        loop c (s.IndexOf (' ', c + 4)) (int (s.Substring (0, c)))

This ran in 238 ms on my laptop, so using substrings is not as slow as splitting the string, but it is still far from optimal.

Finally, my own implementation of an expression interpreter. It takes into account that the fastest way of processing is to do it manually, character by character, iterating over the string only once, and that heap allocations (creating new objects such as strings or arrays) should be avoided inside the loop as much as possible:

    let calc_Dumetrulo (s : string) =
        let inline getnum (c : char) = int c - 48
        let inline isnum c = c >= '0' && c <= '9'
        let inline isop c =
            c = '+' || c = '-' || c = '*' || c = '/'
        let inline runop f a b =
            match f with
            | '+' -> a + b
            | '-' -> a - b
            | '*' -> a * b
            | '/' -> a / b
            | _ -> failwith "illegal op"
        let rec parse i f a c =
            if i >= s.Length then
                if c = -1 then a else runop f a c
            else
                let k, j = s.[i], i + 1
                if isnum k then
                    let n = if c = -1 then 0 else c
                    parse j f a (10 * n + getnum k)
                elif isop k then parse j k a c
                elif c = -1 then parse j f a c
                else parse j f (runop f a c) -1
        parse 0 '+' 0 -1

This ran in a satisfactory 28 ms on my laptop. You can express this the same way in C#, except for the tail recursion, which should be expressed as a for or while loop:

    static int RunOp(char op, int a, int b)
    {
        switch (op)
        {
            case '+': return a + b;
            case '-': return a - b;
            case '*': return a * b;
            case '/': return a / b;
            default: throw new ArgumentException("Unknown operator", nameof(op));
        }
    }

    static int Calc_Dumetrulo(string s)
    {
        int a = 0, c = -1;
        char op = '+';
        for (int i = 0; i < s.Length; i++)
        {
            char k = s[i];
            if (k >= '0' && k <= '9')
                c = (c == -1 ? 0 : 10 * c) + ((int)k - 48);
            else if (k == '+' || k == '-' || k == '*' || k == '/')
                op = k;
            else if (c == -1) continue;
            else
            {
                a = RunOp(op, a, c);
                c = -1;
            }
        }
        if (c != -1) a = RunOp(op, a, c);
        return a;
    }
