
Execution times for C++ functions contradict each other

I have a main function that calls two user-defined functions. Both functions perform the same task (in this case, a simple selection with a selectivity factor of 50%) in different ways: one uses if/else and the other does not. I measure the execution time of both functions.

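    #include <cstdio>
    #include <ctime>

    void func1();
    void func2();  // like func1, but without the if (definition not shown)

    int main()
    {
        clock_t period;

        // Time the first call in clock ticks.
        period = clock();
        func1();
        period = clock() - period;
        std::printf("func1: %ld\n", (long)period);

        // Time the second call the same way.
        period = clock();
        func2();
        period = clock() - period;
        std::printf("func2: %ld\n", (long)period);
    }

    void func1()
    {
        // Two large arrays on the stack, about 400 kB each.
        int A[100000], B[100000], in = 0;
        for (int i = 0; i < 100000; i++)
        {
            A[i] = i;
        }
        for (int i = 0; i < 100000; i++)
        {
            if (A[i] == 3)
                B[in++] = i;
        }
    }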

func2 is similar to this, except that I replace the if with a non-branching statement.

When I execute the program, whichever function I call first takes more time. In the case above, func1 takes more time. If I call func2 first, followed by func1, then func2 takes more time. I really don't understand the logic behind this.

Can anyone explain, please?

If func2 is the same as func1 except for the branch in the second loop, my guess is that for the first function called, the operating system has to commit pages for the stack, while the second function can simply reuse the pages that are already committed.
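One way to check this guess is to call the same function twice and time each call separately. The following is a minimal sketch, not the asker's code: `select_once`, `time_once` and `compare_first_and_second` are hypothetical names, and `volatile` is used only to stop the compiler from optimizing the loops away. If the guess is right, the first timing should usually be the larger one, although the exact numbers depend on the compiler, options and operating system.

```cpp
#include <chrono>
#include <cstdio>

// Hypothetical stand-in for func1: fills A, then selects into B.
// volatile keeps the compiler from removing the loops.
int select_once()
{
    volatile int A[100000];
    volatile int B[100000];
    int in = 0;
    for (int i = 0; i < 100000; i++)
        A[i] = i;
    for (int i = 0; i < 100000; i++)
        if (A[i] == 3)
            B[in++] = i;
    return in;  // number of selected elements
}

// Times one call to select_once and returns the elapsed microseconds.
long long time_once()
{
    auto start = std::chrono::steady_clock::now();
    select_once();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

// Calls the same function twice and prints both timings.
void compare_first_and_second()
{
    long long first = time_once();
    long long second = time_once();
    std::printf("first call:  %lld us\nsecond call: %lld us\n", first, second);
}
```

Calling `compare_first_and_second()` from `main` runs identical code both times, so any difference between the two timings cannot come from the if statement.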

Operating systems tend to commit memory only on demand, so that more programs can fit into physical memory. For a new thread, Windows commits (that is, backs the virtual memory addresses with physical memory pages) the amount of stack specified in the executable headers. It then commits further pages of stack only when the process touches them. It continues to do this until the specified reserve size is reached, after which it raises a stack overflow exception. The defaults used by Microsoft's link.exe are to reserve 1MB of virtual address space for each stack and to commit a single page (4kB).

When the process does touch the page, the processor tries to look up the virtual address in the Translation Look-aside Buffer (TLB). It won't be found (a TLB 'miss'), so the processor then tries to look it up in the process's page tables. The page table will contain an invalid entry, so the processor raises a page fault exception.

The operating system's page fault handler looks at the page table entry and determines that the address belongs to a thread stack. It finds an empty page of memory, updates its page frame database to record that this page now belongs to this process, modifies the page table to point to the new page, and then dismisses the exception. The processor restarts execution at the instruction that faulted.

Processor exception handling is slow. A bunch of context information gets pushed onto the stack so the processor knows where to return to, the pipelines stall, the TLB may no longer hold the location of the OS code, and the instruction and data caches likely don't hold the OS code or data either.

Assuming int is 32-bit on your compiler, you have 400,000 bytes in the first array and another 400,000 in the second. The first loop will generate (probably) 98 page faults to create the first array, then the second loop might generate one further fault to set B[0] to 3.
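The figure of 98 comes from rounding the array size up to whole pages. Here is a small sketch of that arithmetic, assuming a 4-byte int and 4 kB pages; `pages_needed` is a hypothetical helper name. If the array does not start on a page boundary it can straddle one extra page, which is why the count is only "probably" 98.

```cpp
#include <cstddef>

// Number of pages needed to hold `bytes`, rounding up to whole pages.
constexpr std::size_t pages_needed(std::size_t bytes, std::size_t page_size)
{
    return (bytes + page_size - 1) / page_size;
}

// Assumptions: 4-byte int and 4 kB pages.
constexpr std::size_t kArrayBytes = 100000 * 4;  // 400,000 bytes per array
constexpr std::size_t kPageSize   = 4096;

// 97 pages hold only 397,312 bytes, so a 98th page is needed.
static_assert(pages_needed(kArrayBytes, kPageSize) == 98,
              "one array spans 98 pages");
```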

It can also depend on your compiler, operating system, and compiler options. With large arrays declared on the stack like this, some compilers generate a 'stack probe' at the beginning of the function to ensure that the stack is committed properly. Windows requires that the process commit its stack page by page in the correct order, so Microsoft's compiler inserts a call to the _chkstk function at the start of the function. This deliberately reads from each stack page in turn, so that the OS commits the page. This is documented in KB 100775.
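The idea behind a stack probe can be sketched in ordinary C++. This is only an illustration of the technique, not the actual _chkstk implementation: `probe_pages` is a hypothetical function that reads one byte per page-sized stride of a buffer, starting at the highest address and walking down, which is the direction in which the stack grows on x86. The 4 kB page size is an assumption.

```cpp
#include <cstddef>

// Reads one byte from each page-sized stride of [base, base + size),
// from the highest address down to the lowest.
// Returns the number of reads performed.
std::size_t probe_pages(const volatile char* base, std::size_t size,
                        std::size_t page_size = 4096)
{
    std::size_t touched = 0;
    std::size_t offset = size;
    while (offset > 0) {
        // Step down by one page, clamping at the start of the buffer.
        offset = (offset > page_size) ? offset - page_size : 0;
        char c = base[offset];  // volatile read, so the access really happens
        (void)c;
        ++touched;
    }
    return touched;
}
```

For a 400,000-byte buffer this performs 98 reads, one for each page the OS would have to commit.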

Linux reportedly just commits stack pages on demand, and doesn't require that the program commit pages in the right order.
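On Linux and other POSIX systems, the page-fault explanation can be tested directly by reading the process's minor fault counter before and after a call. The sketch below assumes a POSIX platform with `getrusage`; `minor_faults`, `fill_array` and `faults_during_call` are hypothetical names, and `fill_array` only reproduces the first loop of func1.

```cpp
#include <sys/resource.h>  // POSIX only: getrusage, RUSAGE_SELF

// Minor page faults taken by this process so far.
long minor_faults()
{
    rusage usage{};
    getrusage(RUSAGE_SELF, &usage);
    return usage.ru_minflt;
}

// Touches about 400 kB of stack, like the first loop of func1.
// volatile keeps the compiler from removing the loop.
int fill_array()
{
    volatile int A[100000];
    for (int i = 0; i < 100000; i++)
        A[i] = i;
    return A[99999];
}

// Number of minor faults that occur during one call to fill_array.
long faults_during_call()
{
    long before = minor_faults();
    fill_array();
    return minor_faults() - before;
}
```

If the stack pages are committed on first touch, the first call to `faults_during_call()` would be expected to report many more faults than a second call made right after it.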
