log10() performance on Visual Studio 2015 a lot slower than Visual Studio 2013 for x86
We have ported a VS2013 C++/MFC application to VS2015 and are seeing some rather disturbing issues with the performance of the code generated by the VS2015 compiler.
Note this is for x86.
Calls to log10() are orders of magnitude slower. When profiling a Release build using CPU sampling, we see that these calls take up far more time than they did before: 49 samples with VS2013 versus a whopping 7,545 samples for the same run with VS2015. This means the function goes from 0.6% of CPU load to 50% for the application in question.
In VS2013 the profiler shows:
Function Name        Inclusive Samples  Exclusive Samples  Inclusive Samples %  Exclusive Samples %
__libm_sse2_log10    49                 49                 0.61                 0.61
In VS2015 the profiler shows:
Function Name        Inclusive Samples  Exclusive Samples  Inclusive Samples %  Exclusive Samples %
___sse2_log102       7,545              7,545              50.43                50.43
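To check the difference outside the full application, a small timing loop could be built with both toolsets using the same options. The following is only a sketch: the helper name time_log10_ms and the input range are made up for illustration and are not taken from our application.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original application): times `count`
// calls to std::log10 and returns the elapsed wall-clock milliseconds.
// The accumulated sum is stored in *sink so the optimizer cannot drop the loop.
double time_log10_ms(std::size_t count, double* sink)
{
    assert(sink != nullptr);

    // Prepare inputs up front so only the log10 loop is timed.
    std::vector<double> inputs(count);
    for (std::size_t i = 0; i < count; ++i)
        inputs[i] = 1.0 + 0.001 * static_cast<double>(i);

    const auto start = std::chrono::steady_clock::now();
    double sum = 0.0;
    for (std::size_t i = 0; i < count; ++i)
        sum += std::log10(inputs[i]);
    const auto stop = std::chrono::steady_clock::now();

    *sink = sum;
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling this with a few million elements from a Release build of each toolset and comparing the returned times should show whether the slowdown reproduces in isolation.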
Why a different function name?
We have looked briefly at the generated assembly for log10. On VS2013 this forwards to disp_pentium4.inc and log10_pentium4.asm. On VS2015 this is different. It seems VS2015 goes back to __libm_sse2_log10 in Debug.
Could __sse2_log102 alone be the cause of this performance difference? We have checked that the results output by the functions calling these are within expected floating point differences.
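For readers wondering what "within expected floating point differences" can look like in practice, here is a minimal sketch of such a check. It is not the comparison we actually ran: the function names and the choice of log(v) / log(10) as a reference are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical reference: log10 computed via the natural logarithm.
inline double reference_log10(double v)
{
    assert(v > 0.0);
    return std::log(v) / std::log(10.0);
}

// Returns true when a and b agree to within rel_tol, scaled by the larger
// magnitude. The scale is clamped to at least 1.0 so that values near zero
// are effectively compared with an absolute tolerance.
inline bool within_relative_tolerance(double a, double b, double rel_tol)
{
    const double scale = std::max(std::fabs(a), std::fabs(b));
    return std::fabs(a - b) <= rel_tol * std::max(scale, 1.0);
}
```

A check such as within_relative_tolerance(std::log10(v), reference_log10(v), 1e-12) over a range of inputs gives a rough idea of whether two implementations agree; the tolerance value is a judgment call, especially with /fp:fast.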
We are compiling with the v140_xp target and have the following compile options:
/Yu"stdafx.h" /MP /GS- /GL /analyze- /W4 /wd"4510" /wd"4610" /Zc:wchar_t /Z7 /Gm- /Ox /Ob2 /Zc:inline /fp:fast /D "WINVER=0x0501" /D "WIN32" /D "_WINDOWS" /D "NDEBUG" /D "_CRT_SECURE_NO_WARNINGS" /D "_CRT_SECURE_NO_DEPRECATE" /D "_SCL_SECURE_NO_WARNINGS" /D "_USING_V110_SDK71_" /D "_UNICODE" /D "UNICODE" /errorReport:prompt /WX- /Zc:forScope /GR /arch:SSE2 /Gd /Oy /Oi /MT
All project settings are the same for both VS2013 and VS2015. Note that we are using SSE2 and have the floating point model set to fast.
Has anyone encountered the same issue and knows how to fix it?
Here is my comment as an answer.
It appears that VS2015 has changed the implementation of log10 in release builds, where it calls this new __sse2_log102 function instead of the old __libm_sse2_log10, and that this new implementation is the cause of the huge performance difference.
The fix for us in this case was to call an implementation available in Intel's Integrated Performance Primitives (IPP) library. E.g. instead of calling:
return log10(v);
Call this instead:
double result;
ippsLog10_64f_A53(&v, &result, 1);
return result;
This made the performance issue disappear; in fact, it was slightly faster, even using an old IPP 7.0 release. Not everyone can use and pay for IPP, though, so we hope Microsoft fixes this.
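Since ippsLog10_64f_A53 takes a length argument, one option is to hide the choice of implementation behind a small array wrapper, so that builds without IPP still compile. This is only a sketch: the macro USE_IPP_LOG10 and the function log10_array are hypothetical names, not part of IPP or the CRT, and the fallback behaviour is an assumption about what a project might want.

```cpp
#include <cassert>
#include <cmath>

#ifdef USE_IPP_LOG10   // hypothetical project-defined switch
#include <ipp.h>       // umbrella IPP header; declares ippsLog10_64f_A53
#endif

// Hypothetical wrapper: writes log10(src[i]) to dst[i] for i in [0, len).
// Uses IPP when USE_IPP_LOG10 is defined and the call reports ippStsNoErr;
// otherwise every element is computed with the CRT's std::log10.
void log10_array(const double* src, double* dst, int len)
{
    assert(src != nullptr && dst != nullptr && len >= 0);

#ifdef USE_IPP_LOG10
    if (len > 0 && ippsLog10_64f_A53(src, dst, len) == ippStsNoErr)
        return;
#endif

    for (int i = 0; i < len; ++i)
        dst[i] = std::log10(src[i]);
}
```

Where the calling code already has its values in an array, passing the whole array in one call may be cheaper than calling with a length of 1 per value, though we have not measured that.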
Below is the version of VS2015 that has shown this issue.