Chapter 5 of the book "Computer Systems: A Programmer's Perspective (3rd edition)" discusses polynomial evaluation in exercises 5.5 and 5.6. It gives two implementations, poly() and polyh(), and says that poly()'s CPE (cycles per element) is 5.0 and polyh()'s CPE is 8.0, so it concludes that poly() runs faster than polyh(). **But with clang-12 or clang-14 on my Ubuntu 20.04 machine, polyh() is much faster, contrary to what these exercises say. I'm confused.**
The Polynomial Evaluation implementations:
// the naive method
double poly(double a[], double x, long degree)
{
    long i;
    double result = a[0];
    double xpwr = x;
    for (i = 1; i <= degree; i++)
    {
        result += a[i] * xpwr;
        xpwr = x * xpwr;
    }
    return result;
}

// Horner's method
double polyh(double a[], double x, long degree)
{
    long i;
    double result = a[degree];
    for (i = degree - 1; i >= 0; i--)
    {
        result = a[i] + x * result;
    }
    return result;
}
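As a quick sanity check that both routines evaluate the same polynomial, here is a minimal sketch. The helper name poly_results_agree and its tolerance parameter are my own assumptions for illustration; they are not part of the book's code.

```c
#include <stdbool.h>

/* Naive method, same logic as the listing above. */
double poly(double a[], double x, long degree)
{
    double result = a[0];
    double xpwr = x;
    for (long i = 1; i <= degree; i++) {
        result += a[i] * xpwr;
        xpwr = x * xpwr;
    }
    return result;
}

/* Horner's method, same logic as the listing above. */
double polyh(double a[], double x, long degree)
{
    double result = a[degree];
    for (long i = degree - 1; i >= 0; i--)
        result = a[i] + x * result;
    return result;
}

/* Absolute value without pulling in <math.h>. */
static double abs_d(double v)
{
    return v < 0.0 ? -v : v;
}

/* Hypothetical helper: true if both methods agree within a relative
   tolerance. The two methods round differently, so exact equality is
   not guaranteed in general. */
bool poly_results_agree(double a[], double x, long degree, double rel_tol)
{
    double p = poly(a, x, degree);
    double h = polyh(a, x, degree);
    double scale = abs_d(p) > abs_d(h) ? abs_d(p) : abs_d(h);
    if (scale < 1.0)
        scale = 1.0;
    return abs_d(p - h) <= rel_tol * scale;
}
```

For example, with a = {1, 2, 3}, x = 2 and degree = 2, both functions return 17 (1 + 2*2 + 3*4), so poly_results_agree(a, 2.0, 2, 1e-12) is true.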
My compilation flag is -O1. The full implementation (including the timer) is at https://godbolt.org/z/3eW8Wzr7z
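The linked code is not reproduced here. As a rough sketch of what such a timing loop can look like, assuming the standard clock() function (which measures CPU time, not wall-clock time), something like the following would do. The names poly_fn and avg_ms are hypothetical and are not taken from the linked code.

```c
#include <time.h>

/* Signature shared by poly() and polyh(). */
typedef double (*poly_fn)(double a[], double x, long degree);

/* Naive method, repeated so the sketch is self-contained. */
double poly(double a[], double x, long degree)
{
    double result = a[0];
    double xpwr = x;
    for (long i = 1; i <= degree; i++) {
        result += a[i] * xpwr;
        xpwr = x * xpwr;
    }
    return result;
}

/* Horner's method, repeated so the sketch is self-contained. */
double polyh(double a[], double x, long degree)
{
    double result = a[degree];
    for (long i = degree - 1; i >= 0; i--)
        result = a[i] + x * result;
    return result;
}

/* Hypothetical helper: call fn `loops` times and return the average
   CPU time per call in milliseconds. The last result is written to
   *out, and the volatile sink keeps the compiler from discarding
   the calls. */
double avg_ms(poly_fn fn, double a[], double x, long degree,
              int loops, double *out)
{
    volatile double sink = 0.0;
    clock_t start = clock();
    for (int i = 0; i < loops; i++)
        sink = fn(a, x, degree);
    clock_t end = clock();
    *out = sink;
    return 1000.0 * (double)(end - start) / CLOCKS_PER_SEC / loops;
}
```

A caller would fill a large coefficient array, then compare avg_ms(poly, a, x, degree, 10, &r1) against avg_ms(polyh, a, x, degree, 10, &r2). The resolution of clock() is coarse, so small arrays may report 0 ms.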
My timing results:
polyh: took 2.318 ms, loop=10, avg = 0.232 ms
poly: took 78.980 ms, loop=10, avg = 7.898 ms
Why does polyh() run faster despite its larger CPE?
Update: Based on the comments from @Passer By, I used the quick-bench website to measure the timings. The benchmark result differs with the array size:
n = 1000: poly() is faster ( https://quick-bench.com/q/EpDmf22VD_E0CvLN0-6TY_Ye8bU )
n = 10000: polyh() is much faster ( https://quick-bench.com/q/yuzoVzz_KhWv1gJ-_j9wlZtfWVM )
I think there is some confusion regarding the statements in the book. The compiler output at the link you provided shows that the loop in polyh() has fewer instructions per element than the loop in poly(), which suggests a lower CPE:
polyh(double*, double, long):
        # skipping non-loop code...
        mulsd xmm0, xmm1
        addsd xmm0, qword ptr [rdi + 8*rsi - 16]
        add rsi, -1
        cmp rsi, 1
        jg .LBB1_2

vs

poly(double*, double, long):
        # skipping non-loop code...
        movsd xmm3, qword ptr [rdi + 8*rax + 8]
        mulsd xmm3, xmm2
        addsd xmm0, xmm3
        mulsd xmm2, xmm1
        add rax, 1
        cmp rsi, rax
        jne .LBB0_2
Clearly, polyh() is more concise code in comparison with poly().
Now let's talk about optimization. First of all, -O0 disables optimization, and -O1 enables the minimum level of optimization.
But even if you throw optimization out of the window, the code in polyh() is already optimized at the source level, before compilation. Each iteration has only one multiplication, one addition, and one assignment, while each iteration of poly() has two multiplications and two assignments.
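To make that count concrete, here is a small sketch that tallies the floating-point operations each loop performs for a given degree. The op_count struct and the count_poly / count_polyh names are made up for illustration. Note that this counts operations only; it says nothing about how those operations depend on each other, which also affects CPE.

```c
/* Hypothetical tally of floating-point operations for one evaluation. */
typedef struct {
    long mults;
    long adds;
} op_count;

/* Operations performed by the loop in poly() for a given degree. */
op_count count_poly(long degree)
{
    op_count c = {0, 0};
    for (long i = 1; i <= degree; i++) {
        c.mults += 2; /* a[i] * xpwr and x * xpwr */
        c.adds += 1;  /* result += ... */
    }
    return c;
}

/* Operations performed by the loop in polyh() for a given degree. */
op_count count_polyh(long degree)
{
    op_count c = {0, 0};
    for (long i = degree - 1; i >= 0; i--) {
        c.mults += 1; /* x * result */
        c.adds += 1;  /* a[i] + ... */
    }
    return c;
}
```

For degree 10 this gives 20 multiplications and 10 additions for poly(), against 10 multiplications and 10 additions for polyh().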
Clearly, polyh() is leaner and faster code.
UPDATE: After the question was updated, here is what I found. I ran the same quick-bench benchmark but with GCC instead of Clang, since GCC is what I use on my computer, and the results are still the same: polyh() wins even at n = 1000.