Why is float * int multiplication faster than int * float in CPython?

Question

Basically, the expression 0.4 * a is consistently, and surprisingly, significantly faster than a * 0.4, where a is an integer. And I have no idea why.

I speculated that it is a case of a LOAD_CONST LOAD_FAST bytecode pair being "more specialized" than LOAD_FAST LOAD_CONST, and I would be entirely satisfied with this explanation, except that this quirk seems to apply only to multiplications where the types of the operands differ. (By the way, I can no longer find the link to the "bytecode instruction pair popularity ranking" I once found on GitHub. Does anyone have a link?)

Anyway, here are the micro-benchmarks:

$ python3.10 -m pyperf timeit -s"a = 9" "a * 0.4"
Mean +- std dev: 34.2 ns +- 0.2 ns

$ python3.10 -m pyperf timeit -s"a = 9" "0.4 * a"
Mean +- std dev: 30.8 ns +- 0.1 ns

$ python3.10 -m pyperf timeit -s"a = 0.4" "a * 9"
Mean +- std dev: 30.3 ns +- 0.3 ns

$ python3.10 -m pyperf timeit -s"a = 0.4" "9 * a"
Mean +- std dev: 33.6 ns +- 0.3 ns

As you can see, the runs where the float comes first (the 2nd and 3rd) are faster.
So my question is: where does this behavior come from? I'm 90% sure that it is an implementation detail of CPython, but I'm not familiar enough with low-level instructions to state that for sure.

Answer 1

It's CPython's implementation of the BINARY_MULTIPLY opcode. The compiler has no idea what the types are at compile time, so everything has to be figured out at run time. Regardless of what a and b may be, BINARY_MULTIPLY ends up invoking a.__mul__(b) first.

When a is of int type, int.__mul__(a, b) has no idea what to do unless b is also of int type. It returns NotImplemented (at the C level, through the Py_RETURN_NOTIMPLEMENTED macro). This happens in longobject.c's CHECK_BINOP macro. The interpreter sees that, and effectively says "OK, a.__mul__ has no idea what to do, so let's give b.__rmul__ a shot at it". None of that is free; it all takes time.

float.__mul__(b, a) (same as float.__rmul__) does know what to do with an int (it converts it to float first), so that succeeds.
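The two attempts can be observed from Python itself by calling the special methods directly. This is only an illustration at the Python level (the interpreter works with C-level slots rather than these attribute lookups), and the variable names are made up for the example:

```python
a, b = 9, 0.4

# First attempt: int.__mul__ does not handle a float operand and
# signals that by returning the NotImplemented singleton.
first_try = int.__mul__(a, b)
print(first_try)          # NotImplemented

# Second attempt: the float's reflected method accepts the int.
fallback = float.__rmul__(b, a)
print(fallback == a * b)  # True

# With the float on the left, the very first attempt succeeds.
direct = float.__mul__(b, a)
print(direct == b * a)    # True
```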

But when a is of float type to begin with, we go to float.__mul__ first, and that's the end of it. No time is burned figuring out that the int type doesn't know what to do.

The actual code is quite a bit more involved than the above pretends, but that's the gist of it.
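As a rough model of that gist, here is a pure-Python sketch of the dispatch order. The function name simplified_mul and the attempt counter are hypothetical, invented for this illustration; the real logic lives in C and also handles cases this sketch ignores (for example, giving priority to a right operand whose type is a subclass of the left operand's type).

```python
def simplified_mul(left, right):
    """Toy model of binary '*' dispatch; returns (result, attempts)."""
    attempts = 0
    result = NotImplemented

    # Step 1: ask the left operand's type.
    forward = getattr(type(left), "__mul__", None)
    if forward is not None:
        attempts += 1
        result = forward(left, right)

    # Step 2: only if that failed, ask the right operand's type.
    if result is NotImplemented:
        reflected = getattr(type(right), "__rmul__", None)
        if reflected is not None:
            attempts += 1
            result = reflected(right, left)

    if result is NotImplemented:
        raise TypeError("unsupported operand types for *")
    return result, attempts


print(simplified_mul(9, 0.4)[1])  # 2 attempts: int first, then float
print(simplified_mul(0.4, 9)[1])  # 1 attempt: float handles it directly
```

In this model the int-first order costs one extra call that returns NotImplemented, which matches the direction of the timing difference in the question.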
