
Higher part of multiply and division in C or C++?

When I multiply a pair of 4-byte integers in assembly, the lower part of the result is in EAX and the higher part in EDX. If I am in C or C++ and I want to get the higher part, is it possible without using inline assembly?

Is it likewise possible to get the integer division result from EAX and the modulus result from EDX without repeating the division in C or C++? I only know how to do a/b first and then a%b, while in assembler both results are given by the same operation.

You can do it easily in C this way:

#include <stdint.h>

uint32_t a, b;  // input
uint64_t val = (uint64_t)a * b;
uint32_t high = val >> 32, low = val;

Leave it to the compiler to produce the best possible code. Modern optimizers are really good at it. Hand-coded assembly often looks better but performs worse.

As commented by Pete Becker, the above relies on the availability of the types uint32_t and uint64_t. If you insist on die-hard portability (say you are programming on a DS9K), you may instead use the types uint_least32_t and uint_least64_t, or uint_fast32_t and uint_fast64_t, which are always available under C99. You then need an extra mask, which will be optimized out if not required:

#include <stdint.h>

uint_fast32_t a, b;  // input
uint_fast64_t val = (uint_fast64_t)a * b;
uint_fast32_t high = (val >> 32) & 0xFFFFFFFF, low = val & 0xFFFFFFFF;

Regarding division, you can use the standard library functions div, ldiv or lldiv (the last one added in C99) to perform signed division and remainder operations in one call. The division/modulo combination will be implemented in one operation if possible on the target architecture for the specific operand types.
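As a minimal sketch of the library route, the helper below calls div() once and reads both results from the returned div_t. The name split_seconds and the minutes/seconds scenario are hypothetical, chosen only for illustration.

```c
#include <stdlib.h>

/* Hypothetical helper: splits a number of seconds into minutes and
   leftover seconds with a single call to the standard div() function. */
void split_seconds(int total, int *minutes, int *seconds) {
    div_t r = div(total, 60);   /* r.quot == total / 60, r.rem == total % 60 */
    *minutes = r.quot;
    *seconds = r.rem;
}
```

Since C99 the quotient is truncated toward zero, so a negative total yields a negative (or zero) remainder.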

It may be more efficient to write both expressions and rely on the compiler to detect the pattern and produce code that uses a single IDIV opcode:

struct divmod_t { int quo, rem; };
struct divmod_t divmod(int num, int denom) {
    struct divmod_t r = { num / denom, num % denom };
    return r;
}

Testing on Matt Godbolt's Compiler Explorer shows that both clang and gcc generate a single idiv instruction for this code at -O3.

You can turn one of these divisions into a multiplication:

struct divmod_t { int quo, rem; };
struct divmod_t divmod2(int num, int denom) {
    struct divmod_t r;
    r.quo = num / denom;
    r.rem = num - r.quo * denom;
    return r;
}

Note that the above functions do not check their operands, and some inputs result in undefined behavior. The behavior is undefined if denom == 0, or if num == INT_MIN and denom == -1, in which case the quotient overflows.
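If those inputs can occur, the checks can be done before dividing. The following is only a sketch: the names checked_divmod and divmod_result are made up for this example, and reporting failure through a bool return value is one possible design among several.

```c
#include <limits.h>
#include <stdbool.h>

struct divmod_result { int quo, rem; };

/* Hypothetical guarded variant: returns false instead of dividing when
   the operation would have undefined behavior. */
bool checked_divmod(int num, int denom, struct divmod_result *out) {
    if (denom == 0)
        return false;                      /* division by zero */
    if (num == INT_MIN && denom == -1)
        return false;                      /* quotient does not fit in int */
    out->quo = num / denom;
    out->rem = num % denom;
    return true;
}
```

The two comparisons are cheap next to the division itself, and the compiler is still free to merge the / and % into one instruction.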

You don't deal with the implementation details in C or C++. That's the whole point. If you want the most significant bytes, simply use the language. Right shift >> is designed to do that. Something like:

uint64_t i;
uint32_t a;
uint32_t b;
// input a, b and set i to a * b
// this should be done with (thanks to @nnn, pls see comment below):
// i = a; i *= b;
uint64_t msb = i >> 32;

For multiplication, only Forth among widely known languages (higher-level than assembler) has an explicit multiplication of N*N bits to a 2N-bit result (the words M*, UM*). C, Fortran, etc. don't have it. Yes, this sometimes leads to poor optimization. For example, on x86_32, getting a 64-bit product requires either converting an operand to a 64-bit type (which can cause a library call instead of a mul instruction), or an explicit inline assembly call (simple and efficient in gcc and clones, but not always in MSVC and other compilers).

In my tests on x86_32 (i386), a modern compiler is able to convert code like

#include <stdint.h>
int64_t mm(int32_t x, int32_t y) {
  return (int64_t) x * y;
}

to a simple "imull" instruction without a library call; clang 3.4 (-O1 or higher) and gcc 4.8 (-O2 or higher) satisfy this, and I guess this will remain the case. (At lower optimization levels, a second, useless multiplication is added.) But one can't guarantee this for any other compiler without a real test. With gcc on x86, the following will work even without optimization:

int64_t mm(int32_t x, int32_t y) {
  int64_t r;
  asm("imull %[s]" : "=A" (r): "a" (x), [s] "bcdSD" (y): "cc");
  return r;
}

The same trend, with similar instructions, holds for nearly all modern CPUs.

For division (like a 64-bit dividend by a 32-bit divisor giving a 32-bit quotient and remainder), this is more complicated. There are library functions like lldiv, but they are only for signed division; there are no unsigned equivalents. Also, they are library calls with all the associated cost. But the issue here is that many modern architectures don't have this kind of division. For example, it's explicitly excluded from ARM64 and RISC-V. For them, one has to emulate long division using a shorter one (e.g. divide 2**(N-1) by a dividend but then double the result and tune its remainder). For those having mixed-length division (x86, M68k, S/390, etc.), a one-line inline assembly snippet is rather good if you are sure it won't overflow :)
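Since the library has no unsigned counterpart to lldiv, a small helper written with the plain operators can fill that gap. This is only a sketch, with the made-up names udivmod64_t and udiv64; it relies on the compiler merging the two expressions where the target allows it, and it is not a replacement for a true mixed-length division.

```c
#include <stdint.h>

struct udivmod64_t { uint64_t quo, rem; };

/* Hypothetical unsigned analogue of lldiv(). The caller must make sure
   denom is nonzero; unsigned division has no other failure case. */
struct udivmod64_t udiv64(uint64_t num, uint64_t denom) {
    struct udivmod64_t r = { num / denom, num % denom };
    return r;
}
```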

Some architectures lack division support entirely (older Sparc, Alpha), and it is the standard library's task to support such operations there.

Anyway, the standard library provides all needed operations unless you require the highest precision (e.g. x86_64 can divide a 128-bit dividend by a 64-bit divisor, but this isn't supported by the C library).
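On the multiplication side, the widening idiom can be carried one size up where the compiler offers a 128-bit integer type. The sketch below assumes GCC or Clang on a 64-bit target, where the non-standard type unsigned __int128 is available; the function name mulhi_u64 is made up for this example.

```c
#include <stdint.h>

/* Assumption: GCC or Clang on a 64-bit target. unsigned __int128 is a
   compiler extension, not part of standard C. */
uint64_t mulhi_u64(uint64_t a, uint64_t b) {
    unsigned __int128 product = (unsigned __int128)a * b;
    return (uint64_t)(product >> 64);   /* upper 64 bits of the 128-bit product */
}
```

Code that must also build with other compilers needs a fallback, for example splitting each operand into 32-bit halves.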

I think the most elaborate and accessible example of these approaches for different architectures is the GMP library. It's much more advanced than what your question needs, but you can dig out examples of division by a single limb for different architectures; it implements proper chaining even if the architecture doesn't support it directly. It will also cover the vast majority of needs for arbitrarily long number arithmetic, albeit with some overhead.

NB: if you call a div-like instruction explicitly, it's your responsibility to check for overflows. This is trickier in the signed case than in the unsigned one; for example, division of -2147483648 by -1 crashes an x86-based program, even if written in C.

UPDATE[2020-07-04]: with the GCC integer overflow builtins, one can perform a mixed-precision multiplication, like:

#include <stdint.h>
int64_t mm(int32_t x, int32_t y) {
  int64_t result;
  __builtin_mul_overflow(x, y, &result);
  return result;
}

In most cases this is translated by both GCC and Clang to the optimal form. I hope other compilers and even standards will eventually adopt this.
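The same builtin also reports whether the exact product fits in the destination, which matters when the result type is no wider than the operands. A minimal sketch, assuming GCC 5 or later or a Clang version that provides __builtin_mul_overflow; the wrapper name mul32_checked is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the compiler provides __builtin_mul_overflow (GCC 5+, Clang).
   Returns true and stores the product when it fits in int32_t. */
bool mul32_checked(int32_t x, int32_t y, int32_t *product) {
    /* The builtin returns true when the exact result does not fit. */
    return !__builtin_mul_overflow(x, y, product);
}
```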

For division, the fully portable solution uses the library functions div, ldiv or lldiv.
