
How to reliably influence generated code at near machine level using GHC?

Asked 2018-08-29 23:17:26 · Tags: haskell, ghc, micro-optimization

While this may sound like a theoretical question, suppose I decide to invest in building a mission-critical application written in Haskell. A year later I find that I absolutely need to improve the performance of some very thin bottleneck, and this will require optimizing memory access close to raw machine capabilities.

Some assumptions:

- It isn't a realtime system: occasional latency spikes are tolerable (from interrupts, thread-scheduling irregularities, occasional GC, etc.)
- It isn't a numeric problem: data layout and cache-friendly access patterns matter most (avoiding pointer chasing, reducing conditional jumps, etc.)
- Code may be tied to a specific GHC release (but no forking)
- The performance goal requires in-place modification of pre-allocated off-heap arrays, taking alignment into account (C strings, bit-packed fields, etc.)
- Data is statically bounded in arrays, and allocations are rarely if ever needed

What mechanisms does GHC offer to perform this kind of optimization? By "reliably" I mean that if a source change causes the code to stop performing, it can be corrected in the source code without rewriting it in assembly.

- Is it already possible using GHC-specific extensions and libraries?
- Would a custom FFI help avoid C calling-convention overhead?
- Could a special-purpose compiler plugin do it through a restricted source DSL?
- Could a source code generator from a "high-level" assembly (LLVM?) be a solution?

1 Answer

It sounds like you're looking for unboxed arrays. "Unboxed" in Haskell-land means "has no runtime heap representation". You can usually learn whether some part of your code is compiled to an unboxed loop (a loop that performs no allocation) by looking at the Core representation (GHC's very Haskell-like intermediate language, produced early in compilation). For example, you might see Int# in the Core output, which means an integer that has no heap representation (it will typically live in a register).
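As a sketch of that workflow (the function `sumTo` and the file name are hypothetical), the module below contains a strict loop, and the comment shows GHC debugging flags that print the optimised Core. The expectation that the loop becomes a worker over Int# is typical with -O2 but not guaranteed, and the generated names differ between GHC versions.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main (main) where

-- Compile with, e.g.:
--   ghc -O2 -ddump-simpl -dsuppress-all -dsuppress-uniques SumTo.hs
-- and look for the loop in the printed Core. With optimisation GHC typically
-- turns `go` into a worker whose arguments are Int# (unboxed machine integers),
-- so the loop allocates nothing. Names such as $wgo vary between versions.
sumTo :: Int -> Int
sumTo n = go 0 1
  where
    -- Bang patterns keep the accumulator and counter evaluated on every
    -- iteration instead of building a chain of thunks.
    go :: Int -> Int -> Int
    go !acc !i
      | i > n     = acc
      | otherwise = go (acc + i) (i + 1)

main :: IO ()
main = print (sumTo 1000000)
```

In the dump, Int# arguments and primitive operators such as +# inside the loop, with no I# boxing, are the sign that the loop is unboxed.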

When optimizing Haskell code we regularly look at Core and expect to be able to manipulate or correct performance regressions by changing the source code (e.g. adding a strictness annotation, or adjusting a function so that it can be inlined). This isn't always fun, but it will be fairly stable, especially if you pin your compiler version.
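A minimal sketch of the two source-level knobs mentioned above, strict fields and an INLINE pragma. The names `Acc`, `step` and `sumAndCount` are hypothetical. Whether GHC actually unboxes `Acc` inside the fold loop is something to confirm in the Core dump, not something the pragmas guarantee.

```haskell
module Main (main) where

import Data.List (foldl')

-- Strict fields (!) force each component when an Acc is built, so the
-- accumulator never holds a chain of unevaluated (+) thunks. With -O, small
-- strict fields like these are also stored unboxed in the constructor by
-- default (-funbox-small-strict-fields).
data Acc = Acc !Int !Int

-- INLINE asks GHC to inline `step` at its use site, which gives the
-- optimiser a chance to eliminate the Acc allocation inside the loop.
step :: Acc -> Int -> Acc
step (Acc s c) x = Acc (s + x) (c + 1)
{-# INLINE step #-}

-- Returns (sum, count) of the list.
sumAndCount :: [Int] -> (Int, Int)
sumAndCount xs = case foldl' step (Acc 0 0) xs of
  Acc s c -> (s, c)

main :: IO ()
main = print (sumAndCount [1 .. 100])
```

If a later source change makes the loop allocate again, the same two tools (strictness and inlining) are usually where you start.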

Back to unboxed arrays: GHC exposes a lot of low-level primops in GHC.Prim; in particular, it sounds like you want mutable unboxed arrays (MutableByteArray). The primitive package exposes these primops behind a slightly safer, friendlier API, and it is what you should use (and depend on, if you are writing your own library).
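A sketch of the question's "pre-allocated, aligned, bit-packed, modified in place" pattern using the primitive package's Data.Primitive.ByteArray. The bitset helpers `setBitAt` and `testBitAt` are hypothetical, and the 64-byte alignment is an assumed cache-line size, not something the answer specifies.

```haskell
module Main (main) where

import Control.Monad (forM_)
import Control.Monad.Primitive (RealWorld)
import Data.Bits (setBit, testBit)
import Data.Primitive.ByteArray
  ( MutableByteArray
  , newAlignedPinnedByteArray
  , readByteArray
  , setByteArray
  , writeByteArray
  )
import Data.Word (Word64)

-- Hypothetical bitset stored as 64-bit words: bit i lives in word
-- (i `quot` 64) at position (i `rem` 64). Reads and writes go straight to
-- the byte array; no new heap objects are created per update.
setBitAt :: MutableByteArray RealWorld -> Int -> IO ()
setBitAt arr i = do
  let (w, b) = i `quotRem` 64
  x <- readByteArray arr w :: IO Word64   -- index is in Word64 elements
  writeByteArray arr w (setBit x b)

testBitAt :: MutableByteArray RealWorld -> Int -> IO Bool
testBitAt arr i = do
  let (w, b) = i `quotRem` 64
  x <- readByteArray arr w :: IO Word64
  pure (testBit x b)

main :: IO ()
main = do
  let nWords = 16                                   -- 16 * 64 = 1024 bits
  -- Pinned (never moved by the GC) and 64-byte aligned; sizes are in bytes.
  arr <- newAlignedPinnedByteArray (nWords * 8) 64
  setByteArray arr 0 nWords (0 :: Word64)           -- new arrays are uninitialised
  forM_ [0, 3 .. 1023] (setBitAt arr)               -- in-place updates only
  a <- testBitAt arr 300
  b <- testBitAt arr 301
  print (a, b)
```

Pinning is chosen here so that the same buffer could also be passed to C; an unpinned newByteArray is enough if it never leaves Haskell.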

There are many other libraries that implement unboxed arrays, such as vector, which are built on MutableByteArray, but the point is that operations on that structure generate no garbage and are likely to compile down to fairly predictable machine instructions.
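For comparison, a sketch of an in-place update with the vector package's unboxed mutable vectors. The function `prefixSumInPlace` is hypothetical; the point is that the vector is allocated once and then only read and written in place.

```haskell
module Main (main) where

import Control.Monad (forM_)
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Unboxed.Mutable as MU

-- In-place inclusive prefix sum. An unboxed vector of Int is a flat block of
-- machine words, so each step reads and writes memory directly instead of
-- following pointers to boxed Ints. unsafeRead/unsafeModify skip bounds
-- checks; the indices here stay within [0, length - 1].
prefixSumInPlace :: MU.IOVector Int -> IO ()
prefixSumInPlace v =
  forM_ [1 .. MU.length v - 1] $ \i -> do
    prev <- MU.unsafeRead v (i - 1)
    MU.unsafeModify v (+ prev) i

main :: IO ()
main = do
  v <- MU.replicate 8 (1 :: Int)   -- the only allocation
  prefixSumInPlace v
  frozen <- U.freeze v             -- copy out just to print the result
  print (U.toList frozen)
```

Replacing unsafeRead/unsafeModify with read/modify restores bounds checking at the cost of an extra comparison and branch per access.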

You might also like to check out this technique if you're doing numeric work and want to use a particular instruction or implement some loop directly in assembly.

GHC also has a very powerful FFI, and you can look into writing portions of your program in C and interoperating with them; for this purpose Haskell supports pinned arrays, among other structures.
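A minimal sketch of handing pinned memory to C, assuming the platform's C library provides memset (the binding name `c_memset` is ours). The `unsafe` keyword relates to the question's calling-overhead concern: it makes the call cheaper than a `safe` one, but is only appropriate for short calls that never block or call back into Haskell. For memory that lives entirely outside the GHC heap, Foreign.Marshal.Alloc.mallocBytes (C malloc) is the alternative.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main (main) where

import Data.Word (Word8)
import Foreign.C.Types (CInt (..), CSize (..))
import Foreign.ForeignPtr (mallocForeignPtrBytes, withForeignPtr)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekByteOff)

-- Binding to C's memset(void *s, int c, size_t n).
foreign import ccall unsafe "string.h memset"
  c_memset :: Ptr Word8 -> CInt -> CSize -> IO (Ptr Word8)

main :: IO ()
main = do
  -- mallocForeignPtrBytes returns pinned memory: the GC never moves it, so
  -- its address stays valid for C while the ForeignPtr is kept alive.
  fp <- mallocForeignPtrBytes 64
  withForeignPtr fp $ \p -> do
    _ <- c_memset p 0x2A 64          -- fill all 64 bytes with 42
    b <- peekByteOff p 10 :: IO Word8
    print b
```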

If you need more control than those give you, then Haskell is likely the wrong language. It's impossible to tell from your description whether this is the case for your problem. (Your requirements seem contradictory: you need to be able to write a carefully cache-tuned algorithm, but arbitrary GC pauses are okay?)

One last note: you can't rely on GHC's native code generator (NCG) to perform the low-level strength-reduction optimizations that, e.g., GCC performs (GHC's NCG will probably never know about bit-twiddling hacks, autovectorization, etc.). You can try the LLVM backend (-fllvm) instead, but a speedup in your program is by no means guaranteed.
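When the backend won't do it for you, the bit tricks can be written by hand in the source. The sketch below (helpers `wordAndBit` and `rankInWord` are hypothetical) spells out a power-of-two division as a shift and mask and uses popCount for a rank query. Some GHC versions may already perform parts of this; the generated assembly can be checked with -ddump-asm.

```haskell
module Main (main) where

import Data.Bits (bit, popCount, shiftR, (.&.))
import Data.Word (Word64)

-- Split a bit index into (word index, bit offset) for 64-bit words:
-- `i quotRem 64` written as a shift by 6 and a mask with 63.
-- Only valid for i >= 0; for negative Ints the two forms disagree.
wordAndBit :: Int -> (Int, Int)
wordAndBit i = (i `shiftR` 6, i .&. 63)

-- Number of set bits among the lowest k bits of w, for 0 <= k < 64.
-- popCount is a primop: on x86-64 GHC can emit a POPCNT instruction when
-- compiling with -msse4.2, and otherwise calls a software fallback.
rankInWord :: Word64 -> Int -> Int
rankInWord w k = popCount (w .&. (bit k - 1))

main :: IO ()
main = do
  print (wordAndBit 300)     -- (4,44), same as 300 `quotRem` 64
  print (rankInWord 0xFF 4)
```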