GHC生成冗余核心操作

Question

I have the following program for converting 6bit ASCII to binary format. 我有以下程序将6位ASCII转换为二进制格式。

ascii2bin :: Char -> B.ByteString
ascii2bin = B.reverse . fst . B.unfoldrN 6 decomp . to6BitASCII -- replace to6BitASCII with ord if you want to compile this
    where decomp n = case quotRem n 2 of (q,r) -> Just (chr r,q)

bs2bin :: B.ByteString -> B.ByteString
bs2bin = B.concatMap ascii2bin

this produces the following core segment: 这产生了以下核心部分：

Rec {
$wa
$wa =
  \ ww ww1 ww2 w ->
    case ww2 of wild {
      __DEFAULT ->
        let {
          wild2
          wild2 = remInt# ww1 2 } in
        case leWord# (int2Word# wild2) (__word 1114111) of _ { 
          False -> (lvl2 wild2) `cast` ...;                                                                                   
          True ->
            case writeWord8OffAddr#
                   ww 0 (narrow8Word# (int2Word# (ord# (chr# wild2)))) w
            of s2 { __DEFAULT ->
            $wa (plusAddr# ww 1) (quotInt# ww1 2) (+# wild 1) s2
            }   
        };  
      6 -> (# w, (lvl, lvl1, Just (I# ww1)) #)
    }   
end Rec }

notice that ord . chr == id 注意那个ord . chr == id ord . chr == id , and so there is a redundant operation here: narrow8Word# (int2Word# (ord# (chr# wild2))) ord . chr == id ，所以这里有一个冗余操作： narrow8Word# (int2Word# (ord# (chr# wild2)))

Is there a reason GHC is needlessly converting from Int -> Char -> Int, or is this an example of poor code generation? 有没有理由GHC不必要地从Int - > Char - > Int转换，或者这是一个代码生成不良的例子？ Can this be optimized out? 这可以优化吗？

EDIT: This is using GHC 7.4.2, I have not tried compiling with any other version. 编辑：这是使用GHC 7.4.2，我没有尝试与任何其他版本编译。 I have since found the problem remains in GHC 7.6.2, but the redundant operations are removed in the current HEAD branch on github. 我已经发现问题仍然存在于GHC 7.6.2中，但是在github上的当前HEAD分支中删除了冗余操作。

Answer 1

Is there a reason GHC is needlessly converting from Int -> Char -> Int , or is this an example of poor code generation? 有没有理由GHC不必要地从Int -> Char -> Int ，或者这是一个代码生成不良的例子？ Can this be optimized out? 这可以优化吗？

Not really (to both). 不是（对两者而言）。 The core you get from -ddump-simpl is not the end. 你从-ddump-simpl获得的核心不是结束。 There are a few optimisations and transformations still done after that on the way to the assembly code. 在通往汇编代码的途中，仍然有一些优化和转换。 But removing the redundant conversions here isn't actually an optimisation. 但是在这里删除冗余转换实际上并不是一种优化。

They can be, and are, removed between the core and the assembly. 它们可以在核心和组件之间移除。 The point is that these primops - except for the narrowing - are no-ops, they are only present in the core because that's typed. 关键在于这些原始 - 除了缩小 - 都是无操作，它们只存在于核心，因为它是键入的。 Since they are no-ops, it doesn't matter whether there is a redundant chain of them in the core. 由于它们是无操作，因此核心中是否存在冗余链并不重要。

The assembly that 7.6.1 produces from the code [it's more readable than what 7.4.2 produces, so I take that] - with ord instead of to6BitASCII - is 7.6.1从代码生成的程序集[它比7.4.2产生的更可读，所以我认为] - 用ord而不是to6BitASCII - 是

ASCII.$wa_info:
_cXT:
    addq $64,%r12
    cmpq 144(%r13),%r12
    ja _cXX
    movq %rdi,%rcx
    cmpq $6,%rdi
    jne _cXZ
    movq $GHC.Types.I#_con_info,-56(%r12)
    movq %rsi,-48(%r12)
    movq $Data.Maybe.Just_con_info,-40(%r12)
    leaq -55(%r12),%rax
    movq %rax,-32(%r12)
    movq $(,,)_con_info,-24(%r12)
    movq $lvl1_rVq_closure+1,-16(%r12)
    movq $lvl_rVp_closure+1,-8(%r12)
    leaq -38(%r12),%rax
    movq %rax,0(%r12)
    leaq -23(%r12),%rbx
    jmp *0(%rbp)
_cXX:
    movq $64,192(%r13)
_cXV:
    movl $ASCII.$wa_closure,%ebx
    jmp *-8(%r13)
_cXZ:
    movl $2,%ebx
    movq %rsi,%rax
    cqto
    idivq %rbx
    movq %rax,%rsi
    cmpq $1114111,%rdx
    jbe _cY2
    movq %rdx,%r14
    addq $-64,%r12
    jmp GHC.Char.chr2_info
_cY2:
    movb %dl,(%r14)
    incq %r14
    leaq 1(%rcx),%rdi
    addq $-64,%r12
    jmp ASCII.$wa_info
    .size ASCII.$wa_info, .-ASCII.$wa_info

The part where the narrow8Word# (int2Word# (ord# (chr# wild2))) appears in the core is after the cmpq $1114111, %rdx . narrow8Word# (int2Word# (ord# (chr# wild2)))出现在核心中的部分位于cmpq $1114111, %rdx 。 If the quotient is not out-of-range, the code jumps to _cY2 which contains no such conversions anymore. 如果商不超出范围，则代码跳转到_cY2 ，不再包含此类转换。 A byte is written to the array, some pointers/counters are incremented, and that's it, jump back to the top. 一个字节写入数组，一些指针/计数器递增，就是这样，跳回到顶部。

I think it would be possible to generate better code from this than GHC currently does, but the redundant no-op conversions already vanish. 我认为有可能从GHC产生更好的代码，但冗余的无操作转换已经消失。

GHC生成冗余核心操作

问题描述

1 个解决方案

解决方案1
19 已采纳 2013-02-26 18:51:01

GHC生成冗余核心操作

问题描述

1 个解决方案

解决方案1 19 已采纳 2013-02-26 18:51:01

解决方案1
19 已采纳 2013-02-26 18:51:01