Optimizing a Haskell function that searches for a bytestring terminator

Question

Profiling some code showed that about 65% of the run time was spent in the following function.

It uses the Data.Binary.Get monad to walk through a bytestring looking for the terminator. If it detects 0xff, it checks whether the next byte is 0x00. If it is, it drops the 0x00 and continues. If it is not 0x00, it drops both bytes, and the accumulated list of bytes is converted to a bytestring and returned.

Are there any obvious ways to optimize this? I can't see any.

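import Data.Binary.Get (Get, getWord8)
import qualified Data.ByteString.Lazy as L

parseECS :: Get L.ByteString
parseECS = f [] False
    where
    -- acc: bytes collected so far, in reverse order; ff: the previous byte was 0xff
    f acc ff = do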
        b <- getWord8
        if ff
            then if b == 0x00
                then f (0xff:acc) False
                else return $ L.pack (reverse acc)
            else if b == 0xff
                then f acc True
                else f (b:acc) False

Answer 1

Bug fix

There appears to be a bug here. An exception is raised if the end of the byte stream is reached before a terminator (0xff followed by a byte other than 0x00) is found. Here's a modified version of your function:

parseECS :: Get L.ByteString
parseECS = f [] False
  where
    f acc ff = do
      noMore <- isEmpty
      if noMore
         then return $ L.pack (reverse acc)
         else do
           b <- getWord8
           if ff
              then
                if b == 0x00
                   then f (0xff:acc) False
                   else return $ L.pack (reverse acc)
              else
                if b == 0xff
                   then f acc True
                   else f (b:acc) False
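
To make the difference concrete, here is a minimal, self-contained sketch that runs the corrected parser on one terminated and one truncated input. The parser body repeats the version above; the sample bytes and the `main` driver are illustrative assumptions, not part of the original post.

```haskell
import Data.Binary.Get (Get, getWord8, isEmpty, runGet)
import qualified Data.ByteString.Lazy as L

-- Same logic as the corrected parser above, repeated so the sketch compiles on its own.
parseECS :: Get L.ByteString
parseECS = f [] False
  where
    f acc ff = do
      noMore <- isEmpty
      if noMore
        then return $ L.pack (reverse acc)
        else do
          b <- getWord8
          if ff
            then if b == 0x00
                   then f (0xff : acc) False
                   else return $ L.pack (reverse acc)
            else if b == 0xff
                   then f acc True
                   else f (b : acc) False

main :: IO ()
main = do
  -- 0xff 0x00 is kept as a literal 0xff; 0xff followed by a non-zero byte terminates.
  print $ L.unpack (runGet parseECS (L.pack [0x01, 0xff, 0x00, 0x02, 0xff, 0xd9, 0x03]))
  -- No terminator at all: the parser returns what it has collected instead of failing.
  print $ L.unpack (runGet parseECS (L.pack [0x01, 0x02]))
```

With the original version, the second call would fail inside getWord8 once the input runs out.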

Optimization

I haven't done any profiling, but this function will probably be faster, since reversing long lists is expensive. I'm not sure how lazy getRemainingLazyByteString is; if it's too strict, this probably won't work for you.

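import Control.Monad (liftM)
import Data.Binary.Get (Get, getRemainingLazyByteString)
import qualified Data.ByteString.Lazy as L

parseECS2 :: Get L.ByteString
parseECS2 = do
    wx <- liftM L.unpack $ getRemainingLazyByteString
    return . L.pack . go $ wx
  where
    -- Builds the output front to back, so no reverse is needed.
    go []             = []
    go (0xff:0x00:wx) = 0xff : go wx
    go (0xff:_)       = []
    go (w:wx)         = w : go wx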
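
The list-processing core of this approach can be tried on its own, outside the Get monad. The sketch below copies the `go` helper under the hypothetical name `unstuff`; the sample inputs are made up for illustration.

```haskell
import Data.Word (Word8)

-- Hypothetical standalone copy of the `go` helper above.
unstuff :: [Word8] -> [Word8]
unstuff []               = []
unstuff (0xff:0x00:rest) = 0xff : unstuff rest  -- escaped 0xff: keep it, drop the 0x00
unstuff (0xff:_)         = []                   -- terminator: stop here
unstuff (w:rest)         = w : unstuff rest     -- ordinary byte

main :: IO ()
main = do
  print (unstuff [0x01, 0xff, 0x00, 0x02, 0xff, 0xd9, 0x03])
  -- Output is produced incrementally, so even an endless input can be consumed lazily.
  print (take 3 (unstuff (cycle [0x41])))
```

One caveat, as far as I can tell: because getRemainingLazyByteString takes the rest of the input, parseECS2 leaves nothing after the terminator for a subsequent parser, unlike the byte-by-byte version.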

Answer 2

If the problem is in "reverse", you can use "lookAhead" to scan ahead for the position, then go back and build the new string from whole blocks:

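{-# LANGUAGE BangPatterns #-}
import Control.Monad (liftM)
import Data.Binary.Get (Get, getLazyByteString, getWord8, isEmpty, lookAhead, skip)
import qualified Data.ByteString.Lazy as L

parseECS2 :: Get L.ByteString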
parseECS2 = do
    let nextWord8 = do
            noMore <- isEmpty
            if noMore then return Nothing
                      else liftM Just getWord8

    let scanChunk !n = do
            b <- nextWord8
            case b of
                Just 0xff -> return (Right (n+1))
                Just _ -> scanChunk (n+1)
                Nothing -> return (Left n)

    let readChunks = do
            c <- lookAhead (scanChunk 0)
            case c of
                Left n -> getLazyByteString n >>= \blk -> return [blk]
                Right n -> do
                    blk <- getLazyByteString n
                    b <- lookAhead nextWord8
                    case b of
                        Just 0x00 -> skip 1 >> liftM (blk:) readChunks
                        _ -> return [L.init blk]

    liftM (foldr L.append L.empty) readChunks
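
The scan-then-read idea rests on lookAhead running a parser without consuming its input. A reduced sketch of just that step follows; `countUntilFF` and `prefixBeforeFF` are hypothetical names, and the sketch ignores the 0xff 0x00 escape handling that the full answer performs.

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.Binary.Get (Get, getLazyByteString, getWord8, isEmpty, lookAhead, runGet)
import Data.Int (Int64)
import qualified Data.ByteString.Lazy as L

-- Hypothetical helper: count the bytes before the first 0xff (or the end of input).
countUntilFF :: Get Int64
countUntilFF = go 0
  where
    go !n = do
      done <- isEmpty
      if done
        then return n
        else do
          b <- getWord8
          if b == 0xff then return n else go (n + 1)

-- Scan ahead without consuming anything, then read the scanned prefix as one block.
prefixBeforeFF :: Get L.ByteString
prefixBeforeFF = lookAhead countUntilFF >>= getLazyByteString

main :: IO ()
main = print $ L.unpack (runGet prefixBeforeFF (L.pack [0x01, 0x02, 0xff, 0x03]))
```

Reading the prefix with getLazyByteString avoids building and reversing an intermediate list of bytes.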
