Switching to ByteStrings

Question

EDIT: I followed Yuras's and Dave4420's advice (thanks). I still have some errors, so I updated the question. In the end I will use meiersi's version (thanks), but I still want to find my errors...

I have a simple script that goes like this:

import System.Environment

getRow :: Int -> String -> String
getRow n = (!!n) . lines

getField :: Int -> String -> String
getField n = (!!n) . words'

words' :: String -> [String]
words' str = case str of
                        [] -> []
                        _ -> (takeHead " ; " str) : (words' (takeTail " ; " str))

takeHead :: String -> String -> String
takeHead st1 st2 = case st2 of
                                [] -> []
                                _ -> if st1 == (nHead (length st1) st2) then [] else (head st2):(takeHead st1 (tail st2))

takeTail :: String -> String -> String
takeTail st1 st2 = case st2 of
                                [] -> []
                                _ -> if st1 == (nHead (length st1) st2) then nTail (length st1) st2 else takeTail st1 (tail st2)

nTail :: Int -> String -> String
nTail n str = let rec n str = if n == 0 then str else rec (n - 1) (tail str)
              in if (length str) < n then str else rec n str

nHead :: Int -> String -> String
nHead n str = let rec n str = if n == 0 then [] else (head str):(rec (n - 1) (tail str))
              in if (length str) < n then str else rec n str

getValue :: String -> String -> String -> String
getValue row field src = getField (read field) $ getRow (read row) src

main :: IO ()
main = do
    args <- getArgs
    case args of
        (path: opt1: opt2: _) -> do
            src <- readFile path
            putStrLn $ getValue opt1 opt2 src
        (path: _) -> do
            src <- readFile path
            putStrLn $ show $ length $ lines src

It compiles and works. Then I wanted to switch to ByteStrings. Here is my attempt:

import qualified Data.ByteString.Lazy as B
import qualified Data.ByteString.Lazy.Char8 as Bc (cons, empty,unpack)
import qualified Data.ByteString.Lazy.UTF8 as Bu (lines)
import qualified System.Posix.Env.ByteString as Bg (getArgs)

separator :: B.ByteString
separator = (Bc.cons ' ' (Bc.cons ';' (Bc.cons ' ' Bc.empty)))

getRow :: Int -> B.ByteString -> B.ByteString
getRow n = (`B.index` n) $ Bu.lines

getCol :: Int -> B.ByteString -> B.ByteString
getCol n = (`B.index` n) $ wordsWithSeparator

wordsWithSeparator :: B.ByteString -> [B.ByteString]
wordsWithSeparator str = if B.null str then [] else (takeHead separator str):(wordsWithSeparator (takeTail separator str))

takeHead :: B.ByteString -> B.ByteString -> B.ByteString
takeHead st1 st2 = if B.null st2 then B.empty else if st1 == (nHead (toInteger (B.length st1)) st2) then B.empty else B.cons (B.head st2) (takeHead st1 (B.tail st2))

takeTail :: B.ByteString -> B.ByteString -> B.ByteString
takeTail st1 st2 = if B.null st2 then B.empty else if st1 == (nHead (toInteger (B.length st1)) st2) then nTail (toInteger (B.length st1)) st2 else takeTail st1 (B.tail st2)

nTail :: Integer -> B.ByteString -> B.ByteString
nTail n str = let rec n str = if n == 0 then str else rec (n - 1) (B.tail str)
              in if (toInteger (B.length str)) < n then str else rec n str

nHead :: Integer -> B.ByteString -> B.ByteString
nHead n str = let rec n str = if n == 0 then B.empty else B.cons (B.head str)(rec (n - 1) (B.tail str))
              in if (toInteger (B.length str)) < n then str else rec n str

getValue :: B.ByteString -> B.ByteString -> B.ByteString -> B.ByteString
getValue row field = getCol (read (Bc.unpack field)) . getRow (read (Bc.unpack row))

main = do args <- Bg.getArgs
          case (map (B.fromChunks . return) args) of
                                                    (path:opt1:opt2:_) -> do src <- B.readFile (Bc.unpack path)
                                                                             B.putStrLn $ getValue opt1 opt2 src

                                                    (path:_)           -> do src <- B.readFile (Bc.unpack path)
                                                                             putStrLn $ show $ length $ Bu.lines src

It doesn't work, and I could not debug it. Here is what GHC tells me:

BETA_getlow2.hs:10:23:
    Couldn't match expected type `GHC.Int.Int64' with actual type `Int'
    In the second argument of `B.index', namely `n'
    In the expression: (`B.index` n)
    In the expression: (`B.index` n) $ Bu.lines

BETA_getlow2.hs:13:23:
    Couldn't match expected type `GHC.Int.Int64' with actual type `Int'
    In the second argument of `B.index', namely `n'
    In the expression: (`B.index` n)
    In the expression: (`B.index` n) $ wordsWithSeparator

Any tips would be appreciated.

Answer 1

getRow n = (!!n) . lines

Compare with

getRow n = B.index . Bu.lines

In the second version you don't use n at all, so it is the same as

getRow _ = B.index . Bu.lines

In the first example you use n as an argument to the (!!) operator. You need to do the same in the second version.

It looks like this is not the only issue in your code, but I hope it is a good place to start ;)
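To make that concrete, here is a minimal sketch of a getRow that does use n. It assumes lines from Data.ByteString.Lazy.Char8 rather than the Bu.lines import used in the question. B.index has type ByteString -> Int64 -> Word8, so it picks one byte out of a single bytestring; the result of lines is an ordinary list, which is still indexed with (!!). Note also that the two functions are joined with (.), not ($).

```haskell
import qualified Data.ByteString.Lazy.Char8 as BC

-- Select the n-th line (0-based). BC.lines returns a plain Haskell list,
-- so the list is indexed with (!!) exactly as in the String version.
getRow :: Int -> BC.ByteString -> BC.ByteString
getRow n = (!! n) . BC.lines

main :: IO ()
main = BC.putStrLn (getRow 1 (BC.pack "first\nsecond\nthird"))  -- prints: second
```

The same shape should work for getCol, with wordsWithSeparator in place of BC.lines.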

Answer 2

I'm taking the liberty of reading the following two sub-questions into your original question.

What Haskell code would one typically write for a script like the one you posted?
What are the right data structures to efficiently perform the desired functionality?

The following code gives one answer to these two sub-questions. It uses the text library to represent sequences of Unicode characters. Moreover, it exploits the text library's high-level API to implement the desired functionality. This makes the code easier to grasp and thereby avoids potential mistakes in the implementation of low-level functions.

{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text    as T
import qualified Data.Text.IO as T

import System.Environment (getArgs)

type Table a = [[a]]

-- | Split a text value into a text table.
toTable :: T.Text -> Table T.Text
toTable = map (T.splitOn " ; ") . T.lines

-- | Retrieve a cell from a table.
cell :: Int -> Int -> Table a -> a
cell row col = (!! col) . (!! row)

main :: IO ()
main = do
    (path:rest) <- getArgs
    src <- T.readFile path
    case rest of
        row : col : _ -> T.putStrLn $ cell (read row) (read col) $ toTable src
        _             -> putStrLn $ show $ length $ T.lines src
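As a small usage illustration of the code above, the sketch below repeats toTable and cell and applies them to an in-memory value instead of a file. The sample input is made up for the example.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text as T

-- Same definitions as in the answer above, repeated so the sketch is self-contained.
toTable :: T.Text -> [[T.Text]]
toTable = map (T.splitOn " ; ") . T.lines

cell :: Int -> Int -> [[a]] -> a
cell row col = (!! col) . (!! row)

main :: IO ()
main = do
    let table = toTable "a ; b ; c\nd ; e ; f"   -- hypothetical sample input
    print table              -- [["a","b","c"],["d","e","f"]]
    print (cell 1 2 table)   -- "f"
```

T.splitOn does the work that words', takeHead and takeTail do by hand in the question.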

Answer 3

The first two errors Yuras has resolved for you, I think.

Re the 3rd error:

words' :: B.ByteString -> [B.ByteString]
words' str = if B.null str then B.empty else ...

The B.empty should be []. B.empty :: B.ByteString, but the result is supposed to have type [B.ByteString].

Re the 4th-7th errors:

length :: [a] -> Int
B.length :: B.ByteString -> Int64

In this case I would change the type signatures of nTail and nHead to use Int64 instead of Int. If that didn't work, I'd use Integer for all the Integral types, using toInteger to do the conversion.
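A minimal sketch of the Int64 variant follows. It is not the asker's code: it swaps the hand-written recursion for the library's take and drop, which on lazy bytestrings already take an Int64, while keeping the original behaviour of returning the input unchanged when it is shorter than n.

```haskell
import Data.Int (Int64)
import qualified Data.ByteString.Lazy.Char8 as BC

-- First n bytes; take already returns the whole string if it is shorter than n.
nHead :: Int64 -> BC.ByteString -> BC.ByteString
nHead = BC.take

-- Everything after the first n bytes; like the original nTail, a string
-- shorter than n is returned unchanged rather than emptied.
nTail :: Int64 -> BC.ByteString -> BC.ByteString
nTail n str
    | BC.length str < n = str
    | otherwise         = BC.drop n str

main :: IO ()
main = do
    let s = BC.pack "abc ; def"
    BC.putStrLn (nHead 3 s)   -- prints: abc
    BC.putStrLn (nTail 6 s)   -- prints: def
```

With these signatures, B.length st1 can be passed directly, with no toInteger conversion.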

Re the 8th error:

The input to read must be a String. There's no getting round that. You'll have to convert the B.ByteString to a String and pass that to read.

(Incidentally, are you sure you want to switch to ByteString and not Text?)
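A minimal sketch of that conversion, assuming the Char8 interface to lazy bytestrings. The names parseIndex and parseIndexMaybe are made up for illustration. The second function shows readInt from the same module, which parses an Int without going through String and is offered here only as an alternative.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BC

-- Go through String: unpack the bytestring, then use read as before.
-- Like read on a String, this fails at run time on malformed input.
parseIndex :: BC.ByteString -> Int
parseIndex = read . BC.unpack

-- Alternative: readInt returns Nothing instead of failing when no number is found.
parseIndexMaybe :: BC.ByteString -> Maybe Int
parseIndexMaybe = fmap fst . BC.readInt

main :: IO ()
main = do
    print (parseIndex (BC.pack "42"))        -- 42
    print (parseIndexMaybe (BC.pack "abc"))  -- Nothing
```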

Re the 9th (final) error:

args :: [Data.ByteString.ByteString] (NB: a list of strict bytestrings, not the lazy bytestrings you use elsewhere), but in the pattern match you expect args :: B.ByteString for some reason.

You should pattern match on a [ByteString] the same way you pattern match on a [String]: they are both lists.

Convert args to something of type [B.ByteString] with map (B.fromChunks . return) args.
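A minimal sketch of that conversion and the list pattern match. The argument values are made up, and S.pack stands in for the strict bytestrings that getArgs would return. Recent versions of the bytestring package also export B.fromStrict, which should do the same job if it is available to you.

```haskell
import qualified Data.ByteString.Char8 as S          -- strict
import qualified Data.ByteString.Lazy as B           -- lazy
import qualified Data.ByteString.Lazy.Char8 as BC

-- A lazy bytestring is a list of strict chunks, so a single strict
-- bytestring becomes a one-chunk lazy one.
toLazy :: S.ByteString -> B.ByteString
toLazy = B.fromChunks . return

main :: IO ()
main = do
    let args = map S.pack ["data.txt", "1", "2"]   -- hypothetical arguments
    -- Pattern match on the list, exactly as with [String].
    case map toLazy args of
        (path : row : col : _) -> mapM_ BC.putStrLn [path, row, col]
        _                      -> putStrLn "expected: path row col"
```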
