
Processing a text file with locale, ignoring non-ASCII characters

How can I load and print the content of this file? http://daiw.de/share/misc/2014-05-28_haskell/foo.txt

nice text: lalala.
mean german text: Größe!

My current example code

main :: IO ()
main = do
    content <- readFile "foo.txt"
    putStrLn content

produces the following output:

nice text: lalala.
Main.hs: foo.txt: hGetContents: invalid argument (invalid byte sequence)

It would be totally OK if all non-ASCII characters were replaced by a dummy character or dropped completely.

GHC supports the native locale. As long as your locale setting is something sensible, it will "just work":

$ runhaskell foo.hs
nice text: lalala.
mean german text: Größe!

Set, e.g.:

LANG=en_US.UTF-8
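
If changing the environment is not an option, the encoding can also be pinned inside the program. The following is only a minimal sketch, assuming the file is meant to be UTF-8; readFileWith is a hypothetical helper name, not a library function. It relies on hSetEncoding and mkTextEncoding from System.IO. According to the GHC documentation, a "//IGNORE" suffix on the encoding name makes the decoder skip invalid byte sequences instead of raising the "invalid byte sequence" error.

```haskell
import System.IO

-- | Read a text file with an explicitly named encoding instead of the
-- locale default. The name is handed to 'mkTextEncoding', for example
-- "UTF-8", or "UTF-8//IGNORE" to skip bytes that are not valid UTF-8.
-- NOTE: 'readFileWith' is a hypothetical helper introduced for illustration.
readFileWith :: String -> FilePath -> IO String
readFileWith encodingName path = do
    enc <- mkTextEncoding encodingName
    h <- openFile path ReadMode
    hSetEncoding h enc
    -- Lazy read: the handle is closed once the contents are fully consumed.
    hGetContents h
```

Calling readFileWith "UTF-8" "foo.txt" should behave like readFile under a UTF-8 locale, while readFileWith "UTF-8//IGNORE" "foo.txt" trades strictness for robustness by silently dropping bytes it cannot decode.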

I just wrote this, and it works for me right now:

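import Data.Char
import Control.Applicative
import qualified Data.ByteString.Char8 as B

-- Read the file as raw bytes and replace every byte outside the ASCII
-- range 32..127 (except CR and LF) with '-'.
readFileAscii :: FilePath -> IO String
readFileAscii path = B.unpack . B.map (clearChar '-') <$> B.readFile path
    where
        clearChar :: Char -> Char -> Char
        clearChar d c
            | c == '\r' || c == '\n' = c    -- keep line breaks
            | c >= '\32' && c < '\128' = c  -- keep bytes 32..127
            | otherwise = d                 -- everything else becomes the dummy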

main :: IO ()
main = do
    content <- readFileAscii "foo.txt"
    putStrLn content
    putStrLn $ map toUpper content

I hope it is not an unclean solution that will haunt me later. If it is bad, please let me know. As you have probably already noticed, I am a Haskell beginner.
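
One thing to keep in mind with the byte-wise approach: if the file is UTF-8, each non-ASCII character occupies two or more bytes, so it produces one dummy per byte rather than one per character. Assuming the text has already been decoded into a String (for example by readFile under a UTF-8 locale), the replacement can instead be done per character. This is only a sketch; replaceNonAscii and dropNonAscii are hypothetical names, and isAscii comes from Data.Char.

```haskell
import Data.Char (isAscii)

-- | Replace every non-ASCII character with the given dummy character.
-- NOTE: hypothetical helper, works on already-decoded text.
replaceNonAscii :: Char -> String -> String
replaceNonAscii dummy = map (\c -> if isAscii c then c else dummy)

-- | Drop every non-ASCII character completely.
-- NOTE: hypothetical helper, works on already-decoded text.
dropNonAscii :: String -> String
dropNonAscii = filter isAscii
```

For example, replaceNonAscii '-' applied to "Größe!" gives "Gr--e!", and dropNonAscii gives "Gre!". Unlike readFileAscii above, these keep tabs and other ASCII control characters untouched.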

