Reading file with “US-ASCII” encoding in Haskell: hGetContents: invalid argument (invalid byte sequence)

I'm writing a parser in Haskell, but I've hit an error I can't get past. Here is my code:

main = do
  arguments    <- getArgs
  let fileName = head arguments
  fileContents <- readFile fileName
  converter    <- open "UTF-8" Nothing
  let titleLength           = length fileName
      titleWithoutExtension = take (titleLength - 4) fileName
      allNonEmptyLines      = unlines $ tail $ filter (/= "") $ lines fileContents

When I try to read a file with "US-ASCII" encoding, I get the well-known error hGetContents: invalid argument (invalid byte sequence). I've tried replacing "UTF-8" in my code with "US-ASCII", but the error persists. Is there a way to read these files, or a general way to handle files with encoding problems?

You should use hSetEncoding to configure the file handle for a specific text encoding, e.g.:

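import System.Environment (getArgs)
import System.IO

main :: IO ()
main = do
  (path : _) <- getArgs
  h <- openFile path ReadMode
  -- Decode the file as Latin-1 instead of the locale's default encoding.
  hSetEncoding h latin1
  contents <- hGetContents h
  -- No need to close h: hGetContents closes it once the contents are fully read.
  print (length contents)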

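If your file contains non-ASCII characters and is not UTF-8 encoded, then latin1 is a good bet, although it is not the only possibility. Decoding as latin1 cannot fail, because every byte value maps to a character, but the resulting text will be wrong if the file actually uses a different encoding.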
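If you would rather keep a stricter encoding and simply tolerate the bad bytes, one option is to build the encoding with mkTextEncoding from System.IO, which accepts a failure-mode suffix on the encoding name. The following is only a sketch: readFileWith is a hypothetical helper name, not something from the question or from base, and "UTF-8//IGNORE" is an assumption about which encoding you want.

```haskell
import System.Environment (getArgs)
import System.IO

-- | Hypothetical helper: read a whole file, decoding it with the named
-- encoding. The name is handed to 'mkTextEncoding', so a suffix such as
-- "//IGNORE" or "//TRANSLIT" chooses how undecodable bytes are treated.
readFileWith :: String -> FilePath -> IO String
readFileWith encName path = do
  enc <- mkTextEncoding encName
  h <- openFile path ReadMode
  hSetEncoding h enc
  contents <- hGetContents h
  -- Force the lazy string so the file is read to the end (which also
  -- closes the handle) before we return.
  length contents `seq` return contents

main :: IO ()
main = do
  (path : _) <- getArgs
  -- Assumption: the file is mostly UTF-8/ASCII with a few stray bytes.
  contents <- readFileWith "UTF-8//IGNORE" path
  print (length contents)
```

With "//IGNORE" the undecodable bytes should be skipped, and with "//TRANSLIT" they should be replaced by a substitute character, so which suffix to use depends on whether you need to see where the bad bytes were.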
