
Aeson does not decode strings with unicode characters

I'm trying to use Data.Aeson ( https://hackage.haskell.org/package/aeson-0.6.1.0/docs/Data-Aeson.html ) to decode some JSON strings, but it fails to parse strings that contain non-ASCII characters.

As an example, the file:

import Data.Aeson
import Data.ByteString.Lazy.Char8 (pack)

test1 :: Maybe Value
test1 = decode $ pack "{ \"foo\": \"bar\"}"

test2 :: Maybe Value
test2 = decode $ pack "{ \"foo\": \"bòz\"}"

When run in ghci, it gives the following results:

*Main> :l ~/test.hs
[1 of 1] Compiling Main             ( /Users/ltomlin/test.hs, interpreted )
Ok, modules loaded: Main.
*Main> test1
Just (Object fromList [("foo",String "bar")])
*Main> test2
Nothing

Is there a reason that it doesn't parse the String with the Unicode character? I was under the impression that Haskell was pretty good with Unicode. Any suggestions would be greatly appreciated!

Thanks,

tetigi

EDIT

Upon further investigation using eitherDecode, I get the following error message:

 *Main> test2
 Left "Failed reading: Cannot decode byte '\\x61': Data.Text.Encoding.decodeUtf8: Invalid UTF-8 stream"

x61 is the unicode character for 'z', which comes right after the special unicode character. Not sure why it's failing to read the characters after the special character!

Changing test2 to be test2 = decode $ pack "{ \\"foo\\": \\"bòz\\"}" instead gives the error:

Left "Failed reading: Cannot decode byte '\\xf2': Data.Text.Encoding.decodeUtf8: Invalid UTF-8 stream"

That is the code point for "ò", which makes a bit more sense.

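The problem is your use of pack from the Char8 module, which truncates every Char to a single byte. That yields Latin-1 bytes for characters up to U+00FF (so "ò" becomes the lone byte 0xF2) and garbage beyond that, whereas aeson expects UTF-8 input. Instead, use encodeUtf8 from the text package.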
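As a rough illustration of the difference, here is a minimal sketch that compares the bytes produced by the two approaches. The names toUtf8, truncatedBytes and utf8Bytes are made up for this example and are not part of aeson or text. It uses the lazy variant of encodeUtf8 (Data.Text.Lazy.Encoding) because decode takes a lazy ByteString, and it assumes the source file is saved as UTF-8.

```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BLC
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Encoding as TLE
import Data.Word (Word8)

-- Hypothetical helper (not a library function): encode a String as a
-- UTF-8 lazy ByteString, the input type that Data.Aeson.decode expects.
toUtf8 :: String -> BL.ByteString
toUtf8 = TLE.encodeUtf8 . TL.pack

-- Bytes produced by Char8.pack, which keeps only the low 8 bits of each Char.
truncatedBytes :: String -> [Word8]
truncatedBytes = BL.unpack . BLC.pack

-- Bytes produced by proper UTF-8 encoding.
utf8Bytes :: String -> [Word8]
utf8Bytes = BL.unpack . toUtf8

main :: IO ()
main = do
  print (truncatedBytes "ò")  -- [242]: a lone 0xF2, not valid UTF-8 on its own
  print (utf8Bytes "ò")       -- [195,178]: the two-byte UTF-8 encoding 0xC3 0xB2
```

With aeson in scope, replacing pack with such a helper, as in decode (toUtf8 "{ \"foo\": \"bòz\"}") :: Maybe Value, should then give a Just result instead of Nothing.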
