Text or Bytestring

Good day.

The one thing I now hate about Haskell is the number of packages for working with strings.

First I used native Haskell [Char] strings, but when I started using Hackage libraries I got completely lost in endless conversions. Every package seems to use a different string implementation, and some adopt their own handmade thing.

Next I rewrote my code with Data.Text strings and the OverloadedStrings extension. I chose Text because it has a wider set of functions, but it seems many projects prefer ByteString.
Could someone give a short rationale for using one or the other?

PS: by the way, how do I convert from Text to ByteString?

Couldn't match expected type Data.ByteString.Lazy.Internal.ByteString against inferred type Text
Expected type: IO Data.ByteString.Lazy.Internal.ByteString
Inferred type: IO Text

I tried encodeUtf8 from Data.Text.Encoding, but no luck:

Couldn't match expected type Data.ByteString.Lazy.Internal.ByteString against inferred type Data.ByteString.Internal.ByteString

UPD:

Thanks for the responses. That *Chunks goodness looks like the way to go, but I am somewhat shocked by the result. My original function looked like this:

htmlToItems :: Text -> [Item]
htmlToItems =
    getItems . parseTags . convertFuzzy Discard "CP1251" "UTF8"

And now it has become:

htmlToItems :: Text -> [Item]
htmlToItems =
    getItems . parseTags . fromLazyBS . convertFuzzy Discard "CP1251" "UTF8" . toLazyBS
    where
      toLazyBS t = fromChunks [encodeUtf8 t]
      fromLazyBS t = decodeUtf8 $ intercalate "" $ toChunks t

And yes, this function does not work, because it is wrong: if we supply Text to it, then we are confident that the text is already properly decoded and ready to use, so converting it again is a stupid thing to do. But such a verbose conversion still has to take place somewhere outside htmlToItems.

ByteStrings are mainly useful for binary data, but they are also an efficient way to process text if all you need is the ASCII character set. If you need to handle Unicode strings, you need to use Text. However, I must emphasize that neither is a replacement for the other, and they are generally used for different things: while Text represents pure Unicode, you still need to encode to and from a binary ByteString representation whenever you, for example, transport text via a socket or a file.
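To make the code-point versus byte distinction concrete, here is a minimal sketch, assuming UTF-8 as the wire encoding. The names greeting, wire and fromWire are made up for illustration; encodeUtf8 and decodeUtf8' come from Data.Text.Encoding.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8', encodeUtf8)

-- Six Cyrillic code points, written with numeric escapes so the
-- example does not depend on the encoding of the source file.
greeting :: T.Text
greeting = T.pack "\1055\1088\1080\1074\1077\1090"

-- The same text as it would travel over a socket or sit in a file.
-- Each of these characters takes two bytes in UTF-8.
wire :: BS.ByteString
wire = encodeUtf8 greeting

-- Decode at the boundary; malformed input is reported as a Left
-- instead of throwing an exception.
fromWire :: BS.ByteString -> Either String T.Text
fromWire = either (Left . show) Right . decodeUtf8'
```

T.length greeting counts characters while BS.length wire counts bytes, so the two numbers differ as soon as the text leaves ASCII.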

Here is a good article about the basics of Unicode, which does a decent job of explaining the relation between Unicode code points (Text) and the encoded binary bytes (ByteString): The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets

You can use the Data.Text.Encoding module to convert between the two datatypes, or Data.Text.Lazy.Encoding if you are using the lazy variants (as you seem to be doing, based on your error messages).
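As a rough sketch of how the strict and lazy modules line up (the function names on the left are hypothetical; the library calls are from the text and bytestring packages), the case in the question, a strict Text going to a lazy ByteString, might look like this:

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Encoding as TLE

-- Strict Text to strict ByteString.
strictToBytes :: T.Text -> BS.ByteString
strictToBytes = TE.encodeUtf8

-- Lazy Text to lazy ByteString.
lazyToBytes :: TL.Text -> BL.ByteString
lazyToBytes = TLE.encodeUtf8

-- Strict Text to lazy ByteString: go through lazy Text first.
textToLazyBytes :: T.Text -> BL.ByteString
textToLazyBytes = TLE.encodeUtf8 . TL.fromStrict

-- And back again. Note that decodeUtf8 throws on malformed UTF-8.
lazyBytesToText :: BL.ByteString -> T.Text
lazyBytesToText = TL.toStrict . TLE.decodeUtf8
```

The only rule to remember is that the strict encoder pairs with strict types and the lazy encoder with lazy types; mixing them produces exactly the type errors quoted in the question.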

You definitely want to be using Data.Text for textual data.

encodeUtf8 is the way to go. This error:

Couldn't match expected type Data.ByteString.Lazy.Internal.ByteString against inferred type Data.ByteString.Internal.ByteString

means that you are supplying a strict bytestring to code which expects a lazy bytestring. Conversion is easy with the fromChunks function:

Data.ByteString.Lazy.fromChunks :: [Data.ByteString.Internal.ByteString] -> ByteString

so all you need to do is apply fromChunks [myStrictByteString] wherever the lazy bytestring is expected.

Conversion the other way can be accomplished with the dual function toChunks, which takes a lazy bytestring and gives a list of strict chunks.
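A small sketch of both directions, with hypothetical helper names strictToLazy and lazyToStrict. It uses BS.concat to glue the chunks back together, which does the same job as the intercalate "" in the question:

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL

-- Wrap one strict bytestring as a single-chunk lazy bytestring.
strictToLazy :: BS.ByteString -> BL.ByteString
strictToLazy s = BL.fromChunks [s]

-- Concatenate all chunks of a lazy bytestring into one strict value.
-- This forces the whole input into memory.
lazyToStrict :: BL.ByteString -> BS.ByteString
lazyToStrict = BS.concat . BL.toChunks
```

Reasonably recent versions of bytestring also export BL.fromStrict and BL.toStrict, which do the same thing without the explicit chunk list.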

You may want to ask the maintainers of some packages if they would be able to provide a Text interface instead of, or in addition to, a ByteString interface.

Use the single function cs from the Data.String.Conversions module (string-conversions package).

It allows you to convert between String, ByteString and Text (as well as ByteString.Lazy and Text.Lazy), depending on the input and the expected types.

You still have to call it, but you no longer need to worry about the respective types.

See this answer for a usage example.

For what it's worth, I found these two helper functions to be quite useful:

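import qualified Data.ByteString.Char8 as BS
import qualified Data.Text             as T

-- | Text to ByteString.
-- Note: Data.ByteString.Char8.pack truncates every Char to 8 bits,
-- so this is only lossless for characters up to U+00FF.
tbs :: T.Text -> BS.ByteString
tbs = BS.pack . T.unpack

-- | ByteString to Text.
-- Each byte becomes one Char (Latin-1 style); UTF-8 input is not decoded.
bst :: BS.ByteString -> T.Text
bst = T.pack . BS.unpack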

Example (the string literals below assume the OverloadedStrings extension):

foo :: [BS.ByteString]
foo = ["hello", "world"]

bar :: [T.Text]
bar = bst <$> foo

baz :: [BS.ByteString]
baz = tbs <$> bar
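One caveat with Char8-based helpers like these: they are fine for ASCII, but Data.ByteString.Char8.pack keeps only the low 8 bits of each character. The sketch below (tbsUtf8, bstUtf8, sample and viaChar8 are names invented here) shows the difference, assuming UTF-8 is the encoding you actually want:

```haskell
import qualified Data.ByteString.Char8 as BS
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8, encodeUtf8)

-- UTF-8 based counterparts of the helpers above.
tbsUtf8 :: T.Text -> BS.ByteString
tbsUtf8 = encodeUtf8

-- decodeUtf8 throws on malformed input; use decodeUtf8' to get an Either.
bstUtf8 :: BS.ByteString -> T.Text
bstUtf8 = decodeUtf8

-- A Greek lambda (U+03BB, decimal 955) followed by an ASCII 'x'.
sample :: T.Text
sample = T.pack "\955x"

-- Round trip through Char8: the lambda is truncated to byte 0xBB,
-- which reads back as U+00BB instead of the original character.
viaChar8 :: T.Text
viaChar8 = T.pack (BS.unpack (BS.pack (T.unpack sample)))
```

With these definitions bstUtf8 (tbsUtf8 sample) gives back sample unchanged, while viaChar8 does not.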
