Split ByteString on a ByteString (instead of a Word8 or Char)
I know the Haskell Data.ByteString.Lazy module already has a function to split a CSV on a single character:
split :: Word8 -> ByteString -> [ByteString]
But I want to split on a multi-character ByteString (like splitting on a String instead of a Char):
split :: ByteString -> ByteString -> [ByteString]
I have multi-character separators in a CSV-like text file that I need to parse, and the individual separator characters also appear inside some of the fields, so choosing just one separator character and discarding the others would contaminate the data import.
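To make the contamination concrete, here is a minimal sketch assuming a hypothetical three-character separator "|~|" and a field that legitimately contains '|'. It uses Data.ByteString.Char8.split, the Char counterpart of the single-character split above; the record and names are illustrative only.

```haskell
import qualified Data.ByteString.Char8 as BC

-- Hypothetical record: fields are separated by "|~|",
-- but the second field itself contains a '|'.
record :: BC.ByteString
record = BC.pack "id42|~|a|b|~|note"

-- Splitting on the single character '|' tears the second field apart
-- and leaves '~' fragments behind as bogus fields.
naiveFields :: [BC.ByteString]
naiveFields = BC.split '|' record
```

Here naiveFields comes out as six pieces ("id42", "~", "a", "b", "~", "note") instead of the three intended fields "id42", "a|b" and "note".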
I've had some ideas on how to do this, but they seem hacky (e.g. take three Word8s, test whether they form the separator, start a new field if they do, and recurse), and I imagine I would be reinventing the wheel anyway. Is there a way to do this without rebuilding the function from scratch?
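For comparison, the hand-rolled idea described above (check whether the next bytes are the separator, start a new field on a match, recurse) can be sketched roughly as follows. This is an illustrative sketch, not a library function: the name splitOnNaive is hypothetical, and it is generalized from "three Word8s" to a separator of any length using isPrefixOf from Data.ByteString.Char8.

```haskell
import qualified Data.ByteString.Char8 as BC

-- Hypothetical hand-rolled splitter: at each offset, test whether the
-- separator starts there; on a match, emit the field and recurse on the rest.
-- It re-tests the separator at every offset, so it is O(n * m) for input
-- length n and separator length m.
splitOnNaive :: BC.ByteString -> BC.ByteString -> [BC.ByteString]
splitOnNaive sep input
  | BC.null sep = [input]  -- an empty separator would never advance; don't split
  | otherwise   = go 0 input
  where
    -- go i s: no separator starts at offsets 0 .. i-1 of s
    go i s
      | i >= BC.length s                = [s]
      | sep `BC.isPrefixOf` BC.drop i s =
          BC.take i s : go 0 (BC.drop (i + BC.length sep) s)
      | otherwise                       = go (i + 1) s
```

With this sketch, splitOnNaive (BC.pack "|~|") (BC.pack "id42|~|a|b|~|note") yields the three fields "id42", "a|b" and "note"; a trailing separator produces a final empty field.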
There are a few functions in bytestring for splitting on subsequences; the strict Data.ByteString module provides:
breakSubstring :: ByteString -> ByteString -> (ByteString,ByteString)
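As a rough illustration of how breakSubstring behaves: it splits at the first occurrence of the pattern, and the pattern stays at the front of the second component. The "||" separator and the sample strings below are only examples.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC

-- breakSubstring pat s returns (before, rest): `before` is everything up to
-- the first occurrence of pat, and `rest` starts with pat
-- (or is empty when pat does not occur at all).
firstSplit :: (B.ByteString, B.ByteString)
firstSplit = B.breakSubstring (BC.pack "||") (BC.pack "a||b||c")
-- firstSplit == ("a", "||b||c")

noMatch :: (B.ByteString, B.ByteString)
noMatch = B.breakSubstring (BC.pack "||") (BC.pack "abc")
-- noMatch == ("abc", "")
```

To continue past the separator, drop (length pat) bytes from the second component and call breakSubstring again; repeating that yields the full list of fields.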
There's also a
The documentation of Data.ByteString's breakSubstring contains a function that does what you are asking for:
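import qualified Data.ByteString as B
tokenise :: B.ByteString -> B.ByteString -> [B.ByteString]
tokenise x y = h : if B.null t then [] else tokenise x (B.drop (B.length x) t)
    where (h,t) = B.breakSubstring x y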
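Since the question starts from lazy ByteStrings, one possible adapter is sketched below. Assumptions, all labeled: splitLazy and exampleFields are hypothetical names; the empty-separator guard is an addition (breakSubstring with an empty pattern returns ("", y), so the unguarded recursion would never make progress); and the adapter uses toStrict/fromStrict from Data.ByteString.Lazy, which loads the whole input into one strict chunk, so it only suits input that fits in memory.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL

-- The tokenise recursion from the documentation, plus a guard
-- for the empty separator (returns the input as a single field).
tokenise :: B.ByteString -> B.ByteString -> [B.ByteString]
tokenise x y
  | B.null x  = [y]
  | otherwise = h : if B.null t then [] else tokenise x (B.drop (B.length x) t)
  where (h, t) = B.breakSubstring x y

-- Hypothetical adapter for lazy input: make it strict, split, convert back.
splitLazy :: B.ByteString -> BL.ByteString -> [BL.ByteString]
splitLazy sep = map BL.fromStrict . tokenise sep . BL.toStrict

-- Hypothetical CSV-like line that uses "|~|" as the field separator.
exampleFields :: [BL.ByteString]
exampleFields = splitLazy (BC.pack "|~|") (BL.fromStrict (BC.pack "id42|~|a|b|~|note"))
```

Here exampleFields holds "id42", "a|b" and "note": the '|' inside the second field survives because only the full three-character separator is matched.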