
Split ByteString on a ByteString (instead of a Word8 or Char)

I know that Haskell's Data.ByteString.Lazy already has a function to split a CSV on a single character, such as:

split :: Word8 -> ByteString -> [ByteString]

But I want to split on a multi-character ByteString (like splitting on a String instead of a Char):

split :: ByteString -> ByteString -> [ByteString]

I need to parse a CSV-like text file that uses multi-character separators. The individual characters of the separator also appear inside some of the fields, so picking just one of them as the separator and discarding the others would contaminate the data import.

I've had some ideas on how to do this, but they seem kind of hacky (e.g. take three Word8s, test whether they are the separator combination, start a new field if they are, then recurse further), and I imagine I would be reinventing the wheel anyway. Is there a way to do this without rebuilding the function from scratch?

There are a few functions in bytestring for splitting on subsequences:

breakSubstring :: ByteString -> ByteString -> (ByteString,ByteString)
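
To make the behaviour concrete, here is a minimal sketch of a single call to breakSubstring from the strict Data.ByteString module. The three-byte separator "|~|" and the names separator and firstBreak are made up for illustration; they are not from the question.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC

-- Hypothetical three-byte separator, chosen only for illustration.
separator :: B.ByteString
separator = BC.pack "|~|"

-- Break a record at the first occurrence of the whole separator.
-- The second component still begins with the separator, or is empty
-- when the separator does not occur at all.
firstBreak :: B.ByteString -> (B.ByteString, B.ByteString)
firstBreak = B.breakSubstring separator
```

For example, firstBreak (BC.pack "a|b|~|c") should give ("a|b","|~|c"): the lone '|' inside the first field is left alone because only the full separator matches.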

There's also a

The documentation for bytestring's breakSubstring contains a function that does what you are asking for:

tokenise x y = h : if null t then [] else tokenise x (drop (length x) t)
    where (h,t) = breakSubstring x y
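
As written, tokenise uses null, drop and length unqualified, so it only compiles where those names refer to the Data.ByteString versions. Below is a sketch of the same recursion with qualified imports and a type signature. The names splitOn, splitOnLazy and example are my own, and the empty-separator guard is an addition: with an empty separator the recursion would never consume any input. Since breakSubstring lives in the strict module, the lazy wrapper simply converts to strict first, which is an assumption that the whole input fits in memory.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL

-- Split a strict ByteString on every occurrence of a multi-byte separator.
-- Same recursion as tokenise, plus a guard: an empty separator would never
-- consume input, so the input is returned unsplit in that case.
splitOn :: B.ByteString -> B.ByteString -> [B.ByteString]
splitOn sep input
  | B.null sep = [input]
  | otherwise  = go input
  where
    go s =
      let (h, t) = B.breakSubstring sep s
      in h : if B.null t then [] else go (B.drop (B.length sep) t)

-- Hypothetical wrapper for lazy input. It converts to strict first,
-- so the whole input is held in memory while splitting.
splitOnLazy :: BL.ByteString -> BL.ByteString -> [BL.ByteString]
splitOnLazy sep = map BL.fromStrict . splitOn (BL.toStrict sep) . BL.toStrict

-- Made-up sample record; evaluates to ["a|b","c","d"].
example :: [B.ByteString]
example = splitOn (BC.pack "|~|") (BC.pack "a|b|~|c|~|d")
```

Note that a trailing separator produces a final empty field, so splitOn (BC.pack "|~|") (BC.pack "a|~|") gives ["a",""], which is usually what you want for CSV-like data.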
