
Split ByteString on a ByteString (instead of a Word8 or Char)

I know that Haskell's Data.ByteString.Lazy already provides a function for splitting a CSV on a single byte, such as:

split :: Word8 -> ByteString -> [ByteString]

But I want to split on a multi-character ByteString (like splitting on a String instead of a Char):

split :: ByteString -> ByteString -> [ByteString]

I have multi-character separators in a csv-like text file that I need to parse, and the individual characters themselves appear in some of the fields, so choosing just one separator character and discarding the others would contaminate the data import.

I've had some ideas on how to do this, but they seem kind of hacky (e.g., take three Word8s, test whether they are the separator combination, start a new field if they are, then recurse), and I imagine I would be reinventing the wheel anyway. Is there a way to do this without rebuilding the function from scratch?

There are a few functions in bytestring for splitting on subsequences:

breakSubstring :: ByteString -> ByteString -> (ByteString,ByteString)

There's also a

The documentation for bytestring's breakSubstring contains a function that does what you are asking for:

tokenise x y = h : if null t then [] else tokenise x (drop (length x) t)
    where (h,t) = breakSubstring x y
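
For reference, here is a minimal, self-contained sketch of the same idea with qualified imports, so that null, drop and length do not clash with the Prelude. The names splitOn and example are made up for illustration, and the "|~|" separator is just an assumed sample. The guard for an empty separator is an addition of this sketch: without it the recursion would never consume any input. It works on strict ByteStrings, which is where breakSubstring lives; lazy input would need converting first, for instance with Data.ByteString.Lazy.toStrict.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC

-- | Split a strict ByteString on every occurrence of a multi-byte separator.
-- (Hypothetical helper name; same logic as tokenise above.)
splitOn :: B.ByteString -> B.ByteString -> [B.ByteString]
splitOn sep str
  | B.null sep = [str]          -- guard: an empty separator never consumes input
  | otherwise  = go str
  where
    go s =
      -- h is everything before the first separator; t starts at the separator
      let (h, t) = B.breakSubstring sep s
      in h : if B.null t
               then []                          -- no separator left: h was the last field
               else go (B.drop (B.length sep) t) -- skip the separator and continue

-- Assumed sample data: fields separated by the three-byte sequence "|~|",
-- with the single characters '|' and '~' also appearing inside fields.
example :: [B.ByteString]
example = splitOn (BC.pack "|~|") (BC.pack "a|b|~|c~d|~|e")
-- evaluates to ["a|b","c~d","e"]
```

Note that a trailing separator yields a final empty field, and an empty input yields a single empty field, which may or may not be what a given import needs.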
