
Cutting a string into a list in Haskell?

Is it possible to cut a string, e.g.

"one , Two"

into a list

["one", "two"]

or just

"one", "two"

thanks

There's a whole module of functions for different strategies to split a list (such as a string, which is just a list of characters): Data.List.Split

Using this, you could do

import Data.List.Split

> splitOn " , " "one , Two"
["one","Two"]

Regular old list operations are sufficient here,

import Data.Char

> [ w | w <- words "one , Two", all isAlpha w ]
["one","Two"]

or, equivalently,

> filter (all isAlpha) . words $ "one , Two"
["one","Two"]
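Note that the words-based filter relies on whitespace around the comma: for "one,Two" it returns an empty list, because the single word "one,Two" is not all letters. A minimal base-only sketch that splits on the comma itself and trims each field is shown here; trim and splitCommas are hypothetical helper names, not library functions.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- Hypothetical helper: strip leading and trailing whitespace.
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- Hypothetical helper: split on every comma, then trim each field.
-- An empty input yields a single empty field.
splitCommas :: String -> [String]
splitCommas s = case break (== ',') s of
  (field, [])       -> [trim field]
  (field, _ : rest) -> trim field : splitCommas rest
```

With this, both splitCommas "one , Two" and splitCommas "one,Two" give ["one","Two"].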

List hacking, parsing and design

There is a scale of power and weight in text processing. At the simplest, list-based solutions, such as the one above, offer very little syntactic noise, for quick results (in the same spirit as quick'n'dirty text processing in shell scripts).

List manipulation can get quite sophisticated, and you might consider, e.g., the generalized split library, for splitting lists on arbitrary delimiters,

> splitOn " , " "one , Two"
["one","Two"]

For harder problems, or for code that is not likely to be thrown away, more robust techniques make sense. In particular, you can avoid fragile pattern matching by describing the problem as a grammar with parser combinators, such as parsec or uu-parsinglib. String processing described via parsers tends to lead to more robust code over time, because parsers written in a combinator style are relatively easy to modify as requirements change.
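To make the grammar idea concrete without installing anything, here is a rough sketch using Text.ParserCombinators.ReadP, which ships with base. It is a stand-in for parsec or uu-parsinglib, not either of those libraries, and the names element, separator, elements and parseElements are illustrative assumptions. It also assumes elements consist of letters only.

```haskell
import Data.Char (isAlpha)
import Text.ParserCombinators.ReadP

-- One element: a non-empty run of letters (an assumption about the input).
element :: ReadP String
element = munch1 isAlpha

-- Separator: a comma with optional whitespace on either side.
separator :: ReadP ()
separator = skipSpaces *> char ',' *> skipSpaces

-- The whole input: comma-separated elements, then end of input.
elements :: ReadP [String]
elements = skipSpaces *> sepBy1 element separator <* skipSpaces <* eof

-- Hypothetical wrapper: Nothing when the input does not match the grammar.
parseElements :: String -> Maybe [String]
parseElements s = case readP_to_S elements s of
  [(result, "")] -> Just result
  _              -> Nothing
```

Changing what counts as an element, or allowing a different separator, means editing one small definition rather than reworking a chain of list operations.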

Note on regular expressions: list matching and regular expressions are approximately equivalent in ease of use and (un)safety, so for the purposes of this discussion, you can substitute "regex" for "list splitting". Parsing is almost always the right approach if the code is intended to be long-lived.

If you'd rather not install the split package (see Frerich Raabe's answer), here's an implementation of the splitOn function that's light on dependencies:

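import Data.List (isPrefixOf)

-- Split a list on every occurrence of a non-empty delimiter.
splitOn :: Eq a => [a] -> [a] -> [[a]]
splitOn []    _  = error "splitOn: empty delimiter"
splitOn delim xs = loop xs
  where
    len = length delim
    loop [] = [[]]
    loop ys@(z:zs)
      -- Delimiter found: close the current chunk and skip past the delimiter.
      | delim `isPrefixOf` ys = [] : loop (drop len ys)
      -- Otherwise prepend the current element to the first chunk of the rest.
      | otherwise             = let (w:ws) = loop zs in (z:w) : ws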

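Lightly tested, using Parsec. There's probably a regex-based separator too.

import Control.Monad (liftM2)
import Text.Parsec
import Text.Parsec.String (Parser)

-- An element is any run of characters other than spaces and commas.
firstElement :: Parser String
firstElement = many $ noneOf " ,"

otherElement :: Parser String
otherElement = do many $ char ' '
                  char ','
                  many $ char ' '
                  firstElement

elements :: Parser [String]
elements = liftM2 (:) firstElement (many otherElement)

-- parse returns Either, so a failed parse is reported as Left.
parseElements :: String -> Either ParseError [String]
parseElements = parse elements "(unknown)"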

It would be nice to clean up otherElement somehow, similar to how I managed to collapse elements using liftM2.
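One possible cleanup is Parsec's sepBy combinator, which expresses "elements separated by a separator" directly and removes the need for a separate otherElement. The following is a sketch rather than a drop-in replacement: the names element and comma are introduced here for illustration, elements are required to be non-empty, and the parser insists on reaching the end of the input.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- One element: a non-empty run of characters other than spaces and commas.
element :: Parser String
element = many1 (noneOf " ,")

-- Separator: a comma with optional whitespace on either side.
-- try backtracks if whitespace is consumed but no comma follows.
comma :: Parser ()
comma = try (spaces *> char ',') *> spaces

-- sepBy replaces the hand-written firstElement/otherElement pair.
elements :: Parser [String]
elements = spaces *> (element `sepBy` comma) <* spaces <* eof

parseElements :: String -> Either ParseError [String]
parseElements = parse elements "(unknown)"
```

Here parseElements "one , Two" evaluates to Right ["one","Two"], while an input such as "one Two" produces a Left with a parse error.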

