
How to convert a string into a list of words in F#

I need to convert a string into a list of words without using built-in functions. Here is what I have tried so far; obviously something is wrong:

let rec convert word = 
    match word with
    |"."      ->[]
    |word -> ["word"]
    |word + " " + words -> [word]@convert words

As your question has a somewhat academic flavor, I'd approach the solution accordingly.

Putting aside for a moment the requirement of not using built-in libraries, the solution can follow the classic folding pattern, which is easy to implement from scratch, assuming a split function with certain properties that will be implemented later:

let string2Words s =
    let rec fold acc s =
        match split s with
        | [x] -> acc @ [x] // append the last word; done
        | [head;tail] -> fold (acc @ [head]) tail // append the head word; continue splitting
        | _ -> acc // done
    fold [] s

So our task is now reduced to implementing a split that accepts a string and returns either a list with a single word, or a list of two elements (a head word and the rest of the string), or anything else to signal that nothing is left to split and it is time to deliver the result.

Since a string can also be treated as a sequence of chars, we can implement split using only string indexers and slices instead of the F# library arsenal:

let split s =
    let rec scan (s:string) i =
        if s.Length = 0 then []
        elif s.[i] = ' ' && i = 0 then scan s.[i+1..] 0
        elif s.[i] = ' ' then [s.[..i-1]; s.[i+1..]]
        elif i = (s.Length - 1) then [s]
        else scan s (i+1)
    scan s 0

The inner recursive scan function does the job our fold expects, (ab)using string slices and indexers and accounting for the corner cases along the way.

Putting it all together:
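To make the contract between split and fold concrete, here is a small sketch that calls split on three inputs, one for each shape of result that fold matches on. The split definition is copied unchanged from the snippet above so the example runs on its own in fsi; the sample strings are only illustrative.

```fsharp
// split is copied unchanged from the snippet above
let split s =
    let rec scan (s:string) i =
        if s.Length = 0 then []
        elif s.[i] = ' ' && i = 0 then scan s.[i+1..] 0
        elif s.[i] = ' ' then [s.[..i-1]; s.[i+1..]]
        elif i = (s.Length - 1) then [s]
        else scan s (i+1)
    scan s 0

// Two elements: the head word and the still-unsplit rest of the string
printfn "%A" (split "life without libraries") // ["life"; "without libraries"]
// One element: the last word, nothing left to split
printfn "%A" (split "tough")                  // ["tough"]
// Empty list: nothing to deliver, fold falls through to its `_` case
printfn "%A" (split "")                       // []
```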

let string2Words s =
    let split s =
        let rec scan (s:string) i =
            if s.Length = 0 then []
            elif s.[i] = ' ' && i = 0 then scan s.[i+1..] 0
            elif s.[i] = ' ' then [s.[..i-1]; s.[i+1..]]
            elif i = (s.Length - 1) then [s]
            else scan s (i+1)
        scan s 0
    let rec fold acc s =
        match split s with
        | [x] -> acc @ [x]
        | [head;tail] -> fold (acc @ [head]) tail
        | _ -> acc
    fold [] s

and quick-checking in fsi:

> string2Words "life without libraries is tough";;
val it : string list = ["life"; "without"; "libraries"; "is"; "tough"]

Try this one:

let rec words word text =
  [ match text with
    | [] -> yield word
    | c :: tail ->
        match c with
        | ' ' -> yield word
                 yield! words "" tail
        | _ -> yield! words (sprintf "%s%c" word c) tail ]


printfn "%A" ("hello my friend"
              |> Seq.toList
              |> words "")

["hello"; "my"; "friend"]

It is very simple, but not very efficient.
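One thing to be aware of: this version yields a word at every space, so leading, trailing, or repeated spaces produce empty strings in the result. A possible way to drop them, sketched below, is to filter the output. The wrapper name nonEmptyWords is hypothetical (it is not part of the answer), and it relies on the built-in List.filter, which may or may not be acceptable under the "no built-in functions" constraint.

```fsharp
// words is copied unchanged from the answer above
let rec words word text =
  [ match text with
    | [] -> yield word
    | c :: tail ->
        match c with
        | ' ' -> yield word
                 yield! words "" tail
        | _ -> yield! words (sprintf "%s%c" word c) tail ]

// nonEmptyWords is a hypothetical wrapper, not from the original answer:
// it runs words and then discards the empty strings caused by extra spaces
let nonEmptyWords (s: string) =
    s |> Seq.toList |> words "" |> List.filter (fun w -> w <> "")

printfn "%A" (words "" (Seq.toList "a  b")) // ["a"; ""; "b"]
printfn "%A" (nonEmptyWords " a  b ")       // ["a"; "b"]
```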

Here's a way to do it using a recursive function that pattern matches on the string as a list of characters:

let charsToString chars = chars |> Array.ofSeq |> System.String
let split (s: string) =
  let rec loop acc words rest =
    match rest with
    | ' '::xs ->
      if Seq.isEmpty acc then
        loop Seq.empty words xs
      else
        let newWord = charsToString acc
        loop Seq.empty (Seq.append words [newWord]) xs
    | x::xs -> loop (Seq.append acc [x]) words xs
    | [] -> // terminal case, we've reached end of string
      if Seq.isEmpty acc then
        words
      else
        let newWord = charsToString acc
        Seq.append words [newWord]
  loop Seq.empty Seq.empty (List.ofSeq s)

> split "Hello my friend"
val it : seq<System.String> = seq ["Hello"; "my"; "friend"]

The key to using pure recursion in this case is that you need to keep track of state:

  1. acc: the characters you're accumulating to build the next word; they are turned into a word whenever you reach a space or the end of the entire string
  2. words: the words you've already split out of the string; this changes each time a space is encountered (and there are characters in the acc buffer)
  3. rest: the characters we haven't yet examined in the string; this shrinks by one character for each recursion

This is what the inner loop function takes as its arguments: a sequence of characters acc to build up words, a sequence of words that have already been split out, and the rest of the string we haven't processed yet. Notice that the first call to loop passes empty sequences for the first two states, and the entire string (as a list of chars) as rest.

The inner loop function exists only to hide these extra state values from callers, so they can simply pass a string.

This implementation is not particularly efficient or elegant; it's meant to show basic recursion and pattern matching concepts.
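If efficiency matters, one possible variant keeps the same three pieces of state but stores acc and words as plain lists built in reverse with the cons operator, flipping them with List.rev only when a word or the final result is complete. This avoids the repeated Seq.append calls. It is only a sketch: the name splitRev is hypothetical, and charsToString is redefined here to take a char list.

```fsharp
// Redefined for this sketch: builds a string from a list of characters
let charsToString (chars: char list) = System.String(Array.ofList chars)

// splitRev is a hypothetical name; same state as loop above (acc, words, rest),
// but acc and words are lists accumulated in reverse order
let splitRev (s: string) =
    let rec loop acc words rest =
        match rest with
        | ' ' :: xs ->
            if List.isEmpty acc then loop [] words xs // skip repeated spaces
            else loop [] (charsToString (List.rev acc) :: words) xs
        | x :: xs -> loop (x :: acc) words xs         // prepend the char to the current word
        | [] ->
            // end of input: flush the pending word, then restore the original order
            let words =
                if List.isEmpty acc then words
                else charsToString (List.rev acc) :: words
            List.rev words
    loop [] [] (List.ofSeq s)

printfn "%A" (splitRev "Hello my friend") // ["Hello"; "my"; "friend"]
```

Unlike the original, this returns a string list rather than a seq of strings.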

In order to understand recursion, you must first understand recursion. But then you are allowed to leave the bare metal and do some linear plumbing instead, chaining predefined library functions that each perform one transformation in a series towards the desired result.

"Hello, World ! "
|> fun s ->
    (s.ToCharArray(), ([], []))
    ||> Array.foldBack (fun c (cs, css) ->
        if c = ' ' then [], cs::css else c::cs, css )
|> List.Cons
|> List.filter (not << List.isEmpty)
|> List.map (fun s -> System.String(Array.ofList s))
// val it : System.String list = ["Hello,"; "World"; "!"]

We convert the string into a character array (char[]) and apply a folder to each element of the array. The accumulator is a tuple of a char list, holding the characters of the current word, and a char list list, holding the words so far. The fold runs in reverse order, back to front, so that the lists in the tuple are constructed in the right order. The result of this step is consed into a single char list list, filtered for empty lists, and recombined into strings.
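To see what the fold step alone produces before the cons, here is a small sketch on a shorter input. The name folder and the sample string "ab cd" are introduced only for illustration; the logic is the same as the lambda in the pipeline above, with parentheses added around the tuples for readability.

```fsharp
// Same logic as the lambda in the pipeline above, named for illustration:
// a space closes the current word; any other char is prepended to it
let folder c (cs, css) =
    if c = ' ' then ([], cs :: css) else (c :: cs, css)

// Fold "ab cd" from the back, starting with an empty current word and no finished words
let current, finished = Array.foldBack folder ("ab cd".ToCharArray()) ([], [])

printfn "%A" current               // ['a'; 'b']
printfn "%A" finished              // [['c'; 'd']]
// Consing the pending word onto the finished ones gives the full char list list
printfn "%A" (current :: finished) // [['a'; 'b']; ['c'; 'd']]
```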
