
F# map a seq to another seq of shorter length

I have a sequence of strings like this (lines in a file):

[20150101] error a
details 1
details 2
[20150101] error b
details
[20150101] error c

I am trying to map this to a sequence of strings like this (log entries):

[20150101] error a details 1 details 2
[20150101] error b details
[20150101] error c

I can do this in an imperative way (by translating the code I would write in C#). This works, but it reads like pseudo-code because I have omitted the referenced functions:

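let getLogEntries logFilePath =
    seq {
        // Lines belonging to the log entry currently being collected
        let logEntryLines = ResizeArray<string>()

        for lineOfText in getLinesOfText logFilePath do
            // A new entry starts: emit the previous one before collecting this line
            if isStartOfNewLogEntry lineOfText && logEntryLines.Count > 0 then
                yield joinLines logEntryLines
                logEntryLines.Clear()
            logEntryLines.Add(lineOfText)

        // Emit the final entry, if any
        if logEntryLines.Count > 0 then
            yield joinLines logEntryLines
    }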
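For reference, here is a self-contained version of the imperative approach that compiles as written. The three helpers are hypothetical stand-ins for the omitted functions: getLinesOfText is assumed to read the file lazily with System.IO.File.ReadLines, isStartOfNewLogEntry is assumed to test for a leading "[", and joinLines is assumed to join with a single space. The loop itself lives in getLogEntriesFromLines (also a hypothetical name) so it can be tried on an in-memory list.

```fsharp
open System.IO

// Hypothetical helpers standing in for the functions omitted from the question
let getLinesOfText (path: string) : seq<string> = File.ReadLines path // lazy, line by line
let isStartOfNewLogEntry (line: string) = line.StartsWith("[")
let joinLines (lines: seq<string>) = String.concat " " lines

// The original loop, operating on any sequence of lines
let getLogEntriesFromLines (lines: seq<string>) =
    seq {
        let logEntryLines = ResizeArray<string>()
        for lineOfText in lines do
            if isStartOfNewLogEntry lineOfText && logEntryLines.Count > 0 then
                // joinLines builds a new string eagerly, so clearing the buffer afterwards is safe
                yield joinLines logEntryLines
                logEntryLines.Clear()
            logEntryLines.Add(lineOfText)
        // Flush the final entry
        if logEntryLines.Count > 0 then
            yield joinLines logEntryLines
    }

let getLogEntries logFilePath = getLogEntriesFromLines (getLinesOfText logFilePath)
```

Because File.ReadLines and the sequence expression are both lazy, entries are produced as the file is read rather than after it has been loaded.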

Is there a more functional way of doing this?

I can't use Seq.map since it's not a one-to-one mapping, and Seq.fold doesn't seem right because I suspect it would process the entire input sequence before returning any results (not great for very large log files). I also assume my code above isn't the ideal way to do this in F#, because it relies on a mutable ResizeArray<string>.
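The suspicion about Seq.fold is right: it returns only the final state, so it has to enumerate the whole input before producing anything. Seq.scan, by contrast, yields every intermediate state lazily. The small experiment below (not from the original thread; pulled and source are illustrative names) counts how many items each function pulls from a 1000-item source.

```fsharp
// Counts how many items have been pulled from the source so far
let pulled = ref 0

let source =
    seq {
        for i in 1 .. 1000 do
            pulled.Value <- pulled.Value + 1
            yield i
    }

// Seq.fold only returns the final state, so it enumerates all 1000 items
let total = Seq.fold (+) 0 source
let pulledByFold = pulled.Value

// Seq.scan yields each intermediate state as it goes; taking the first three
// states (the initial 0 plus two partial sums) pulls only the first few items
pulled.Value <- 0
let firstStates = source |> Seq.scan (+) 0 |> Seq.take 3 |> List.ofSeq
let pulledByScan = pulled.Value

printfn "fold: total=%d, items pulled=%d" total pulledByFold
printfn "scan: states=%A, items pulled=%d" firstStates pulledByScan
```

This is why a fold is a poor fit when results should stream out of a large file: the grouping has to be expressed as something that yields as it goes, such as a sequence expression or recursion.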

In general, when there is no built-in function you can use, the functional way to solve the problem is recursion. Here, you can recursively walk over the input, remember the items of the current chunk (everything since the last [xyz] line), and produce a new result whenever you reach the start of a new block. In F#, you can write this nicely with sequence expressions:

let rec joinDetails (lines:string list) lastChunk = seq {
  match lines with
  | [] -> 
      // We are at the end - if there are any records left, produce a new item!
      if lastChunk <> [] then yield String.concat " " (List.rev lastChunk)
  | line::lines when line.StartsWith("[") ->
      // New block starting. Produce a new item and then start a new chunk
      if lastChunk <> [] then yield String.concat " " (List.rev lastChunk)
      yield! joinDetails lines [line]
  | line::lines ->
      // Ordinary line: just add it to the chunk that we're collecting
      yield! joinDetails lines (line::lastChunk) }

Here is an example showing the code in action:

let lines = 
  [ "[20150101] error a"
    "details 1"
    "details 2"
    "[20150101] error b"
    "details"
    "[20150101] error c" ]

joinDetails lines []
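Note that joinDetails takes a string list, so the whole file has to be loaded before it runs. If that matters for very large logs, the same recursion can walk an enumerator instead. The sketch below is an adaptation, not part of the original answer; joinDetailsLazy and loop are hypothetical names.

```fsharp
open System.Collections.Generic

// Hypothetical lazy variant of joinDetails: it pulls lines from an enumerator
// one at a time instead of pattern matching on a fully loaded list
let joinDetailsLazy (lines: seq<string>) =
    let rec loop (e: IEnumerator<string>) lastChunk = seq {
        if e.MoveNext() then
            let line = e.Current
            if line.StartsWith("[") then
                // New block starting: emit the previous chunk, then start a new one
                if lastChunk <> [] then yield String.concat " " (List.rev lastChunk)
                yield! loop e [ line ]
            else
                // Ordinary line: add it to the chunk being collected
                yield! loop e (line :: lastChunk)
        elif lastChunk <> [] then
            // End of input: emit whatever is left
            yield String.concat " " (List.rev lastChunk) }
    seq {
        // Disposed when enumeration finishes or is abandoned early
        use e = lines.GetEnumerator()
        yield! loop e [] }
```

With this shape, something like joinDetailsLazy (System.IO.File.ReadLines path) would stream entries from a file, path being whatever log file you want to read.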

There is not much built into the Seq module that will help you here, so you have to roll your own solution. Ultimately, parsing a file like this involves iterating and maintaining state; what F# does is encapsulate that iteration and state in computation expressions (hence your use of the seq computation expression).

What you've done isn't bad, but you could extract your code into a generic function that computes the chunks (i.e. groups of consecutive items) in an input sequence without any knowledge of the format. The rest, i.e. parsing an actual log file, can then be made purely functional.

I have written the following function in the past for exactly this:

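let chunkBy chunkIdentifier source =
    seq {
        // Items of the current chunk, newest first, so adding an item is O(1)
        // (appending with @ would copy the whole chunk every time)
        let chunk = ref []
        for sourceItem in source do
            let isNewChunk = chunkIdentifier sourceItem
            if isNewChunk && not chunk.Value.IsEmpty then
                yield List.rev chunk.Value
                chunk.Value <- [ sourceItem ]
            else
                chunk.Value <- sourceItem :: chunk.Value
        // Emit the last chunk; an empty input produces no chunks at all
        if not chunk.Value.IsEmpty then
            yield List.rev chunk.Value
    }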

It takes a chunkIdentifier function that returns true if an item is the start of a new chunk.

Parsing a log file is simply a case of extracting the lines, computing the chunks and joining each chunk:

logEntryLines
|> chunkBy (fun (line: string) -> line.StartsWith("["))
|> Seq.map (String.concat " ")

Encapsulating the iteration and mutation as much as possible inside a small, reusable function is more in the spirit of functional programming.
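Because chunkBy knows nothing about log files, it works on any element type. The sketch below repeats a version of chunkBy (accumulating each chunk in reverse and emitting nothing for an empty input) so it runs on its own, applies it to integers, and wires it to a file read lazily with System.IO.File.ReadLines; parseLogFile is a hypothetical name.

```fsharp
open System.IO

// Splits source into lists, starting a new list whenever chunkIdentifier returns true
let chunkBy chunkIdentifier (source: seq<'T>) =
    seq {
        let chunk = ref [] // current chunk, newest item first
        for item in source do
            if chunkIdentifier item && not chunk.Value.IsEmpty then
                yield List.rev chunk.Value
                chunk.Value <- [ item ]
            else
                chunk.Value <- item :: chunk.Value
        if not chunk.Value.IsEmpty then
            yield List.rev chunk.Value
    }

// Generic use: start a new chunk at every multiple of 3
let numberChunks = chunkBy (fun n -> n % 3 = 0) [ 1 .. 7 ] |> List.ofSeq
// [[1; 2]; [3; 4; 5]; [6; 7]]

// Hypothetical log-file parser: lines are read lazily, so entries stream out
let parseLogFile (path: string) =
    File.ReadLines path
    |> chunkBy (fun (line: string) -> line.StartsWith("["))
    |> Seq.map (String.concat " ")
```

Only parseLogFile knows what a log entry looks like; the chunking logic stays reusable.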

Alternatively, here are two more variants:

let lst = ["[20150101] error a";
           "details 1";
           "details 2";
           "[20150101] error b";
           "details";
           "[20150101] error c";]

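let fun1 (xs: string list) =
    // Assumes a non-empty list: xs.Head throws on []
    let sb = System.Text.StringBuilder(xs.Head)
    xs.Tail
    |> Seq.iter (fun x ->
        // A line starting with '[' begins a new entry on its own line;
        // any other line is appended to the current entry after a space
        let separator = if x.StartsWith("[") then "\n" else " "
        sb.Append(separator).Append(x) |> ignore)
    sb.ToString()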

lst |> fun1 |> printfn "%s"

printfn ""

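let fun2 (xs: string list) =
    // Same separator rule as fun1, expressed as a fold over the tail;
    // repeated string concatenation is quadratic for long inputs
    List.fold (fun acc (x: string) ->
        let separator = if x.StartsWith("[") then "\n" else " "
        acc + separator + x) xs.Head xs.Tail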

lst |> fun2 |> printfn "%s"

Output:

[20150101] error a details 1 details 2
[20150101] error b details
[20150101] error c

[20150101] error a details 1 details 2
[20150101] error b details
[20150101] error c

Link: https://dotnetfiddle.net/3KcIwv
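Both variants produce a single string with embedded newlines rather than the sequence of entries asked for in the question. If the list shape is wanted, a fold can build it directly; the sketch below uses List.foldBack (a swapped-in technique, not from the original answer; toLogEntries is a hypothetical name). Like fun2, it consumes the entire list before returning, so it does not address the streaming concern for very large files.

```fsharp
// Builds the list of log entries with List.foldBack, walking the lines from
// last to first. The state holds the detail lines seen since the last header
// (already in original order) and the entries completed so far.
let toLogEntries (lines: string list) =
    let leftover, entries =
        List.foldBack
            (fun (line: string) (details, entries) ->
                if line.StartsWith("[") then
                    // A header closes an entry: the header followed by its detail lines
                    ([], String.concat " " (line :: details) :: entries)
                else
                    (line :: details, entries))
            lines
            ([], [])
    // Detail lines before the first header (if any) become their own entry
    if List.isEmpty leftover then entries
    else String.concat " " leftover :: entries

toLogEntries lst |> List.iter (printfn "%s")
```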
