F# map a seq to another seq of shorter length

I have a sequence of strings like this (lines in a file):

[20150101] error a
details 1
details 2
[20150101] error b
details
[20150101] error c

I am trying to map this to a sequence of strings like this (log entries):

[20150101] error a details 1 details 2
[20150101] error b details
[20150101] error c

I can do this in an imperative way (by translating the code I would write in C#). This works, but it reads like pseudo-code because I have omitted the referenced functions:

let getLogEntries logFilePath =
    seq {
        // Lines of the log entry currently being assembled
        let logEntryLines = new ResizeArray<string>()

        for lineOfText in getLinesOfText logFilePath do
            // A new entry starts: emit the one collected so far
            if isStartOfNewLogEntry lineOfText && logEntryLines.Count > 0 then
                yield joinLines logEntryLines
                logEntryLines.Clear()
            logEntryLines.Add(lineOfText)

        // Emit the last entry in the file
        if logEntryLines.Count > 0 then
            yield joinLines logEntryLines
    }

Is there a more functional way of doing this?

I can't use Seq.map since it's not a one-to-one mapping, and Seq.fold doesn't seem right because I suspect it will process the entire input sequence before returning the results (not great if I have very large log files). I assume my code above isn't the ideal way to do this in F# because it uses ResizeArray<string>.

In general, when there is no built-in function that you can use, the functional way to solve things is to use recursion. Here, you can recursively walk over the input, remember the items of the last chunk (the lines since the last one starting with "[") and produce a new result when you reach a new starting block. In F#, you can write this nicely with sequence expressions:

let rec joinDetails (lines:string list) lastChunk = seq {
  match lines with
  | [] -> 
      // We are at the end - if there are any records left, produce a new item!
      if lastChunk <> [] then yield String.concat " " (List.rev lastChunk)
  | line::lines when line.StartsWith("[") ->
      // New block starting. Produce a new item and then start a new chunk
      if lastChunk <> [] then yield String.concat " " (List.rev lastChunk)
      yield! joinDetails lines [line]
  | line::lines ->
      // Ordinary line - just add it to the last chunk that we're collecting
      yield! joinDetails lines (line::lastChunk) }

Here is an example showing the code in action:

let lines = 
  [ "[20150101] error a"
    "details 1"
    "details 2"
    "[20150101] error b"
    "details"
    "[20150101] error c" ]

joinDetails lines []
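
Because the recursive function above pattern-matches on a string list, the whole input has to be in memory before it starts. The following is a minimal sketch of the same recursion driven by an enumerator instead, so that lines are pulled only as entries are requested. The name joinDetailsLazy is made up for this illustration and is not part of the original answer:

```fsharp
// Hypothetical lazy variant: walks any seq<string> through its enumerator
// instead of pattern matching on a list.
let joinDetailsLazy (lines: seq<string>) : seq<string> =
    seq {
        use e = lines.GetEnumerator()
        // lastChunk holds the lines of the current entry, newest first
        let rec loop (lastChunk: string list) =
            seq {
                if e.MoveNext() then
                    let line = e.Current
                    if line.StartsWith("[") then
                        // New entry starts: emit the previous one, if any
                        if not (List.isEmpty lastChunk) then
                            yield String.concat " " (List.rev lastChunk)
                        yield! loop [ line ]
                    else
                        // Continuation line: add it to the current entry
                        yield! loop (line :: lastChunk)
                elif not (List.isEmpty lastChunk) then
                    // End of input: emit the final entry
                    yield String.concat " " (List.rev lastChunk)
            }
        yield! loop []
    }

let sample =
    seq {
        yield "[20150101] error a"
        yield "details 1"
        yield "details 2"
        yield "[20150101] error b"
        yield "details"
        yield "[20150101] error c"
    }

// entries = ["[20150101] error a details 1 details 2";
//            "[20150101] error b details";
//            "[20150101] error c"]
let entries = joinDetailsLazy sample |> Seq.toList
```

For a real log file, the input could come from System.IO.File.ReadLines, which also reads lazily, so only the lines of the current entry would be held in memory at any time.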

Not much that is built into Seq is going to help you here, so you have to roll your own solution. Ultimately, parsing a file like this involves iterating and maintaining state, but what F# does is encapsulate that iteration and state by means of computation expressions (hence your use of the seq computation expression).

What you've done isn't bad, but you could extract your code into a generic function that computes the chunks (i.e. sequences of strings) in an input sequence without knowledge of the format. The rest, i.e. parsing an actual log file, can be made purely functional.

I wrote this function in the past to help with this:

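let chunkBy chunkIdentifier source =
    seq {
        // Items of the chunk being built, newest first (prepending is O(1))
        let chunk = ref []
        for sourceItem in source do
            // A new chunk starts: emit the one collected so far
            if chunkIdentifier sourceItem && not (List.isEmpty chunk.Value) then
                yield List.rev chunk.Value
                chunk.Value <- []
            chunk.Value <- sourceItem :: chunk.Value

        // Emit the trailing chunk, unless the source was empty
        if not (List.isEmpty chunk.Value) then
            yield List.rev chunk.Value
    }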

It takes a chunkIdentifier function which returns true if the input is the start of a new chunk.

Parsing a log file is then simply a case of extracting the lines, computing the chunks and joining each chunk:

logEntryLines
|> chunkBy (fun (line: string) -> line.StartsWith("["))
|> Seq.map (String.concat " ")

Encapsulating the iteration and mutation as much as possible in a reusable function is more in the spirit of functional programming.
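
One way to check that chunking with a sequence expression stays lazy is to run it over an endless input and take only the first few chunks. The sketch below uses a compact stand-in for chunkBy so that it runs on its own. The endless log and the pulled counter are invented for the demonstration:

```fsharp
// Compact stand-in for a predicate-based chunker, so the snippet is self-contained
let chunkBy (isChunkStart: 'a -> bool) (source: seq<'a>) : seq<'a list> =
    seq {
        let chunk = ResizeArray<'a>()
        for item in source do
            if isChunkStart item && chunk.Count > 0 then
                yield List.ofSeq chunk   // copy before clearing the buffer
                chunk.Clear()
            chunk.Add item
        if chunk.Count > 0 then
            yield List.ofSeq chunk
    }

// Counts how many lines have actually been generated for the consumer
let pulled = ref 0

// An endless, made-up log: every third line starts a new entry
let endlessLog =
    Seq.initInfinite (fun i ->
        pulled.Value <- pulled.Value + 1
        if i % 3 = 0 then sprintf "[20150101] error %d" (i / 3)
        else sprintf "details %d" (i % 3))

// firstTwo = ["[20150101] error 0 details 1 details 2";
//             "[20150101] error 1 details 1 details 2"]
let firstTwo =
    endlessLog
    |> chunkBy (fun (line: string) -> line.StartsWith("["))
    |> Seq.map (String.concat " ")
    |> Seq.truncate 2
    |> Seq.toList
```

The pipeline terminates even though the input never ends, and pulled.Value stays small, because each chunk is produced as soon as the start of the next one has been seen.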

Alternatively, here are two more variants:

let lst = ["[20150101] error a";
           "details 1";
           "details 2";
           "[20150101] error b";
           "details";
           "[20150101] error c";]

// Both variants assume a non-empty list (xs.Head throws otherwise)
let fun1 (xs: string list) =
    let sb = System.Text.StringBuilder(xs.Head)
    for x in xs.Tail do
        // '[' starts a new entry on its own line; anything else continues the current one
        let separator = if x.StartsWith("[") then "\n" else " "
        sb.Append(separator).Append(x) |> ignore
    sb.ToString()

lst |> fun1 |> printfn "%s"

printfn ""

let fun2 (xs: string list) =
    List.fold (fun acc (x: string) ->
        let separator = if x.StartsWith("[") then "\n" else " "
        acc + separator + x) xs.Head xs.Tail

lst |> fun2 |> printfn "%s"

Prints:

[20150101] error a details 1 details 2
[20150101] error b details
[20150101] error c

[20150101] error a details 1 details 2
[20150101] error b details
[20150101] error c

Link: https://dotnetfiddle.net/3KcIwv
