
How to write efficient list/seq functions in F#? (mapFoldWhile)

I was trying to write a generic mapFoldWhile function, which is just mapFold but requires the state to be an option and stops as soon as it encounters a None state.

I don't want to use mapFold because it transforms the entire list, whereas I want it to stop as soon as an invalid state (i.e. None) is found.

This was my first attempt:

let mapFoldWhile (f : 'State option -> 'T -> 'Result * 'State option) (state : 'State option) (list : 'T list) =
  let rec mapRec f state list results =
    match list with 
    | [] -> (List.rev results, state)
    | item :: tail -> 
      let (result, newState) = f state item
      match newState with 
      | Some x -> mapRec f newState tail (result :: results)
      | None -> ([], None)
  mapRec f state list []

The List.rev irked me, since the point of the exercise was to exit early, and constructing a new list ought to be even slower.

So I looked up what F#'s very own map does, which was:

let map f list = Microsoft.FSharp.Primitives.Basics.List.map f list

The ominous Microsoft.FSharp.Primitives.Basics.List.map is defined in the F# core library source and looks like this:

let map f x = 
    match x with
    | [] -> []
    | [h] -> [f h]
    | (h::t) -> 
        let cons = freshConsNoTail (f h)
        mapToFreshConsTail cons f t
        cons

The consNoTail stuff is also in this file:

// optimized mutation-based implementation. This code is only valid in fslib, where mutation of private
// tail cons cells is permitted in carefully written library code.
let inline setFreshConsTail cons t = cons.(::).1 <- t
let inline freshConsNoTail h = h :: (# "ldnull" : 'T list #)

So I guess it turns out that F#'s immutable lists are actually mutable, for the sake of performance? I'm a bit worried about this, having used the prepend-then-reverse approach because I thought it was the "way to go" in F#.

I'm not very experienced with F# or functional programming in general, so maybe (probably) the whole idea of creating a new mapFoldWhile function is the wrong thing to do, but then what am I to do instead?

I often find myself in situations where I need to "exit early" because a collection item is "invalid" and I know that I don't have to look at the rest. I'm using List.pick or Seq.takeWhile in some cases, but in other instances I need to do more (mapFold).

Is there an efficient solution to this kind of problem (mapFoldWhile in particular and "exit early" in general) with functional programming concepts, or do I have to switch to an imperative solution / use a System.Collections.Generic.List?

In most cases, using List.rev is a perfectly sufficient solution.

You are right that the F# core library uses mutation and other dirty hacks to squeeze some more performance out of the F# list operations, but I think the micro-optimizations done there are not a particularly good example to follow. F# list functions are used almost everywhere, so it might be a good trade-off there, but I would not follow it in most situations.

Running your function with the following:

let l = [ 1 .. 1000000 ]

#time 
mapFoldWhile (fun s v -> 0, s) (Some 1) l

I get ~240ms on the second line when I run the function without changes. When I just drop List.rev (so that it returns the data in the other order), I get around 190ms. If you are really calling the function frequently enough that this matters, then you'd have to use mutation (actually, your own mutable list type), but I think that is rarely worth it.
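If you do want to avoid the reverse without writing your own list type, one middle ground is to keep the mutation local to the function. The following is only a sketch under that assumption: mapFoldWhileBuffered and step are hypothetical names, and the results are collected in a ResizeArray (F#'s alias for System.Collections.Generic.List) before being copied into an F# list. It still copies once at the end, so whether it actually beats List.rev would need measuring.

```fsharp
// Hypothetical variant of mapFoldWhile: results go into a local ResizeArray
// instead of being prepended and reversed. The mutation never escapes.
let mapFoldWhileBuffered (f : 'State option -> 'T -> 'Result * 'State option)
                         (state : 'State option)
                         (list : 'T list) : 'Result list * 'State option =
    let results = ResizeArray<'Result>()
    let rec loop state rest =
        match rest with
        | [] -> (List.ofSeq results, state)
        | item :: tail ->
            let (result, newState) = f state item
            match newState with
            | Some _ ->
                results.Add result
                loop newState tail
            | None -> ([], None)   // same early-exit result as the original
    loop state list

// Example step function (an assumption for illustration): doubles each item
// and keeps a running total, giving up once the total would exceed 10.
let step state x =
    match state with
    | Some total when total + x <= 10 -> (x * 2, Some (total + x))
    | _ -> (0, None)

let r1 = mapFoldWhileBuffered step (Some 0) [1; 2; 3]   // ([2; 4; 6], Some 6)
let r2 = mapFoldWhileBuffered step (Some 0) [5; 5; 5]   // ([], None)
```

The signature and the early-exit behaviour match your original function, so the two should be interchangeable at call sites.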

For general "exit early" problems, you can often write the code as a composition of Seq.scan and Seq.takeWhile. For example, say you want to sum numbers from a sequence until you reach 1000. You can write:

input
|> Seq.scan (fun sum v -> v + sum) 0
|> Seq.takeWhile (fun sum -> sum < 1000)

Using Seq.scan generates a sequence of running sums over the whole input, but since this sequence is generated lazily, Seq.takeWhile stops the computation as soon as the exit condition is met.
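To see the laziness in action, here is a small self-contained sketch of the same pipeline. The infinite input (1, 2, 3, ...) and the names partialSums and lastSum are assumptions made for the example; if the pipeline were not lazy, it would never terminate.

```fsharp
// Infinite input: 1, 2, 3, ... (hypothetical example data)
let input = Seq.initInfinite (fun i -> i + 1)

// Running sums, cut off at the first sum that is not below 1000.
let partialSums =
    input
    |> Seq.scan (fun sum v -> v + sum) 0
    |> Seq.takeWhile (fun sum -> sum < 1000)
    |> Seq.toList

// The last running sum that is still below 1000 (here 990 = 1 + 2 + ... + 44)
let lastSum = List.last partialSums
```

Note that Seq.scan also emits the initial state, so partialSums starts with 0.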


 