How to write efficient list/seq functions in F#? (mapFoldWhile)
I was trying to write a generic mapFoldWhile function, which is just mapFold but requires the state to be an option and stops as soon as it encounters a None state.
I don't want to use mapFold because it will transform the entire list, whereas I want it to stop as soon as an invalid state (i.e. None) is found.
This was my first attempt:
    let mapFoldWhile (f : 'State option -> 'T -> 'Result * 'State option) (state : 'State option) (list : 'T list) =
        let rec mapRec f state list results =
            match list with
            | [] -> (List.rev results, state)
            | item :: tail ->
                let (result, newState) = f state item
                match newState with
                | Some x -> mapRec f newState tail (result :: results)
                | None -> ([], None)
        mapRec f state list []
The List.rev irked me, since the point of the exercise was to exit early, and constructing a new list ought to be even slower.
So I looked up what F#'s very own map does, which was:
    let map f list = Microsoft.FSharp.Primitives.Basics.List.map f list
The ominous Microsoft.FSharp.Primitives.Basics.List.map is defined in the F# core library source and looks like this:
    let map f x =
        match x with
        | [] -> []
        | [h] -> [f h]
        | (h::t) ->
            let cons = freshConsNoTail (f h)
            mapToFreshConsTail cons f t
            cons
The consNoTail stuff is also in this file:
    // optimized mutation-based implementation. This code is only valid in fslib, where mutation of private
    // tail cons cells is permitted in carefully written library code.
    let inline setFreshConsTail cons t = cons.(::).1 <- t
    let inline freshConsNoTail h = h :: (# "ldnull" : 'T list #)
So I guess it turns out that F#'s immutable lists are actually mutable, for performance reasons? I'm a bit worried about this, having used the prepend-then-reverse list approach because I thought it was the "way to go" in F#.
I'm not very experienced with F# or functional programming in general, so maybe (probably) the whole idea of creating a new mapFoldWhile function is the wrong thing to do, but then what am I to do instead?
I often find myself in situations where I need to "exit early" because a collection item is "invalid" and I know that I don't have to look at the rest. I'm using List.pick or Seq.takeWhile in some cases, but in other instances I need to do more (mapFold).
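For reference, the two early-exit helpers mentioned above behave roughly as in this minimal sketch. The list readings and its values are hypothetical, chosen only for illustration.

```fsharp
// Hypothetical sample data, chosen only for illustration.
let readings = [ 3; 8; 15; -1; 42; 7 ]

// List.pick returns the first Some result and does not visit later elements.
// It throws KeyNotFoundException when nothing matches; List.tryPick returns an option instead.
let firstLarge =
    readings |> List.pick (fun x -> if x > 10 then Some (x * 2) else None)

// Seq.takeWhile yields elements until the predicate first fails, then stops.
let validPrefix =
    readings |> Seq.takeWhile (fun x -> x >= 0) |> List.ofSeq

printfn "%d %A" firstLarge validPrefix   // prints: 30 [3; 8; 15]
```

Neither helper threads a state through the traversal, which is the part that mapFold adds.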
Is there an efficient solution to this kind of problem (mapFoldWhile in particular and "exit early" in general) with functional programming concepts, or do I have to switch to an imperative solution / use a System.Collections.Generic.List?
In most cases, using List.rev is a perfectly sufficient solution.
You are right that the F# core library uses mutation and other dirty hacks to squeeze some more performance out of the F# list operations, but I think the micro-optimizations done there are not a particularly good example to follow. F# list functions are used almost everywhere, so it might be a good trade-off there, but I would not follow it in most situations.
Running your function with the following:
    let l = [ 1 .. 1000000 ]
    #time
    mapFoldWhile (fun s v -> 0, s) (Some 1) l
I get ~240ms for the mapFoldWhile call when I run the function without changes. When I just drop List.rev (so that it returns the data in the reverse order), I get around ~190ms. If you are really calling the function frequently enough that this matters, then you'd have to use mutation (actually, your own mutable list type), but I think that is rarely worth it.
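If you do want to try the mutation route without writing your own list type, one option is to keep the mutation local to the function. The following is only a sketch: mapFoldWhileBuffered and step are hypothetical names, and it swaps the prepend-then-reverse accumulator for a ResizeArray (the F# alias for System.Collections.Generic.List). It still copies the buffer into an immutable list at the end, so whether it beats List.rev would have to be measured.

```fsharp
// Sketch only: mapFoldWhileBuffered is a hypothetical name, not a library function.
let mapFoldWhileBuffered (f : 'State option -> 'T -> 'Result * 'State option) (state : 'State option) (list : 'T list) =
    // The mutable buffer never escapes this function, so callers still see immutable data.
    let results = ResizeArray<'Result>()
    let rec loop state remaining =
        match remaining with
        | [] -> (List.ofSeq results, state)
        | item :: tail ->
            let (result, newState) = f state item
            match newState with
            | Some _ ->
                results.Add result
                loop newState tail
            | None -> ([], None)   // same early-exit result as the original function
    loop state list

// Hypothetical step function: keeps a running total and gives up once it would exceed 10.
let step state x =
    match state with
    | Some total when total + x <= 10 -> (x * 2, Some (total + x))
    | _ -> (0, None)

let ok = mapFoldWhileBuffered step (Some 0) [ 1; 2; 3 ]        // ([2; 4; 6], Some 6)
let stopped = mapFoldWhileBuffered step (Some 0) [ 5; 6; 7 ]   // ([], None), 7 is never visited
```

The early-exit branch returns the same ([], None) as the function in the question, so the two should be interchangeable for callers.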
For general "exit early" problems, you can often write the code as a composition of Seq.scan and Seq.takeWhile. For example, say you want to sum numbers from a sequence until you reach 1000. You can write:
    input
    |> Seq.scan (fun sum v -> v + sum) 0
    |> Seq.takeWhile (fun sum -> sum < 1000)
Seq.scan describes a sequence of running sums over the whole input, but because that sequence is generated lazily, Seq.takeWhile stops the computation as soon as the exit condition is met.
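To see the laziness at work, here is a small self-contained sketch of the same pipeline. The input is an assumption made for the demonstration: an infinite sequence 1, 2, 3, ... that could never be traversed in full, so the pipeline only terminates because takeWhile stops pulling elements.

```fsharp
// Hypothetical input: an infinite sequence 1, 2, 3, ...
let input = Seq.initInfinite (fun i -> i + 1)

let partialSums =
    input
    |> Seq.scan (fun sum v -> v + sum) 0        // running totals, starting with the initial 0
    |> Seq.takeWhile (fun sum -> sum < 1000)    // stop at the first total that reaches 1000
    |> List.ofSeq

// The last retained running total is the largest one below 1000.
printfn "%d sums, last = %d" partialSums.Length (List.last partialSums)
// prints: 45 sums, last = 990
```

Note that Seq.scan also emits the initial state, which is why the result starts with 0 and has 45 entries rather than 44.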