MailboxProcessor performance problems

Question

I have been trying to design a system which allows a large number of concurrent users to be represented in memory at the same time. When setting out to design this system I immediately thought of some sort of actor-based solution akin to Erlang.

The system has to be done in .NET, so I started working on a prototype in F# using MailboxProcessor, but I have run into serious performance problems with it. My initial idea was to use one actor (MailboxProcessor) per user to serialize the communication for that user.

I have isolated a small piece of code that reproduces the problem I am seeing:

open System.Threading;
open System.Diagnostics;

type Inc() =

    let mutable n = 0;
    let sw = new Stopwatch()

    member x.Start() =
        sw.Start()

    member x.Increment() =
        if Interlocked.Increment(&n) >= 100000 then
            printf "UpdateName Time %A" sw.ElapsedMilliseconds

type Message
    = UpdateName of int * string

type User = {
    Id : int
    Name : string
}

[<EntryPoint>]
let main argv = 

    let sw = Stopwatch.StartNew()
    let incr = new Inc()
    let mb = 

        Seq.initInfinite(fun id -> 
            MailboxProcessor<Message>.Start(fun inbox -> 

                let rec loop user =
                    async {
                        let! m = inbox.Receive()

                        match m with
                        | UpdateName(id, newName) ->
                            let user = {user with Name = newName};
                            incr.Increment()
                            do! loop user
                    }

                loop {Id = id; Name = sprintf "User%i" id}
            )
        ) 
        |> Seq.take 100000
        |> Array.ofSeq

    printf "Create Time %i\n" sw.ElapsedMilliseconds
    incr.Start()

    for i in 0 .. 99999 do
        mb.[i % mb.Length].Post(UpdateName(i, sprintf "User%i-UpdateName" i));

    System.Console.ReadLine() |> ignore

    0

Just creating the 100k actors takes around 800ms on my quad-core i7. Then submitting the UpdateName message to each of the actors and waiting for them to complete takes about 1.8 seconds.

Now, I realize there is overhead from all the queuing on the ThreadPool, setting/resetting AutoResetEvents, etc. internally in the MailboxProcessor. But is this really the expected performance? From reading both MSDN and various blogs on the MailboxProcessor I have gotten the idea that it is meant to be akin to Erlang actors, but from the abysmal performance I am seeing, this doesn't seem to hold true in reality?

I also tried a modified version of the code that uses 8 MailboxProcessors, each holding a Map<int, User> used to look up a user by id. It yielded some improvement, bringing the total time for the UpdateName operation down to 1.2 seconds. But it still feels very slow. The modified code is here:

open System.Threading;
open System.Diagnostics;

type Inc() =

    let mutable n = 0;
    let sw = new Stopwatch()

    member x.Start() =
        sw.Start()

    member x.Increment() =
        if Interlocked.Increment(&n) >= 100000 then
            printf "UpdateName Time %A" sw.ElapsedMilliseconds

type Message
    = CreateUser of int * string
    | UpdateName of int * string

type User = {
    Id : int
    Name : string
}

[<EntryPoint>]
let main argv = 

    let sw = Stopwatch.StartNew()
    let incr = new Inc()
    let mb = 

        Seq.initInfinite(fun id -> 
            MailboxProcessor<Message>.Start(fun inbox -> 

                let rec loop users =
                    async {
                        let! m = inbox.Receive()

                        match m with
                        | CreateUser(id, name) ->
                            do! loop (Map.add id {Id=id; Name=name} users)

                        | UpdateName(id, newName) ->
                            match Map.tryFind id users with
                            | None -> 
                                do! loop users

                            | Some(user) ->
                                incr.Increment()
                                do! loop (Map.add id {user with Name = newName} users)
                    }

                loop Map.empty
            )
        ) 
        |> Seq.take 8
        |> Array.ofSeq

    printf "Create Time %i\n" sw.ElapsedMilliseconds

    for i in 0 .. 99999 do
        mb.[i % mb.Length].Post(CreateUser(i, sprintf "User%i-UpdateName" i));

    incr.Start()

    for i in 0 .. 99999 do
        mb.[i % mb.Length].Post(UpdateName(i, sprintf "User%i-UpdateName" i));

    System.Console.ReadLine() |> ignore

    0

So my question is: am I doing something wrong? Have I misunderstood how the MailboxProcessor is supposed to be used? Or is this the expected performance?

Update:

So I got hold of some people on ##fsharp @ irc.freenode.net, who informed me that sprintf is very slow, and as it turns out that is where a large part of my performance problems were coming from. But after removing the sprintf operations above and just using the same name for every User, I still end up with about 400ms for doing the operations, which feels really slow.
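To check how much of the cost is string formatting rather than the agents, the names can be built and timed on their own, with no MailboxProcessor involved. A rough sketch, assuming the `User<i>-UpdateName` naming used in the question; both arrays come out identical, and the printed timings depend on the machine and the FSharp.Core version, so treat them as something to measure rather than a known result:

```fsharp
open System
open System.Diagnostics

// Build n user names two ways and time each one separately.
let n = 100000

let timeIt label (f: unit -> string[]) =
    let sw = Stopwatch.StartNew()
    let result = f ()
    printfn "%s: %.3fs" label sw.Elapsed.TotalSeconds
    result

// Format-string based construction, as in the question.
let viaSprintf =
    timeIt "sprintf" (fun () -> Array.init n (fun i -> sprintf "User%i-UpdateName" i))

// Plain concatenation producing the same strings.
let viaConcat =
    timeIt "String.Concat" (fun () ->
        Array.init n (fun i -> String.Concat("User", string i, "-UpdateName")))
```

If formatting dominates, building the message payloads before starting the stopwatch keeps that cost out of the agent measurement.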

Answer 1

Now, I realize there is overhead from all the queuing on the ThreadPool, setting/resetting AutoResetEvents, etc. internally in the MailboxProcessor.

And printf, Map, Seq, and contention for your global mutable Inc. And you're leaking heap-allocated stack frames. In fact, only a small proportion of the time taken to run your benchmark has anything to do with MailboxProcessor.

But is this really the expected performance?

I am not surprised by the performance of your program, but it does not say much about the performance of MailboxProcessor.

From reading both MSDN and various blogs on the MailboxProcessor I have gotten the idea that it is meant to be akin to Erlang actors, but from the abysmal performance I am seeing, this doesn't seem to hold true in reality?

The MailboxProcessor is conceptually somewhat similar to part of Erlang. The abysmal performance you're seeing is due to a variety of things, some of which are quite subtle and will affect any such program.

So my question is: am I doing something wrong?

I think you're doing a few things wrong. Firstly, the problem you are trying to solve is not clear, so this sounds like an XY problem. Secondly, you're trying to benchmark the wrong things (e.g. you are complaining about the microseconds required to create a MailboxProcessor, but you may intend to do so only when a TCP connection is established, which takes several orders of magnitude longer). Thirdly, you have written a benchmark program that measures the performance of some things but have attributed your observations to completely different things.

Let's look at your benchmark program in more detail. Before we do anything else, let's fix some bugs. You should always use sw.Elapsed.TotalSeconds to measure time because it is more precise. You should always recur in an async workflow using return! and not do!, or you will leak stack frames.
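A minimal sketch of both fixes, assuming a single counting agent rather than the question's per-user agents: the loop recurs with `return!`, and elapsed time comes from `Stopwatch.Elapsed.TotalSeconds`. The `Msg` type and its `Count` query are hypothetical, added only so the caller can wait for the result with `PostAndReply` instead of watching a shared counter. Because `Receive` takes messages in arrival order and only one thread posts, the reply is produced after every earlier `Increment` has been handled.

```fsharp
open System.Diagnostics

// Hypothetical message type: Increment carries no data, Count asks for the total.
type Msg =
    | Increment
    | Count of AsyncReplyChannel<int>

let counter =
    MailboxProcessor<Msg>.Start(fun inbox ->
        let rec loop n =
            async {
                let! msg = inbox.Receive()
                match msg with
                | Increment ->
                    // return! hands control to the next iteration; do! would keep
                    // this iteration waiting on it, adding a pending frame per message.
                    return! loop (n + 1)
                | Count reply ->
                    reply.Reply(n)
                    return! loop n
            }
        loop 0)

let sw = Stopwatch.StartNew()
for _ in 1 .. 1000000 do
    counter.Post(Increment)
// Only this thread posts, so the Count request is queued behind every Increment.
let total = counter.PostAndReply(fun reply -> Count reply)
// Elapsed.TotalSeconds keeps sub-millisecond precision, unlike ElapsedMilliseconds.
printfn "%d messages in %.3fs" total sw.Elapsed.TotalSeconds
```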

My initial timings are:

Creation stage: 0.858s
Post stage: 1.18s

Next, let's run a profiler to make sure our program really is spending most of its time thrashing the F# MailboxProcessor:

77%    Microsoft.FSharp.Core.PrintfImpl.gprintf(...)
 4.4%  Microsoft.FSharp.Control.MailboxProcessor`1.Post(!0)

Clearly not what we'd hoped. Thinking more abstractly, we are generating lots of data using things like sprintf and then applying it, but we're doing the generation and application together. Let's separate out our initialization code:

let ids = Array.init 100000 (fun id -> {Id = id; Name = sprintf "User%i" id})
...
    ids
    |> Array.map (fun id ->
        MailboxProcessor<Message>.Start(fun inbox -> 
...
            loop id
...
    printf "Create Time %fs\n" sw.Elapsed.TotalSeconds
    let fxs =
      [|for i in 0 .. 99999 ->
          mb.[i % mb.Length].Post, UpdateName(i, sprintf "User%i-UpdateName" i)|]
    incr.Start()
    for f, x in fxs do
      f x
...

Now we get:

Creation stage: 0.538s
Post stage: 0.265s

So creation is 60% faster and posting is 4.5x faster.

Let's try completely rewriting your benchmark:

do
  for nAgents in [1; 10; 100; 1000; 10000; 100000] do
    let timer = System.Diagnostics.Stopwatch.StartNew()
    use barrier = new System.Threading.Barrier(2)
    let nMsgs = 1000000 / nAgents
    let nAgentsFinished = ref 0
    let makeAgent _ =
      new MailboxProcessor<_>(fun inbox ->
        let rec loop n =
          async { let! () = inbox.Receive()
                  let n = n+1
                  if n=nMsgs then
                    let n = System.Threading.Interlocked.Increment nAgentsFinished
                    if n = nAgents then
                      barrier.SignalAndWait()
                  else
                    return! loop n }
        loop 0)
    let agents = Array.init nAgents makeAgent
    for agent in agents do
      agent.Start()
    printfn "%fs to create %d agents" timer.Elapsed.TotalSeconds nAgents
    timer.Restart()
    for _ in 1..nMsgs do
      for agent in agents do
        agent.Post()
    barrier.SignalAndWait()
    printfn "%fs to post %d msgs" timer.Elapsed.TotalSeconds (nMsgs * nAgents)
    timer.Restart()
    for agent in agents do
      use agent = agent
      ()
    printfn "%fs to dispose of %d agents\n" timer.Elapsed.TotalSeconds nAgents

This version expects each agent to receive nMsgs messages before it increments the shared counter, greatly reducing the performance impact of that shared counter. This program also examines performance with different numbers of agents. On this machine I get:

Agents  M msgs/s
     1    2.24
    10    6.67
   100    7.58
  1000    5.15
 10000    1.15
100000    0.36

So it appears that part of the reason for the lower msgs/s rate you are seeing is the unusually large number (100,000) of agents. With 10 to 1,000 agents, the F# implementation is over 10x faster than it is with 100,000 agents.

So if you can make do with this kind of performance, you should be able to write your entire application in F#, but if you need to eke out much more performance I would recommend a different approach. You may not even have to give up F# (and you can certainly use it for prototyping) if you adopt a design like the Disruptor. In practice, I have found that the time spent doing serialization on .NET tends to be much larger than the time spent in F# async and MailboxProcessor.

Answer 2

After eliminating sprintf, I got about 12 seconds (Mono on Mac is not that fast). Taking Phil Trelford's suggestion of using Dictionary instead of Map brought it down to 600ms. I haven't tried it on Windows/.NET.

The code change is simple enough, and local mutability is perfectly acceptable to me:

let mb = 
    Seq.initInfinite(fun id -> 
        MailboxProcessor<Message>.Start(fun inbox -> 
            let di = System.Collections.Generic.Dictionary<int,User>()
            let rec loop () =
                async {
                    let! m = inbox.Receive()

                    match m with
                    | CreateUser(id, name) ->
                        di.Add(id, {Id=id; Name=name})
                        return! loop ()

                    | UpdateName(id, newName) ->
                        match di.TryGetValue id with
                        | false, _ -> 
                            return! loop ()

                        | true, user ->
                            incr.Increment()
                            di.[id] <- {user with Name = newName}
                            return! loop ()
                }

            loop ()
        )
    ) 
    |> Seq.take 8
    |> Array.ofSeq
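For reference, a self-contained sketch of the same Dictionary-per-agent idea, with the question's types redeclared so it runs on its own. The `GetName` message and the `shardFor` routing function are hypothetical additions: `GetName` lets the caller confirm an update with `PostAndReply`, and routing by `id % shards.Length` (non-negative ids assumed) sends every message for one user to the same agent, so per-user ordering is preserved as in the one-agent-per-user design.

```fsharp
open System.Collections.Generic

type User = { Id : int; Name : string }

// CreateUser and UpdateName mirror the question; GetName is a hypothetical query.
type Message =
    | CreateUser of int * string
    | UpdateName of int * string
    | GetName of int * AsyncReplyChannel<string option>

let makeShard () =
    MailboxProcessor<Message>.Start(fun inbox ->
        // Only this agent's loop touches its Dictionary, so no locking is needed.
        let users = Dictionary<int, User>()
        let rec loop () =
            async {
                let! msg = inbox.Receive()
                match msg with
                | CreateUser(id, name) ->
                    users.[id] <- { Id = id; Name = name }
                | UpdateName(id, newName) ->
                    // Updates for ids this shard has never seen are ignored.
                    if users.ContainsKey id then
                        let user = users.[id]
                        users.[id] <- { user with Name = newName }
                | GetName(id, reply) ->
                    let found, user = users.TryGetValue id
                    reply.Reply(if found then Some user.Name else None)
                return! loop ()
            }
        loop ())

let shards = Array.init 8 (fun _ -> makeShard ())

// All messages for a given (non-negative) id go to the same shard.
let shardFor id = shards.[id % shards.Length]

let nUsers = 100000
// Payloads are built up front, outside any timed section.
let newNames = Array.init nUsers (fun i -> "User" + string i + "-UpdateName")

for i in 0 .. nUsers - 1 do
    (shardFor i).Post(CreateUser(i, "User" + string i))
for i in 0 .. nUsers - 1 do
    (shardFor i).Post(UpdateName(i, newNames.[i]))

// Queued behind the CreateUser and UpdateName messages for id 12345 on its shard.
let check = (shardFor 12345).PostAndReply(fun reply -> GetName(12345, reply))
printfn "%A" check
```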

Answer 1: 16 votes, 2013-07-01 09:25:13
Answer 2: 2 votes, 2013-06-28 20:38:58
