
F# MailboxProcessor questions

I've created a console program using the code from http://fssnip.net/3K, and I found that:

  1. I had to add "System.Console.ReadLine() |> ignore" at the end to wait for the threads to finish. Is it possible to tell when all the MailboxProcessors are done, so that the program can exit by itself?

  2. I tried changing the test URL "www.google.com" to an invalid URL and got the following output. Is it possible to avoid this "outputting race"?

     http://www.google.co1m crawled by agent 1.  
     AgAAAent gent 3 is done.  
     gent 2 is done.  
     5 is done.  
     gent 4 is done.  
     Agent USupervisor RL collector is done.  
     is done.  
     1 is done.

[Edit]

The last output/crawl is still cut off after using Tomas's update http://fssnip.net/65. The following is the output of the program after I changed "limit" to 5 and added some debugging messages. The last line shows the truncated URL. Is there a way to detect whether all the crawlers have finished executing?

[Main] before crawl
[Crawl] before return result
http://news.google.com crawled by agent 1.
[supervisor] reached limit
http://www.gstatic.com/news/img/favicon.ico crawled by agent 5.
Agent 2 is done.
[supervisor] reached limit
Agent 5 is done.
http://www.google.com/imghp?hl=en&tab=ni crawled by agent 3.
[supervisor] reached limit
Agent 3 is done.
http://www.google.com/webhp?hl=en&tab=nw crawled by agent 4.
[supervisor] reached limit
Agent 4 is done.
http://news.google.com/n

I changed the main code to

printfn "[Main] before crawl"
crawl "http://news.google.com" 5
|> Async.RunSynchronously
printfn "[Main] after crawl"

However, the last printfn "[Main] after crawl" is never executed unless I add a Console.ReadLine() at the end.

[Edit 2]

The code runs fine under fsi. However, running it with fsi --use:Program.fs --exec --quiet shows the same problem.

I created a snippet that extends the previous one with the two features you asked about: http://fssnip.net/65.

  1. To solve this, I added a Start message that carries an AsyncReplyChannel<unit>. When the supervisor agent starts, it waits for this message and saves the reply channel for later use. When it completes, it sends a reply using this channel.

    The function that starts the agent returns an asynchronous workflow that waits for the reply. You can then call crawl using Async.RunSynchronously, which will complete when the supervisor agent completes.

  2. To avoid the race when printing, you need to synchronize all prints. The easiest way to do this is to write a new agent :-). The agent receives strings and prints them to the output one by one (so that they cannot be interleaved). The snippet hides the standard printfn function with a new implementation that sends strings to the agent.
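
The two points above can be sketched roughly as follows. This is not the code from http://fssnip.net/65, only a minimal illustration of the same ideas: the names PrintMsg, SupervisorMsg, WorkerDone and runWorkers are hypothetical, the workers merely pretend to crawl, and the Flush message is an extra assumption of mine (not described above) that lets the main thread wait until every queued line has been written before the process exits.

```fsharp
open System

// Hypothetical messages for the printing agent.
type PrintMsg =
    | Line of string
    | Flush of AsyncReplyChannel<unit>   // assumption: not part of the original snippet

// One agent owns the console, so lines are written one at a time.
let printer =
    MailboxProcessor<PrintMsg>.Start(fun inbox ->
        let rec loop () = async {
            let! msg = inbox.Receive()
            match msg with
            | Line text -> Console.WriteLine(text)
            | Flush reply -> reply.Reply()   // all lines queued earlier are already written
            return! loop () }
        loop ())

// Hides the standard printfn: format the string, then post it to the agent.
let printfn fmt = Printf.kprintf (fun text -> printer.Post(Line text)) fmt

// Hypothetical messages for the supervisor agent.
type SupervisorMsg =
    | Start of AsyncReplyChannel<unit>
    | WorkerDone of int

// Returns a workflow that completes only when the supervisor replies.
let runWorkers (urls: string list) : Async<unit> =
    let supervisor =
        MailboxProcessor<SupervisorMsg>.Start(fun inbox -> async {
            // Wait for Start and keep its reply channel for later use.
            let! first = inbox.Receive()
            match first with
            | Start reply ->
                // Simulated workers: each reports back to the supervisor when done.
                urls |> List.iteri (fun i url ->
                    async {
                        printfn "%s crawled by agent %d." url (i + 1)
                        inbox.Post(WorkerDone(i + 1)) }
                    |> Async.Start)
                let rec loop remaining = async {
                    if remaining = 0 then
                        printfn "Supervisor is done."
                        reply.Reply()   // unblocks the caller of runWorkers
                    else
                        let! msg = inbox.Receive()
                        match msg with
                        | WorkerDone id ->
                            printfn "Agent %d is done." id
                            return! loop (remaining - 1)
                        | Start _ -> return! loop remaining }
                return! loop urls.Length
            | WorkerDone _ -> () })
    supervisor.PostAndAsyncReply(fun reply -> Start reply)

printfn "[Main] before crawl"
runWorkers [ "http://example.com/a"; "http://example.com/b" ]
|> Async.RunSynchronously
printfn "[Main] after crawl"
// Block until the printing agent has written everything posted so far.
printer.PostAndReply(fun reply -> Flush reply)
```

Because printing is now asynchronous, Async.RunSynchronously can return while lines are still waiting in the printing agent's queue. The final PostAndReply call is one possible way to keep the process alive until they are written, which may be relevant to the truncated last line described in the question.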


 