I am building a crawler and I am using aBot to do it. It is a very nice system :) During the development I found an issue that is more related to how I want to build my crawler than the aBot project itself, but I hope you can help me.
When setting up the crawler, I specify the method to be called when a page crawl completes. There are sync and async options:
crawler.PageCrawlCompleted += crawler_ProcessPageCrawlCompleted;
crawler.PageCrawlCompletedAsync += crawler_ProcessPageCrawlCompleted;
I would like to use the async one because then I could crawl another URL while processing the previous one. This works fine until I crawl the last URL. When the last page is crawled, the async completion event is raised and the crawler has no work left, so it finishes and the program exits before crawler_ProcessPageCrawlCompleted has finished running. As a result, I cannot guarantee that the last URL will be processed.
Is there any way I can wait for this last event to finish before closing the application? Is this a design flaw?
Edit: I forgot to mention: I do have access to the crawler code. My current workaround is: if the link is the last one to be processed, create a WaitHandle and wait for it to complete. Sounds a bit messy, though...
ManualResetEvent can be one solution:
In your calling class and method:
// Declare the reset event as a field so the event handler can reach it
private readonly ManualResetEvent mre = new ManualResetEvent(false);
// Subscribe to the event and call the async method
crawler.PageCrawlCompletedAsync += crawler_ProcessPageCrawlCompleted;
// The application will wait here until mre is set.
mre.WaitOne();
In your event handler:
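private void crawler_ProcessPageCrawlCompleted(...)
{
    ....
    // Signal the waiting thread. Do this only for the last page,
    // otherwise WaitOne() returns after the first page is processed.
    mre.Set();
}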
Another approach is CountdownEvent. Suppose you need to crawl 10 pages:
// Declare the countdown where the event handler can reach it (for example, as a field)
CountdownEvent countdown = new CountdownEvent(10);
// Subscribe to the event
crawler.PageCrawlCompletedAsync += crawler_ProcessPageCrawlCompleted;
// Call the async method 10 times
....
// Wait for all events to complete
countdown.Wait();
In the handler:
private void crawler_ProcessPageCrawlCompleted(...)
{
    ....
    // Decrement the countdown; Wait() returns once it reaches zero
    countdown.Signal();
}
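To make the CountdownEvent idea concrete, here is a minimal, self-contained sketch. It does not use Abot at all: FakeCrawler is a hypothetical stand-in that raises its event on thread-pool threads without waiting for the handlers, which is my assumption about how the async event behaves. The URLs and the Processed counter are also made up for illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the crawler: it raises its event on thread-pool
// threads and does not wait for the handlers to finish.
public class FakeCrawler
{
    public event Action<string> PageCrawlCompletedAsync;

    public void Crawl(string[] urls)
    {
        foreach (string url in urls)
        {
            Action<string> handler = PageCrawlCompletedAsync;
            if (handler != null)
                Task.Run(() => handler(url)); // fire and forget
        }
    }
}

public class Program
{
    public static int Processed = 0;

    public static void Main()
    {
        string[] urls = { "http://example.com/a", "http://example.com/b", "http://example.com/c" };
        var crawler = new FakeCrawler();

        // One count per page that is expected to be processed
        using (var countdown = new CountdownEvent(urls.Length))
        {
            crawler.PageCrawlCompletedAsync += url =>
            {
                try
                {
                    Thread.Sleep(100); // simulate slow page processing
                    Interlocked.Increment(ref Processed);
                }
                finally
                {
                    countdown.Signal(); // always signal, even if processing throws
                }
            };

            crawler.Crawl(urls); // returns before the handlers have finished
            countdown.Wait();    // block until every handler has signaled
        }

        Console.WriteLine("Processed " + Processed + " of " + urls.Length + " pages");
    }
}
```

Calling Signal() in a finally block means a handler that throws still counts down, so Wait() cannot hang on a failed page. If the number of pages is not known up front, CountdownEvent also offers AddCount() to raise the count as new pages are scheduled.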