ConcurrentQueue and Parallel.ForEach

I have a ConcurrentQueue holding a list of URLs whose page source I need to fetch. When I pass the ConcurrentQueue object to Parallel.ForEach as the input parameter, the Pop method doesn't return anything (it should return a string).

I'm using Parallel with MaxDegreeOfParallelism set to four. I really need to limit the number of concurrent threads. Is using a queue together with Parallel redundant?

Thanks in advance.

// On the main class
var items = await engine.FetchPageWithNumberItems(result);
// Enqueue List of items
itemQueue.EnqueueList(items);
var crawl = Task.Run(() => { engine.CrawlItems(itemQueue); });

// On the Engine class
public void CrawlItems(ItemQueue itemQueue)
{
    Parallel.ForEach(
        itemQueue,
        new ParallelOptions { MaxDegreeOfParallelism = 4 },
        item =>
        {
            var worker = new Worker();
            // Pop doesn't return anything
            worker.Url = itemQueue.Pop();
            /* Some work */
        });
}

// Item Queue
class ItemQueue : ConcurrentQueue<string>
{
    private ConcurrentQueue<string> queue = new ConcurrentQueue<string>();

    public string Pop()
    {
        string value = String.Empty;
        if (this.queue.Count == 0)
            throw new Exception();
        this.queue.TryDequeue(out value);
        return value;
    }

    public void Push(string item)
    {
        this.queue.Enqueue(item);
    }

    public void EnqueueList(List<string> list)
    {
        list.ForEach(this.queue.Enqueue);
    }
}

You don't need ConcurrentQueue<T> if all you're going to do is first add items to it from a single thread and then iterate over it in Parallel.ForEach(). A normal List<T> would be enough for that.
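As a rough illustration of that suggestion, here is a minimal sketch that fills a plain List<string> on one thread and then hands it to Parallel.ForEach with the same MaxDegreeOfParallelism as in the question. The class name ListCrawlSketch, the method ProcessAll, the example URLs, and the "work" of collecting each URL into a bag are all hypothetical placeholders, not part of the original code.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ListCrawlSketch
{
    // Hypothetical helper: processes every URL with at most four
    // concurrent workers and returns how many items were handled.
    public static int ProcessAll(List<string> urls)
    {
        // The results are written from several threads, so this
        // collection (unlike the input list) must be thread-safe.
        var processed = new ConcurrentBag<string>();

        Parallel.ForEach(
            urls,
            new ParallelOptions { MaxDegreeOfParallelism = 4 },
            url =>
            {
                // Placeholder for the real work (e.g. fetching the page).
                processed.Add(url);
            });

        return processed.Count;
    }

    public static void Main()
    {
        // The list is filled on a single thread before the loop starts.
        var urls = new List<string>
        {
            "http://example.com/a",
            "http://example.com/b",
            "http://example.com/c"
        };

        Console.WriteLine(ProcessAll(urls));
    }
}
```

The input list is only read inside the loop, which is why it does not need to be a concurrent collection in this scenario.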

Also, your implementation of ItemQueue is very suspicious:

  • It inherits from ConcurrentQueue<string> and also contains another ConcurrentQueue<string>. That doesn't make much sense, and it is confusing and inefficient.

  • The methods on ConcurrentQueue<T> were designed very carefully to be thread-safe. Your Pop() isn't thread-safe. What could happen is that you check Count, see that it's 1, then call TryDequeue() and not get any value (i.e. value will be null), because another thread removed the item from the queue in the time between the two calls.
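One way to address both points, sketched under assumptions: wrap a single ConcurrentQueue<string> instead of inheriting from it, and rely on the return value of TryDequeue() rather than checking Count first. The names SafeItemQueue and TryPop are hypothetical and do not appear in the original code.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical replacement for ItemQueue: composition only, no inheritance.
public class SafeItemQueue
{
    private readonly ConcurrentQueue<string> queue = new ConcurrentQueue<string>();

    public void Push(string item)
    {
        queue.Enqueue(item);
    }

    public void EnqueueList(IEnumerable<string> items)
    {
        foreach (var item in items)
        {
            queue.Enqueue(item);
        }
    }

    // A single TryDequeue call both checks for emptiness and removes the
    // item, so no other thread can slip in between a check and a removal.
    public bool TryPop(out string value)
    {
        return queue.TryDequeue(out value);
    }
}

public static class SafeItemQueueDemo
{
    public static void Main()
    {
        var items = new SafeItemQueue();
        items.EnqueueList(new[] { "http://example.com/a", "http://example.com/b" });

        // Drain the queue; the loop ends when TryPop reports it is empty.
        while (items.TryPop(out var url))
        {
            Console.WriteLine(url);
        }
    }
}
```

Callers decide what an empty queue means by looking at the boolean result, instead of receiving an exception or a null value.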

The issue is in the CrawlItems method: you shouldn't call Pop inside the action you pass to the ForEach method. The action is called once for each item that ForEach takes from the queue, so the item has already been retrieved by the time the action runs. That is why the action has an 'item' argument.

I assume you're getting null because all of the items have already been taken by the other threads running the ForEach method.

Therefore, your code should look like this:

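public void CrawlItems(ItemQueue itemQueue)
{
    Parallel.ForEach(
        itemQueue,
        new ParallelOptions { MaxDegreeOfParallelism = 4 },
        item =>
        {
            // Each invocation gets its own worker, as in the question's code
            var worker = new Worker();
            // Use the item that ForEach passes in instead of calling Pop
            worker.Url = item;
            /* Some work */
        });
}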
