Concurrent threads calling same method leads to duplicates in ConcurrentBag<DataRow> collection

I have to process a large data set, which takes over an hour on a single thread. I have implemented some multithreading to speed this up. Each thread handles a specific range of data with no overlap, but when the threads insert their results into the ConcurrentBag<DataRow> collection I created, there are some duplicates.

How is this possible? Any suggestions on what I could be doing better are appreciated!

Main method:

public static ConcurrentBag<DataRow> finalRowList = new ConcurrentBag<DataRow>(); //Create a concurrent collection of datarows so we can thread these calculations
public static DataTable results = new DataTable(); //Final datatable the datarows are added to

static void Main(string[] args)
{
    //The goal is to calculate correlation between each item in list 1 against each item in list 2
    List<string> Variable1List = populateVariable1List(); //Primary List of distinct items to iterate over
    List<string> Variable2List = populateVariable2List(); //Secondary list of distinct items

    DateTime endDate = new DateTime(2020, 3, 31);

    //Separate threads based on alphabetic ranges so there is no overlap
    Thread t1 = new Thread(() => CalculatePairCorrelation(Variable1List.Where(s => string.Compare(s, "G") < 0), Variable2List, endDate));
    Thread t2 = new Thread(() => CalculatePairCorrelation(Variable1List.Where(s => string.Compare(s, "G") >= 0 && string.Compare(s, "M") < 0), Variable2List, endDate));
    Thread t3 = new Thread(() => CalculatePairCorrelation(Variable1List.Where(s => string.Compare(s, "M") >= 0 && string.Compare(s, "S") < 0), Variable2List, endDate));
    Thread t4 = new Thread(() => CalculatePairCorrelation(Variable1List.Where(s => string.Compare(s, "S") >= 0), Variable2List, endDate));

    List<Thread> threads = new List<Thread>();
    threads.Add(t1);
    threads.Add(t2);
    threads.Add(t3);
    threads.Add(t4);

    foreach (Thread t in threads)
    {
        t.Start();
    }

    foreach (Thread t in threads)
    {
        t.Join();
    }

    //Add rows from finalRowList to final datatable
    foreach (var dr in finalRowList)
    {
        results.Rows.Add(dr);
    }
}

CalculatePairCorrelation() code:

public static void CalculatePairCorrelation(IEnumerable<string> list1, IEnumerable<string> list2, DateTime endDate) //Signature matches the three-argument calls in Main
{
    foreach (var item1 in list1)
    {
        foreach (var item2 in list2)
        {                
            double r10 = CalculateCorrelation(item1, item2, endDate, 10);
            double r30 = CalculateCorrelation(item1, item2, endDate, 30);

            var dr = results.NewRow();
            dr["Item1"] = item1;
            dr["Item2"] = item2;
            dr["R10"] = r10;
            dr["R30"] = r30;

            finalRowList.Add(dr); //Add to thread-safe collection
        }
    }
}


The problem could be related to this line, which is called in the parallel path:

var dr = results.NewRow();

Creating a DataRow probably mutates the underlying DataTable, which is not a thread-safe class.
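To illustrate the point in isolation, here is a minimal sketch that keeps the manual threads but has them add only plain value tuples to the ConcurrentBag, so that the DataTable is touched by a single thread after all the Join() calls have returned. The CalculateCorrelation body, the sample lists, and the column definitions are placeholders I made up for the example; they are not from the question.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using System.Threading;

public static class Program
{
    // Worker threads only add plain values here; no DataTable access happens off the main thread.
    static readonly ConcurrentBag<(string Item1, string Item2, double R10, double R30)> computed =
        new ConcurrentBag<(string Item1, string Item2, double R10, double R30)>();

    // Placeholder (assumption): stands in for the question's CalculateCorrelation, whose body is not shown.
    static double CalculateCorrelation(string item1, string item2, DateTime endDate, int days)
        => (item1.Length + item2.Length) / (double)days;

    static void CalculatePairCorrelation(IEnumerable<string> list1, IEnumerable<string> list2, DateTime endDate)
    {
        foreach (var item1 in list1)
        {
            foreach (var item2 in list2)
            {
                computed.Add((item1, item2,
                    CalculateCorrelation(item1, item2, endDate, 10),
                    CalculateCorrelation(item1, item2, endDate, 30)));
            }
        }
    }

    public static void Main()
    {
        // Column layout is assumed from the names used in the question.
        var results = new DataTable();
        results.Columns.Add("Item1", typeof(string));
        results.Columns.Add("Item2", typeof(string));
        results.Columns.Add("R10", typeof(double));
        results.Columns.Add("R30", typeof(double));

        // Made-up sample data.
        var list1 = new List<string> { "Apple", "Hat", "Nut", "Tree" };
        var list2 = new List<string> { "X", "Y" };
        var endDate = new DateTime(2020, 3, 31);

        var threads = new List<Thread>
        {
            new Thread(() => CalculatePairCorrelation(list1.Where(s => string.Compare(s, "M") < 0), list2, endDate)),
            new Thread(() => CalculatePairCorrelation(list1.Where(s => string.Compare(s, "M") >= 0), list2, endDate)),
        };
        threads.ForEach(t => t.Start());
        threads.ForEach(t => t.Join());

        // Single-threaded from here on, so writing to the DataTable is safe.
        foreach (var r in computed)
        {
            results.Rows.Add(r.Item1, r.Item2, r.R10, r.R30);
        }

        Console.WriteLine(results.Rows.Count); // 4 x 2 pairs = 8 rows
    }
}
```

The only structural change from the question's code is where the DataRow objects are created: the threads do the expensive calculations, and the rows are built afterwards on one thread.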

My suggestion is to stay away from concurrent collections and manual partitioning of the data, and to use PLINQ instead, which is easy to use and makes it harder for something to go wrong:

var resultsList = Variable1List
    .SelectMany(_ => Variable2List, (Item1, Item2) => (Item1, Item2))
    .AsParallel()
    .AsOrdered() // Optional
    .WithDegreeOfParallelism(4) // Optional
    .Select(pair => (
        Item1: pair.Item1,
        Item2: pair.Item2,
        R10: CalculateCorrelation(pair.Item1, pair.Item2, endDate, 10),
        R30: CalculateCorrelation(pair.Item1, pair.Item2, endDate, 30)
    ))
    .ToList();

foreach (var result in resultsList)
{
    var dr = results.NewRow();
    dr["Item1"] = result.Item1;
    dr["Item2"] = result.Item2;
    dr["R10"] = result.R10;
    dr["R30"] = result.R30;
    results.Rows.Add(dr);
}
