
Using AsParallel()/Parallel.ForEach() guidelines?

Looking for a little advice on leveraging AsParallel() or Parallel.ForEach() to speed this up.

See the method I've got (simplified/bastardized for this example) below.

It takes a list like "US, FR, APAC", where "APAC" is an alias for maybe 50 other countries ("US, FR, JP, IT, GB", etc.). The method should take "US, FR, APAC" and convert it to a list of "US", "FR", plus all the countries that are in "APAC".

private IEnumerable<string> Countries (string[] countriesAndAliases)
{
    var countries = new List<string>();

    foreach (var countryOrAlias in countriesAndAliases)
    {
        if (IsCountryNotAlias(countryOrAlias))
        {
            countries.Add(countryOrAlias);
        }
        else 
        {
            foreach (var aliasCountry in AliasCountryLists[countryOrAlias]) 
            {
                countries.Add(aliasCountry);
            }
        }
    }

    return countries.Distinct();
}

Is making this parallelized as simple as changing it to what's below? Is there more nuance to using AsParallel() than this? Should I be using Parallel.ForEach() instead of foreach? What rules of thumb should I use when parallelizing foreach loops?

private IEnumerable<string> Countries (string[] countriesAndAliases)
{
    var countries = new List<string>();

    foreach (var countryOrAlias in countriesAndAliases.AsParallel())
    {
        if (IsCountryNotAlias(countryOrAlias))
        {
            countries.Add(countryOrAlias);
        }
        else 
        {
            foreach (var aliasCountry in AliasCountryLists[countryOrAlias].AsParallel()) 
            {
                countries.Add(aliasCountry);
            }
        }
    }

    return countries.Distinct();
}

Several points.

Writing just countriesAndAliases.AsParallel() is useless. AsParallel() makes the part of the LINQ query that comes after it execute in parallel. Here that part is empty, so it has no effect at all.

Generally you should replace foreach with Parallel.ForEach(). But beware of code that is not thread safe! You have some. You can't just wrap the loop body in Parallel.ForEach(), because List<T>.Add is not thread safe itself.

So you should do something like this (sorry, I didn't test it, but it compiles):

        return countriesAndAliases
            .AsParallel()
            .SelectMany(s => 
                IsCountryNotAlias(s)
                    ? Enumerable.Repeat(s,1)
                    : AliasCountryLists[s]
                ).Distinct();

Edit:

You must be sure about two more things:

  1. IsCountryNotAlias must be thread safe. It would be even better if it is a pure function.
  2. No one modifies AliasCountryLists in the meantime, because dictionaries are not thread safe. Or use ConcurrentDictionary to be sure.
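The two conditions above could be met roughly as in the following minimal sketch. It is not the asker's actual code: the class name CountryExpander, the sample contents of the alias table, and the rule that an alias is simply any key of AliasCountryLists are assumptions made for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

public static class CountryExpander
{
    // Assumed sample data; the real alias table is not shown in the question.
    // ConcurrentDictionary tolerates concurrent readers and writers.
    private static readonly ConcurrentDictionary<string, string[]> AliasCountryLists =
        new ConcurrentDictionary<string, string[]>(
            new Dictionary<string, string[]>
            {
                { "APAC", new[] { "JP", "AU", "FR" } }
            });

    // Assumed rule: anything that is not a key of the alias table is a country.
    // The method has no side effects; it only reads its argument and the table.
    private static bool IsCountryNotAlias(string s)
    {
        return !AliasCountryLists.ContainsKey(s);
    }

    public static IEnumerable<string> Countries(string[] countriesAndAliases)
    {
        // Everything after AsParallel() is executed by PLINQ.
        return countriesAndAliases
            .AsParallel()
            .SelectMany(s => IsCountryNotAlias(s)
                ? Enumerable.Repeat(s, 1)
                : AliasCountryLists[s])
            .Distinct();
    }
}
```

Calling CountryExpander.Countries(new[] { "US", "FR", "APAC" }) yields each country once, but PLINQ does not guarantee the order of the results unless AsOrdered() is added to the query.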

Useful links that will help you:

Patterns for Parallel Programming: Understanding and Applying Parallel Patterns with the .NET Framework 4

Parallel Programming in .NET 4 Coding Guidelines

When Should I Use Parallel.ForEach? When Should I Use PLINQ?

PS: As you see, the new parallel features are not as obvious as they look (and feel).

When using AsParallel(), you need to make sure that the loop body is thread safe. Unfortunately, the above code will not work. List<T> is not thread safe, so adding to it from code that runs in parallel causes a race condition.

If, however, you switch your collections to using a collection in System.Collections.Concurrent, such as ConcurrentBag<T>, the above code will most likely work.
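A minimal sketch of that idea follows. Note that it swaps the foreach over AsParallel() for Parallel.ForEach(), so that the loop body really does run on several threads. The class name ConcurrentBagExample and the sample alias data are assumptions for illustration, and aliases are detected here simply by looking them up in the table rather than by calling the asker's IsCountryNotAlias.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ConcurrentBagExample
{
    // Assumed sample data standing in for the asker's alias table.
    // A plain Dictionary is safe here only because nothing writes to it
    // while the parallel loop is reading from it.
    private static readonly Dictionary<string, string[]> AliasCountryLists =
        new Dictionary<string, string[]>
        {
            { "APAC", new[] { "JP", "AU", "FR" } }
        };

    public static IEnumerable<string> Countries(string[] countriesAndAliases)
    {
        // ConcurrentBag<T>.Add may be called from several threads at once.
        var countries = new ConcurrentBag<string>();

        Parallel.ForEach(countriesAndAliases, countryOrAlias =>
        {
            string[] aliasCountries;
            if (AliasCountryLists.TryGetValue(countryOrAlias, out aliasCountries))
            {
                foreach (var aliasCountry in aliasCountries)
                {
                    countries.Add(aliasCountry);
                }
            }
            else
            {
                countries.Add(countryOrAlias);
            }
        });

        // Deduplicate once the parallel loop has finished.
        return countries.Distinct().ToList();
    }
}
```

The bag does not preserve insertion order, so callers should not rely on the order of the returned countries.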

I would prefer to use another data structure, like a set for each alias, and then use set union to merge them.

Something like this:

// Needs: using System.Collections.Generic; using System.Linq;
public string[] ExpandAliases(string[] countries)
{
    // Alias definitions
    var apac = new HashSet<string> { "US", "FR", /* ... */ };
    // ...

    var aliases = new Dictionary<string, HashSet<string>> { { "APAC", apac }, /* ... */ };

    var expanded = new HashSet<string>();
    foreach (var country in countries)
    {
        if (aliases.ContainsKey(country))
            expanded.UnionWith(aliases[country]); // UnionWith merges in place; Union would not modify the set
        else
            expanded.Add(country);
    }

    return expanded.ToArray();
}

Note: code should be viewed as pseudo-code.

This seems like an inherently serial operation to me. All you're doing is looping through a list of strings and inserting them into another list. The parallelization libraries are going to do that, plus a bunch of threading and synchronization - it'd probably end up slower.

Also, you should be using a HashSet<string> if you don't want duplicates.

