
How to Efficiently Use a Where Clause or Select in Parallel LINQ on a Large Dataset

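I have about 250,000 records marked as Boss, and each Boss has 2 to 10 staff members. Every day I need to get the details of the staff. There are roughly 1,000,000 staff in total. I am using LINQ to build the unique list of staff each day. Consider the following C# LINQ query and models: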

void Main()
{

    List<Boss> BossList = new List<Boss>()
    {
        new Boss()
        {
            EmpID = 101,
            Name = "Harry",
            Department = "Development",
            Gender = "Male",
            Employees = new List<Person>()
            {
                new Person() {EmpID = 102, Name = "Peter", Department = "Development",Gender = "Male"},
                new Person() {EmpID = 103, Name = "Emma Watson", Department = "Development",Gender = "Female"},

            }
        },
        new Boss()
        {
            EmpID = 104,
            Name = "Raj",
            Department = "Development",
            Gender = "Male",
            Employees = new List<Person>()
            {
                new Person() {EmpID = 105, Name = "Kaliya", Department = "Development",Gender = "Male"},
                new Person() {EmpID = 103, Name = "Emma Watson", Department = "Development",Gender = "Female"},

            }
        },

        // ..... ~ 250,000 records ......

    };

    List<Person> staffList = BossList
    .SelectMany(x =>
        new[] { new Person { Name = x.Name, Department = x.Department, Gender = x.Gender, EmpID = x.EmpID } }
        .Concat(x.Employees))
    .GroupBy(x => x.EmpID) //Group by employee ID
    .Select(g => g.First()) //And select a single instance for each unique employee
    .ToList();
}

public class Person
{
    public int EmpID { get; set; }
    public string Name { get; set; }
    public string Department { get; set; }
    public string Gender { get; set; }
}

public class Boss
{
    public int EmpID { get; set; }
    public string Name { get; set; }
    public string Department { get; set; }
    public string Gender { get; set; }
    public List<Person> Employees { get; set; }
}

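The LINQ query above gives me the list of distinct employees or staff, and that list contains more than 1,000,000 records. From the resulting list I need to search for "Raj":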

staffList.Where(m => m.Name.ToLowerInvariant().Contains("Raj".ToLowerInvariant()));

This operation takes 3 to 5 minutes to return a result.

How can I make it more efficient? Please help me.

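If you change Boss to inherit from Person (public class Boss : Person), you no longer need to duplicate the properties in both Person and Boss, and you also do not have to create a new Person instance for every Boss, because a Boss already is a Person: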

IEnumerable<Person> staff = BossList
    .Concat(BossList
        .SelectMany(x => x.Employees)
    )
    .DistinctBy(p => p.EmpID)
    .ToList();

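where DistinctBy is defined as follows (on .NET 6 and later, System.Linq already provides an equivalent Enumerable.DistinctBy):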

public static IEnumerable<TSource> DistinctBy<TSource, TKey>
    (this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
    var seenKeys = new HashSet<TKey>();
    foreach (TSource element in source)
    {
        if (seenKeys.Add(keySelector(element)))
        {
            yield return element;
        }
    }
}
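A minimal, self-contained sketch of the inheritance approach described above, assuming the models from the question. The extension method is named DistinctByKey here (a hypothetical name) only so it cannot clash with the built-in Enumerable.DistinctBy on newer runtimes; StaffExtensions, StaffDemo and UniqueStaff are likewise illustrative names, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public int EmpID { get; set; }
    public string Name { get; set; }
    public string Department { get; set; }
    public string Gender { get; set; }
}

// A Boss is a Person, so the shared properties are declared only once.
public class Boss : Person
{
    public List<Person> Employees { get; set; } = new List<Person>();
}

public static class StaffExtensions
{
    // Hypothetical name; same logic as the DistinctBy shown above.
    public static IEnumerable<TSource> DistinctByKey<TSource, TKey>(
        this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        var seenKeys = new HashSet<TKey>();
        foreach (TSource element in source)
        {
            // HashSet.Add returns false when the key was already seen.
            if (seenKeys.Add(keySelector(element)))
            {
                yield return element;
            }
        }
    }
}

public static class StaffDemo
{
    // Bosses first, then all of their employees, keeping the first
    // occurrence of each EmpID. No extra Person objects are allocated.
    public static List<Person> UniqueStaff(List<Boss> bosses)
    {
        return bosses
            .Concat<Person>(bosses.SelectMany(b => b.Employees))
            .DistinctByKey(p => p.EmpID)
            .ToList();
    }
}
```

Calling StaffDemo.UniqueStaff(BossList) would replace the SelectMany/GroupBy/First pipeline from the question. Because duplicates are filtered in a single pass with a HashSet, no intermediate groups are built.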

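Also, in your comparison you convert every Name to lowercase before comparing it. That creates a large number of strings you do not need. Instead, try something like: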

staffList.Where(m => m.Name.Equals("Raj", StringComparison.InvariantCultureIgnoreCase));

Also note that using Contains will match names such as Rajamussen, mirajii, etc., which may not be what you expect.
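The difference between the two kinds of match can be sketched as follows. This is an illustrative example rather than the answer's own code: the NameSearch class and its method names are hypothetical, and it uses StringComparison.OrdinalIgnoreCase as an assumption (the answer above uses InvariantCultureIgnoreCase; swap it in if culture-aware comparison is required). Neither method creates lower-cased copies of the names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NameSearch
{
    // Exact, case-insensitive match: "Raj" matches "raj" but not "Rajamussen".
    public static List<string> ExactMatches(IEnumerable<string> names, string term)
    {
        return names
            .Where(n => string.Equals(n, term, StringComparison.OrdinalIgnoreCase))
            .ToList();
    }

    // Substring, case-insensitive match (the behaviour of the original
    // ToLowerInvariant().Contains(...) call) without allocating new strings.
    public static List<string> SubstringMatches(IEnumerable<string> names, string term)
    {
        return names
            .Where(n => n != null && n.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0)
            .ToList();
    }
}
```

With the names "Raj", "Rajamussen", "mirajii" and "Harry", ExactMatches(names, "raj") keeps only "Raj", while SubstringMatches(names, "raj") keeps the first three.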

Can you change staffList to a dictionary? A better search algorithm, such as the ones behind Dictionary and SortedList, would give you the biggest improvement.

I have tested the code below, and it runs in just a few seconds.

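using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Program
{
    private static void Main()
    {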

        List<Boss> BossList = new List<Boss>();
        var b1 = new Boss()
        {
            EmpID = 101,
            Name = "Harry",
            Department = "Development",
            Gender = "Male",
            Employees = new List<Person>()
            {
                new Person() {EmpID = 102, Name = "Peter", Department = "Development", Gender = "Male"},
                new Person() {EmpID = 103, Name = "Emma Watson", Department = "Development", Gender = "Female"},

            }
        };

        var b2 = new Boss()
        {
            EmpID = 104,
            Name = "Raj",
            Department = "Development",
            Gender = "Male",
            Employees = new List<Person>()
            {
                new Person() {EmpID = 105, Name = "Kaliya", Department = "Development", Gender = "Male"},
                new Person() {EmpID = 103, Name = "Emma Watson", Department = "Development", Gender = "Female"},

            }
        };

        Random r = new Random();
        var genders = new [] {"Male", "Female"};

        for (int i = 0; i < 1500000; i++)
        {
            // Random.Next's upper bound is exclusive, so Next(0, 2) yields 0 or 1.
            b1.Employees.Add(new Person { Name = "Name" + i, Department = "Department" + i, Gender = genders[r.Next(0, 2)], EmpID = 200 + i });
            b2.Employees.Add(new Person { Name = "Nam" + i, Department = "Department" + i, Gender = genders[r.Next(0, 2)], EmpID = 1000201 + i });
        }

        BossList.Add(b1);
        BossList.Add(b2);

        Stopwatch sw = new Stopwatch();
        sw.Start();

        var emps = BossList
            .SelectMany(x =>
                new[] {new Person {Name = x.Name, Department = x.Department, Gender = x.Gender, EmpID = x.EmpID}}
                    .Concat(x.Employees))
            .GroupBy(x => x.EmpID) //Group by employee ID
            .Select(g => g.First());

        var staffList =  emps.ToList();
        var staffDict = emps.ToDictionary(p => p.Name.ToLowerInvariant() + p.EmpID);
        var staffSortedList = new SortedList<string, Person>(staffDict);

        Console.WriteLine("Time to load staffList = " + sw.ElapsedMilliseconds + "ms");

        var rajKeyText = "Raj".ToLowerInvariant();

        // 1) Parallel scan of the plain list, lower-casing each name on the fly.
        sw.Restart();
        var rajs1 = staffList.AsParallel().Where(p => p.Name.ToLowerInvariant().Contains(rajKeyText)).ToList();
        Console.WriteLine("Time to find Raj (List) = " + sw.ElapsedMilliseconds + "ms");

        // 2) Parallel scan of the dictionary keys, which are already lower-cased.
        sw.Restart();
        var rajs2 = staffDict.AsParallel().Where(kvp => kvp.Key.Contains(rajKeyText)).ToList();
        Console.WriteLine("Time to find Raj (Dictionary) = " + sw.ElapsedMilliseconds + "ms");

        // 3) Parallel scan of the sorted list keys.
        sw.Restart();
        var rajs3 = staffSortedList.AsParallel().Where(kvp => kvp.Key.Contains(rajKeyText)).ToList();
        Console.WriteLine("Time to find Raj (SortedList) = " + sw.ElapsedMilliseconds + "ms");

        Console.ReadLine();
    }

    public class Person
    {
        public int EmpID { get; set; }
        public string Name { get; set; }
        public string Department { get; set; }
        public string Gender { get; set; }
    }

    public class Boss
    {
        public int EmpID { get; set; }
        public string Name { get; set; }
        public string Department { get; set; }
        public string Gender { get; set; }
        public List<Person> Employees { get; set; }
    }
}

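Output 1:

[screenshot of the console output; image not included]

Output 2 (using .AsParallel() on the search):

[screenshot of the console output; image not included]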

In other words, if you cannot use a faster data structure, you can still speed up your search by changing it from

staffList.Where(m => m.Name.ToLowerInvariant().Contains("Raj".ToLowerInvariant()));

to

staffList.AsParallel().Where(m => m.Name.ToLowerInvariant().Contains("Raj".ToLowerInvariant()));
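As a sketch of the "faster data structure" idea, assuming that an exact (case-insensitive) name match is acceptable: a lookup keyed by name answers each query with a hash probe instead of a scan. A substring search such as Contains still has to visit every key, so this only helps for whole-name matches. The StaffIndex class and BuildNameIndex method are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public int EmpID { get; set; }
    public string Name { get; set; }
    public string Department { get; set; }
    public string Gender { get; set; }
}

public static class StaffIndex
{
    // Build once per day after the unique staff list is computed.
    // Several people may share a name, so each key maps to a group.
    public static ILookup<string, Person> BuildNameIndex(IEnumerable<Person> staff)
    {
        return staff.ToLookup(p => p.Name, StringComparer.OrdinalIgnoreCase);
    }
}
```

Usage would look like var index = StaffIndex.BuildNameIndex(staffList); followed by var rajs = index["raj"].ToList();. The ILookup indexer returns an empty sequence when the name is absent, so no key check is needed first.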

