C# custom group by really slow for large data set
When I run the following code (where campaigns.Count() is 200,000), it runs really slowly.
List<Campaign> listCampaigns = new List<Campaign>();
foreach (var item in campaigns)
{
    if (listCampaigns.Where(a => a.CampaignName == item.CampaignName && a.Term == item.Term).Count() == 0)
    {
        //this doesn't exist
        listCampaigns.Add(item);
    }
    else
    {
        //this exists already
        var campaign = listCampaigns.Where(a => a.CampaignName == item.CampaignName && a.Term == item.Term).First();
        campaign.TotalVisits += item.TotalVisits;
        List<Conversion> listConversions = item.Conversions.ToList();
        listConversions.AddRange(campaign.Conversions.ToList());
        campaign.Conversions = listConversions.ToArray();
    }
}
Is there a part of this code I should optimize, or a different approach that would speed it up?
Any suggestions are appreciated. Thanks.
This should be much faster:
List<Campaign> listCampaigns = new List<Campaign>();
foreach (var g in campaigns.GroupBy(c => new { c.CampaignName, c.Term }))
{
    var campaign = g.First();
    campaign.TotalVisits = g.Sum(x => x.TotalVisits);
    campaign.Conversions = g.SelectMany(c => c.Conversions).ToArray();
    listCampaigns.Add(campaign);
}
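One thing to be aware of with the GroupBy loop above: g.First() returns an object from the input, so setting TotalVisits and Conversions on it changes the original campaigns collection. If that is a problem, a variant that builds a new Campaign per group might look like the sketch below. The Campaign and Conversion classes here are hypothetical minimal shapes (the real ones are not shown in the question), Term is assumed to be an int, and CampaignMerger.Merge is a name made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal shapes; the real classes are not shown in the question.
public class Conversion
{
    public string Id { get; set; }
}

public class Campaign
{
    public string CampaignName { get; set; }
    public int Term { get; set; }          // assumption: Term is an int
    public int TotalVisits { get; set; }
    public Conversion[] Conversions { get; set; } = new Conversion[0];
}

public static class CampaignMerger
{
    // Groups by (CampaignName, Term) and creates a fresh Campaign per group,
    // so the objects in the input sequence are left unmodified.
    public static List<Campaign> Merge(IEnumerable<Campaign> campaigns)
    {
        return campaigns
            .GroupBy(c => new { c.CampaignName, c.Term })
            .Select(g => new Campaign
            {
                CampaignName = g.Key.CampaignName,
                Term = g.Key.Term,
                TotalVisits = g.Sum(x => x.TotalVisits),
                Conversions = g.SelectMany(x => x.Conversions).ToArray()
            })
            .ToList();
    }
}
```

The anonymous type used as the grouping key compares by value, so two campaigns with the same name and term land in the same group. The trade-off is one extra allocation per group, which is small compared with the repeated list scans in the original loop.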
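Use a Dictionary<Tuple<string,Term>,Campaign>.
You can put CampaignName and Term into a tuple and use it to look up the existing Campaign in O(1). That makes the whole code O(n).
The current code is O(n^2), because it has to walk the whole list to check whether the current entry already exists.
The code should look similar to this: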
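var dict = new Dictionary<Tuple<string, Term>, Campaign>();
foreach (var item in campaigns)
{
    // Composite key built from the two fields that identify a campaign
    var currentKey = new Tuple<string, Term>(item.CampaignName, item.Term);
    Campaign existingCampaign;
    if (dict.TryGetValue(currentKey, out existingCampaign))
    {
        //already exists: merge item into existingCampaign
    }
    else
    {
        //new: add item to dict under currentKey
    }
}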
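Can you avoid pulling the Conversions of 200,000 campaign items into a concrete list before adding them to the main list?
I would make the following changes; here is the new code: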
List<Campaign> listCampaigns = new List<Campaign>();
foreach (var item in campaigns)
{
    if (!listCampaigns.Any(a => a.CampaignName == item.CampaignName && a.Term == item.Term))
    {
        //this doesn't exist
        listCampaigns.Add(item);
    }
    else
    {
        //this exists already
        var campaign = listCampaigns.First(a => a.CampaignName == item.CampaignName && a.Term == item.Term);
        campaign.TotalVisits += item.TotalVisits;
        //Reduces the number of collection copies created per iteration from 3 to 1
        campaign.Conversions = campaign.Conversions.Concat(item.Conversions).ToArray();
    }
}
With this code:
foreach (var item in campaigns)
{
    var campaign = listCampaigns.FirstOrDefault(a => a.CampaignName == item.CampaignName && a.Term == item.Term);
    if (campaign == null)
    {
        //this doesn't exist
        listCampaigns.Add(item);
    }
    else
    {
        //this exists already
        campaign.TotalVisits += item.TotalVisits;
        List<Conversion> listConversions = item.Conversions.ToList();
        listConversions.AddRange(campaign.Conversions.ToList());
        campaign.Conversions = listConversions.ToArray();
    }
}
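By using FirstOrDefault, you avoid going through the list multiple times. Also, you will most likely not evaluate the whole list every time, which saves more time.
At the very least, use Any() instead of Count(). That way the full list does not have to be examined: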
if (listCampaigns.Where(a => a.CampaignName == item.CampaignName
&& a.Term == item.Term).Any())
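Also, as others have pointed out, a Dictionary would be a much faster option because of its fast lookups. For that you have to define a unique key for each Campaign, and then you can use a Dictionary<TKey,Campaign>. That way a hash table is used to check whether a value exists and to find the corresponding value in O(1).
Code example: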
var dictCampaigns = new Dictionary<Key, Campaign>();
foreach (var item in campaigns)
{
    Campaign found;
    var key = new Key(item);
    if (!dictCampaigns.TryGetValue(key, out found))
    {
        dictCampaigns.Add(key, item);
    }
    else
    {
        found.TotalVisits += item.TotalVisits;
        found.Conversions = (item.Conversions.Concat(found.Conversions)).ToArray();
    }
}
I used a Key struct on the assumption that you may not be able to use Tuple:
struct Key
{
    public readonly string Name;
    public readonly int Term;
    public Key(Campaign camp)
    {
        Name = camp.CampaignName;
        Term = camp.Term;
    }
}
I did a rough measurement with Stopwatch, and it was about twice as fast as your code, but I think it can still be optimized.
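One possible optimization, offered as a sketch rather than a measured result: a struct that does not implement IEquatable<T> or override Equals/GetHashCode is compared by the dictionary through the default ValueType implementations, which box the key and may fall back to reflection. Giving the key explicit equality avoids that. CampaignKey below is a hypothetical replacement for the Key struct above; it takes the name and term directly so that the example is self-contained, and it assumes Term is an int as in Key.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type with explicit value equality, for use as a Dictionary key.
public struct CampaignKey : IEquatable<CampaignKey>
{
    public readonly string Name;
    public readonly int Term;

    public CampaignKey(string name, int term)
    {
        Name = name;
        Term = term;
    }

    // Strongly typed comparison; Dictionary picks this up via EqualityComparer<T>.Default.
    public bool Equals(CampaignKey other)
    {
        return Term == other.Term
            && string.Equals(Name, other.Name, StringComparison.Ordinal);
    }

    public override bool Equals(object obj)
    {
        return obj is CampaignKey && Equals((CampaignKey)obj);
    }

    // Combines both fields so keys that differ only in Term usually hash differently.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = Name != null ? Name.GetHashCode() : 0;
            return (hash * 397) ^ Term;
        }
    }
}
```

In the loop above this would be used as new Dictionary<CampaignKey, Campaign>() with new CampaignKey(item.CampaignName, item.Term) as the key. Whether it makes a noticeable difference for 200,000 items is something to confirm with Stopwatch.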