How can I make this code faster? List<custom_class>().Find(lambda_expression)
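I need to match email sends against email bounces so I can work out whether each email was delivered. The catch is that a bounce only counts if it arrives within 4 days of the send, which rules out bounces matched to the wrong send. The send records are spread over a 30-day period.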
LinkedList<event_data> sent = GetMyHugeListOfSends(); //for example 1M+ records
List<event_data> bounced = GetMyListOfBounces(); //for example 150k records
bounced = bounced.OrderBy(o => o.event_date).ToList(); //this ensures the most accurate match of bounce to send (since we find the first match)
List<event_data> delivered = new List<event_data>();
event_data deliveredEmail = new event_data();
foreach (event_data sentEmail in sent)
{
    event_data bounce = bounced.Find(item => item.email.ToLower() == sentEmail.email.ToLower() && (item.event_date > sentEmail.event_date && item.event_date < sentEmail.event_date.AddDays(deliveredCalcDelayDays)));
    //create delivered records
    if (bounce != null)
    {
        //there was a bounce! don't add a delivered record!
    }
    else
    {
        //if sent is not bounced, it's delivered
        deliveredEmail.sid = siteid;
        deliveredEmail.mlid = mlid;
        deliveredEmail.mid = mid;
        deliveredEmail.email = sentEmail.email;
        deliveredEmail.event_date = sentEmail.event_date;
        deliveredEmail.event_status = "Delivered";
        deliveredEmail.event_type = "Delivered";
        deliveredEmail.id = sentEmail.id;
        deliveredEmail.number = sentEmail.number;
        deliveredEmail.laststoretransaction = sentEmail.laststoretransaction;
        delivered.Add(deliveredEmail); //add the new delivered
        deliveredEmail = new event_data();
        //remove bounce, it only applies to one send!
        bounced.Remove(bounce);
    }
    if (bounced.Count() == 0)
    {
        break; //no more bounces to match!
    }
}
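So I ran some tests, and it processes about 12 sent records per second. With over 1M records, it will take more than 25 hours to process!
Two questions:
Thanks!
EDIT
---Ideas---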
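I would say with some confidence that yes, your Find is what is taking the time.
It looks like you are sure that the Find method will only ever return 0 or 1 records (not a list). In that case, the way to speed this up is to build a lookup (a dictionary): instead of creating a List<event_data> for your bounces, create a Dictionary<key, event_data>, and then you can fetch the value by key instead of doing a find.
The trick is in creating the key (I don't know enough about your application to help with that), but it is essentially the same criteria as in your Find.
EDIT. (adding some pseudocode)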
void Main()
{
    var hugeListOfEmails = GetHugeListOfEmails();
    var allBouncedEmails = GetAllBouncedEmails();
    // Build the lookup once, before the loop
    IDictionary<string, EmailInfo> bouncedLookup = CreateLookupOfBouncedEmails(allBouncedEmails);
    foreach (var info in hugeListOfEmails)
    {
        if (bouncedLookup.ContainsKey(info.emailAddress))
        {
            // Email is bounced;
        }
        else
        {
            // Email is not bounced
        }
    }
}

public IEnumerable<EmailInfo> GetHugeListOfEmails()
{
    yield break;
}

public IEnumerable<EmailInfo> GetAllBouncedEmails()
{
    yield break;
}

public IDictionary<string, EmailInfo> CreateLookupOfBouncedEmails(IEnumerable<EmailInfo> emailList)
{
    var result = new Dictionary<string, EmailInfo>();
    foreach (var e in emailList)
    {
        if (!result.ContainsKey(e.emailAddress))
        {
            // Pseudocode placeholder: replace with the real date-condition check
            bool satisfiesDateConditions = true;
            if (satisfiesDateConditions)
            {
                result.Add(e.emailAddress, e);
            }
        }
    }
    return result;
}

public class EmailInfo
{
    public string emailAddress { get; set; }
    public DateTime DateSent { get; set; }
}
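You should improve this by using the ToLookup method to build a lookup table keyed on email address:
var bouncedLookup = bounced.ToLookup(k => k.email.ToLower());
and use it inside the loop to look up by email first:
var filteredBounced = bouncedLookup[sentEmail.email.ToLower()];
// mini optimisation here
var endDate = sentEmail.event_date.AddDays(deliveredCalcDelayDays);
// the lookup returns an IEnumerable<event_data>, so use FirstOrDefault rather than List<T>.Find
event_data bounce = filteredBounced.FirstOrDefault(item => item.event_date > sentEmail.event_date && item.event_date < endDate);
I couldn't compile this, but I think it should work. Please give it a try.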
Converting the bounces to a sorted list might be a good solution:
// note: ToDictionary throws if two bounces share the same email
SortedList<string, event_data> sl = new SortedList<string, event_data>(bounced.ToDictionary(s => s.email, s => s));
and to find a bounce use:
sl.Where(c => c.Key.Equals(sentEmail.email, StringComparison.OrdinalIgnoreCase) && ...).FirstOrDefault();
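You are finding an item in a list. That means it has to walk the whole list, so it is an O(n) operation. Can't you store the sent emails in a dictionary, keyed on the email address you are searching for? Each bounce then links back to the email in the dictionary. The lookup is constant time and you go through the bounces once, so overall it is O(n). Your current approach is O(n²).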
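To make the dictionary idea above concrete, here is a minimal sketch rather than the poster's actual code. `MailEvent`, `BounceMatcher` and `FindDelivered` are hypothetical names, `MailEvent` is a stand-in carrying only the two fields the matching needs, and linking each bounce to the earliest still-unmatched send inside the window is an assumption about how ties should be resolved.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's event_data (only the fields used here).
public class MailEvent
{
    public string Email { get; set; }
    public DateTime Date { get; set; }
}

public static class BounceMatcher
{
    // Returns the sends that no bounce could be linked to.
    public static List<MailEvent> FindDelivered(
        IReadOnlyList<MailEvent> sends, IEnumerable<MailEvent> bounces, int windowDays)
    {
        // Index the sends by address once; each later lookup is O(1) on average.
        var sendsByEmail = sends
            .GroupBy(s => s.Email, StringComparer.OrdinalIgnoreCase)
            .ToDictionary(g => g.Key,
                          g => g.OrderBy(s => s.Date).ToList(),
                          StringComparer.OrdinalIgnoreCase);

        // Reference equality is enough here: each send object is tracked as itself.
        var bouncedSends = new HashSet<MailEvent>();

        // Walk the bounces once, oldest first, and link each one back to a send.
        foreach (var bounce in bounces.OrderBy(b => b.Date))
        {
            if (!sendsByEmail.TryGetValue(bounce.Email, out var candidates))
                continue; // bounce for an address that was never sent to

            // Assumption: take the earliest unmatched send whose window contains the bounce.
            var match = candidates.FirstOrDefault(s =>
                !bouncedSends.Contains(s) &&
                bounce.Date > s.Date &&
                bounce.Date < s.Date.AddDays(windowDays));

            if (match != null)
                bouncedSends.Add(match);
        }

        return sends.Where(s => !bouncedSends.Contains(s)).ToList();
    }
}
```

The inner `FirstOrDefault` only scans the sends for one address, so the total work stays close to linear as long as each address has a handful of sends.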
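On reflection, the number of bounces is relatively small, so why not pre-optimise the bounce lookup as far as possible? This code creates a delegate for each possible bounce and groups the delegates into a dictionary keyed by email.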
private static bool DateInRange(
    DateTime sentDate,
    DateTime bouncedDate,
    int deliveredCalcDelayDays)
{
    // a bounce only counts if it arrives after the send...
    if (bouncedDate <= sentDate)
    {
        return false;
    }
    // ...and inside the allowed window
    return bouncedDate < sentDate.AddDays(deliveredCalcDelayDays);
}

static IEnumerable<event_data> GetDeliveredMails(
    IEnumerable<event_data> sent,
    IEnumerable<event_data> bounced,
    int siteId,
    int mlId,
    int mId,
    int deliveredCalcDelayDays)
{
    var grouped = bounced.GroupBy(
        b => b.email.ToLowerInvariant());
    // one date check per bounce, oldest bounce first, keyed by email
    var lookup = grouped.ToDictionary(
        g => g.Key,
        g => g.OrderBy(e => e.event_date).Select(
            e => new Func<DateTime, bool>(
                s => DateInRange(s, e.event_date, deliveredCalcDelayDays))).ToList());
    foreach (var s in sent)
    {
        var key = s.email.ToLowerInvariant();
        List<Func<DateTime, bool>> checks;
        if (lookup.TryGetValue(key, out checks))
        {
            var match = checks.FirstOrDefault(c => c(s.event_date));
            if (match != null)
            {
                // a bounce can only match one send
                checks.Remove(match);
                continue;
            }
        }
        yield return new event_data
        {
            sid = siteId,
            mlid = mlId,
            mid = mId,
            email = s.email,
            event_date = s.event_date,
            event_status = "Delivered",
            event_type = "Delivered",
            id = s.id,
            number = s.number,
            laststoretransaction = s.laststoretransaction
        };
    }
}
If that isn't fast enough, you could try precompiling the delegates in the query.
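I'd like to point out another problem with your code.
Memory consumption. I don't know your machine's configuration, but here are some thoughts about the code:
You start by allocating space for 1.2M+ objects of type event_data. I can't see the full definition of event_data, but assuming the emails are all unique, and seeing that the type has a lot of properties, I'd assume such a collection is fairly heavy (possibly hundreds of megabytes).
You then allocate another batch of event_data objects (close to 1M, if I counted right), which makes memory consumption heavier still.
With that much allocated, you could easily end up with a garbage collection after bounced.Remove(bounce); and that really would slow your application down considerably.
So even if you have plenty of memory and/or your application is 64-bit, I'd try to minimise memory consumption. I'm fairly sure it will make your code run faster. For example, you could process each deliveredEmail completely without storing it, or load the initial event_data in chunks, and so on.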
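As one way to picture the "load in chunks" suggestion, here is a small sketch of a batching helper. `Chunker` and `InChunks` are hypothetical names that do not appear in the thread, and how the chunks are actually loaded (database paging, file reads) is left open because the post does not say where the data comes from.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Chunker
{
    // Lazily yields the source in lists of at most `size` items,
    // so only one chunk has to be held in memory at a time.
    public static IEnumerable<List<T>> InChunks<T>(IEnumerable<T> source, int size)
    {
        var chunk = new List<T>(size);
        foreach (var item in source)
        {
            chunk.Add(item);
            if (chunk.Count == size)
            {
                yield return chunk;
                chunk = new List<T>(size); // start a fresh list; the old one can be collected
            }
        }
        if (chunk.Count > 0)
        {
            yield return chunk; // final, possibly shorter chunk
        }
    }
}
```

A caller would then loop with something like `foreach (var batch in Chunker.InChunks(sends, 10000))` and write out each batch's delivered records before moving on, rather than accumulating them all in one list.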
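OK, the final solution I arrived at was a dictionary of bounces.
The sent LinkedList is ordered by send date, so the loop walks it in chronological order. That matters because I have to match the right send to the right bounce.
I made a Dictionary<string, List<event_data>>, so the key is the email and the value is a list of all the event_data bounces for that email address. Since I want to make sure the first bounce is the one matched to the send, that list is sorted by event_date.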
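End result... it went from processing about 700 records per minute (the 12 per second measured earlier) to 500k records per second.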
Here is the final code:
LinkedList<event_data> sent = GetMyHugeListOfSends();
IEnumerable<event_data> sentOrdered = sent.OrderBy(send => send.event_date);
Dictionary<string, List<event_data>> bounced = GetMyListOfBouncesAsDictionary();
List<event_data> delivered = new List<event_data>();
event_data deliveredEmail = new event_data();
List<event_data> bounces = null;
bool matchedBounce = false;
foreach (event_data sentEmail in sentOrdered)
{
    matchedBounce = false;
    //create delivered records
    if (bounced.TryGetValue(sentEmail.email, out bounces))
    {
        //there was a bounce! find out if it was within 4 days after the send!
        foreach (event_data bounce in bounces)
        {
            if (bounce.event_date > sentEmail.event_date &&
                bounce.event_date <= sentEmail.event_date.AddDays(4))
            {
                matchedBounce = true;
                //remove the record because a bounce can only match once back to a send
                bounces.Remove(bounce);
                if (bounces.Count == 0) //no more bounces for this email
                {
                    bounced.Remove(sentEmail.email);
                }
                break;
            }
        }
        if (matchedBounce == false) //no matching bounces in the list!
        {
            //if sent is not bounced, it's delivered
            deliveredEmail.sid = siteid;
            deliveredEmail.mlid = mlid;
            deliveredEmail.mid = mid;
            deliveredEmail.email = sentEmail.email;
            deliveredEmail.event_date = sentEmail.event_date;
            deliveredEmail.event_status = "Delivered";
            deliveredEmail.event_type = "Delivered";
            deliveredEmail.id = sentEmail.id;
            deliveredEmail.number = sentEmail.number;
            deliveredEmail.laststoretransaction = sentEmail.laststoretransaction;
            delivered.Add(deliveredEmail); //add the new delivered
            deliveredEmail = new event_data();
        }
    }
    else
    {
        //if sent is not bounced, it's delivered
        deliveredEmail.sid = siteid;
        deliveredEmail.mlid = mlid;
        deliveredEmail.mid = mid;
        deliveredEmail.email = sentEmail.email;
        deliveredEmail.event_date = sentEmail.event_date;
        deliveredEmail.event_status = "Delivered";
        deliveredEmail.event_type = "Delivered";
        deliveredEmail.id = sentEmail.id;
        deliveredEmail.number = sentEmail.number;
        deliveredEmail.laststoretransaction = sentEmail.laststoretransaction;
        delivered.Add(deliveredEmail); //add the new delivered
        deliveredEmail = new event_data();
    }
    if (bounced.Count() == 0)
    {
        break; //no more bounces to match!
    }
}
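The final code calls GetMyListOfBouncesAsDictionary() without showing it. One way such a dictionary could be built from a flat list of bounces is sketched below, purely as an illustration: `BounceRecord`, `BounceIndex` and `Build` are hypothetical names, `BounceRecord` is a cut-down stand-in for event_data, and the case-insensitive key comparer is an assumption carried over from the ToLower() comparison in the original question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, trimmed-down stand-in for event_data: only the fields the grouping needs.
public class BounceRecord
{
    public string email { get; set; }
    public DateTime event_date { get; set; }
}

public static class BounceIndex
{
    // Groups bounces by address; each list is sorted oldest first so the
    // first bounce found for a send is also the earliest one.
    public static Dictionary<string, List<BounceRecord>> Build(IEnumerable<BounceRecord> bounces)
    {
        return bounces
            .GroupBy(b => b.email, StringComparer.OrdinalIgnoreCase)
            .ToDictionary(
                g => g.Key,
                g => g.OrderBy(b => b.event_date).ToList(),
                StringComparer.OrdinalIgnoreCase); // assumption: addresses compare case-insensitively
    }
}
```

Passing the comparer to both GroupBy and ToDictionary means later TryGetValue calls do not need to lower-case the address first.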
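Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.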