Linq query and Foreach on large number of records from SQL database
How to optimize Linq query with large number of records?
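Please help me optimize the code below. I have tried different approaches but have not seen a significant performance improvement. There are about 30k entries in the database, and loading them locally takes about 1 minute.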
var alarms = from healthIssue in _context.HealthIssues.AsNoTracking()
join asset in _context.Assets.AsNoTracking() on healthIssue.AssetNumber equals asset.SerialNumber into joinedTable
from data in joinedTable.DefaultIfEmpty()
select new
{
ID = healthIssue.ID,
AssetNumber = healthIssue.AssetNumber,
AlarmName = healthIssue.AlarmName,
Crew = data.Crew,
};
//alarmsViewModelList count is 30k
var alarmsViewModelList = await alarms.ToListAsync();
//groupedData count = 12k
var groupedData = alarmsViewModelList.Select(c => new { c.AssetNumber,c.AlarmName}).Distinct().ToList();
// filteralarms' count = 20k
var filteralarms = (alarmsViewModelList.Where(c => c.AlarmSeverityLevel != AlarmSeverityLevel.Unknown).ToList());
for (int j = 0; j < groupedData.Count; j++)
{
var alarm = groupedData[j];
//The line is actually slowing the code.
var alarmlist = filteralarms.AsEnumerable().Where(c => c.AlarmName == alarm.AlarmName && c.AssetNumber == alarm.AssetNumber
).Select
(c => new
{
HealthIssueID = c.ID,
AlarmLastUpdateDateTime = DateTimeHelpers.FromEpochSecondsUTC(c.AlarmLastUpdatedTime),
AlarmSeverityLevel = c.AlarmSeverityLevel,
}).OrderByDescending(c =>c.AlarmLastUpdateDateTime).ToList();
int alarmCount = alarmlist.Count;
if (alarmCount > 1)
{
businessLogicFunction(alarmlist);
}
}
This is what I could do with LINQ.
//alarmsViewModelList count is 30k
var alarmsViewModelList = await alarms.ToListAsync();
//groupedData is almost 12k
var groupedData = alarmsViewModelList.Select(c => new { c.AssetNumber,c.AlarmName}).Distinct().ToList();
// filteralarms' count is almost 20k
var filteralarms = alarmsViewModelList.Where(c => c.AlarmSeverityLevel != AlarmSeverityLevel.Unknown).OrderByDescending(c => DateTimeHelpers.FromEpochSecondsUTC(c.AlarmLastUpdatedTime)).ToList();
for (int j = 0; j < groupedData.Count; j++)
{
var alarm = groupedData[j];
//The line is actually slowing the code.
var alarmlist = filteralarms.Where(c => c.AlarmName == alarm.AlarmName && c.AssetNumber == alarm.AssetNumber);
if (alarmlist.Count() > 1)
{
businessLogicFunction(alarmlist.Select
(c => new
{
HealthIssueID = c.ID,
AlarmLastUpdateDateTime = DateTimeHelpers.FromEpochSecondsUTC(c.AlarmLastUpdatedTime),
AlarmSeverityLevel = c.AlarmSeverityLevel,
}).ToList());
}
filteralarms = filteralarms.Where(c => c.AlarmName != alarm.AlarmName || c.AssetNumber != alarm.AssetNumber).ToList();
}
I think the code above is O(2n). If you can, you could make it faster still by removing the ToList() in the businessLogicFunction call.
businessLogicFunction(alarmlist.Select
(c => new
{
HealthIssueID = c.ID,
AlarmLastUpdateDateTime = DateTimeHelpers.FromEpochSecondsUTC(c.AlarmLastUpdatedTime),
AlarmSeverityLevel = c.AlarmSeverityLevel,
}));
Edit: changed it to use an index instead of Skip. A faster way is to sort the list and skip the rest, like this:
//alarmsViewModelList count is 30k
var alarmsViewModelList = alarms.ToList();
// here the groupedData list look like this {(1,1),(2,1),(3,1),(4,1),(5,1),(6,1)}. because the list is orderd by assetNumber then by alarmName
var groupedData = alarmsViewModelList.Select(c => new { c.AssetNumber, c.AlarmName }).Distinct().OrderBy(c => c.AssetNumber ).ThenBy(c => c.AlarmName).ToList();
// here the filteralarms list look like this {(1,1), (1,1) (1,1), (2,1),(2,1),(3,1),(3,1),(3,1),(4,1)...}
var filteralarms = alarmsViewModelList.Where(c => c.AlarmSeverityLevel != AlarmSeverityLevel.Unknown).OrderBy(c => c.AssetNumber).ThenBy(c => c.AlarmName).ToList(); // a List, so filteralarms[k] below compiles
int k = 0;
for (int j = 0; j < groupedData.Count; j++)
{
var alarm = groupedData[j];
//The line is actually slowing the code.
var alarmlist = new List<Alarm>();
for(; k<filteralarms.Count();k++)
{
if (filteralarms[k].AlarmName == alarm.AlarmName && filteralarms[k].AssetNumber == alarm.AssetNumber)
{
alarmlist.Add(filteralarms[k]);
}
else
{
break;
}
}
if (alarmlist.Count() > 1)
{
businessLogicFunction(alarmlist.Select
(c => new
{
HealthIssueID = c.ID,
AlarmLastUpdateDateTime = c.AlarmLastUpdatedTime,
AlarmSeverityLevel = c.AlarmSeverityLevel,
}).OrderByDescending(c => c.AlarmLastUpdateDateTime).ToList());
}
}
I think the loop above is O(n), not counting the initial sort.
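The sort-then-scan idea can also be sketched in isolation. The following is a minimal illustration rather than the answerer's exact code: it uses value tuples and made-up sample rows in place of the real alarm type, sorts once by (AssetNumber, AlarmName), and then walks the list a single time, collecting each run of equal keys.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up sample rows; the tuple shape is an assumption for illustration.
var sorted = new[]
{
    (Id: 1, AssetNumber: "A2", AlarmName: "Temp"),
    (Id: 2, AssetNumber: "A1", AlarmName: "Temp"),
    (Id: 3, AssetNumber: "A1", AlarmName: "Fuel"),
    (Id: 4, AssetNumber: "A1", AlarmName: "Temp"),
    (Id: 5, AssetNumber: "A2", AlarmName: "Temp"),
}
.OrderBy(a => a.AssetNumber, StringComparer.Ordinal)
.ThenBy(a => a.AlarmName, StringComparer.Ordinal)
.ToList();

var runs = new List<List<int>>();
var start = 0;
while (start < sorted.Count)
{
    // Advance end to the first row whose key differs from the row at start.
    var end = start + 1;
    while (end < sorted.Count
           && sorted[end].AssetNumber == sorted[start].AssetNumber
           && sorted[end].AlarmName == sorted[start].AlarmName)
    {
        end++;
    }

    // Only runs with more than one row would go to the business function.
    if (end - start > 1)
        runs.Add(sorted.GetRange(start, end - start).Select(a => a.Id).ToList());

    start = end;
}

foreach (var run in runs)
    Console.WriteLine(string.Join(",", run));
```

Every row is visited once by the scan, so the sort is the dominant cost.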
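You are creating two lists derived from alarmsViewModelList:
groupedData, the distinct values of { alarm.AssetNumber, alarm.AlarmName }.
filteralarms, all alarms with AlarmSeverityLevel != AlarmSeverityLevel.Unknown.
Having created these two lists, you loop through the first one and cross-reference it against the values in the second by linear search. This is an n-squared operation. But since both lists were originally created from the same source data, alarmsViewModelList, you can use Enumerable.GroupBy() instead of Distinct() to keep the list of original objects for each grouping key. Doing so should completely eliminate the need for the n-squared cross-reference.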
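In addition, since you only want to pass alarms of known severity level to the business logic function, you can pre-filter them before grouping. This should improve performance linearly, depending on how many alarms are skipped.
So your code should look something like: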
var groupedData = alarmsViewModelList
.Where(c => c.AlarmSeverityLevel != AlarmSeverityLevel.Unknown)
.GroupBy(c => new { c.AssetNumber, c.AlarmName })
.Select(g => g.Select(c =>
new {
HealthIssueID = c.ID,
AlarmLastUpdateDateTime = DateTimeHelpers.FromEpochSecondsUTC(c.AlarmLastUpdatedTime),
AlarmSeverityLevel = c.AlarmSeverityLevel,
}).OrderByDescending(c =>c.AlarmLastUpdateDateTime).ToList())
.Where(l => l.Count > 1); // matches the question's "alarmCount > 1" check; a group is never empty
foreach (var alarmList in groupedData)
businessLogicFunction(alarmList);
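Notes:
When you call ToListAsync(), you are fetching all the data locally. You might want to try doing the filtering or grouping on the server side rather than the client side.
A demo fiddle using synthetic data generated with Enumerable.Range() accompanied this answer.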
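A self-contained approximation of such a demo might look like the sketch below. Everything in it is an assumption made for illustration: the tuple shape, the rule that severity 0 stands for Unknown, DateTimeOffset.FromUnixTimeSeconds standing in for DateTimeHelpers.FromEpochSecondsUTC, and the BusinessLogic placeholder that only counts what it receives.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Synthetic stand-in for alarmsViewModelList: 30,000 rows.
// Severity 0 plays the role of AlarmSeverityLevel.Unknown (assumption).
var alarms = Enumerable.Range(0, 30_000)
    .Select(i => (
        Id: i,
        AssetNumber: "A" + (i % 100),
        AlarmName: "N" + (i % 120),
        Severity: (i / 600) % 3,
        UpdatedEpoch: 1_700_000_000L + i))
    .ToList();

var calls = 0;
var rowsSeen = 0;

// Hypothetical placeholder for businessLogicFunction: it only counts its input.
void BusinessLogic(IReadOnlyList<(int HealthIssueID, DateTimeOffset AlarmLastUpdateDateTime, int AlarmSeverityLevel)> batch)
{
    calls++;
    rowsSeen += batch.Count;
}

var batches = alarms
    .Where(a => a.Severity != 0)                   // drop Unknown before grouping
    .GroupBy(a => (a.AssetNumber, a.AlarmName))    // one pass, hash-based
    .Select(g => g
        .Select(a => (
            HealthIssueID: a.Id,
            AlarmLastUpdateDateTime: DateTimeOffset.FromUnixTimeSeconds(a.UpdatedEpoch),
            AlarmSeverityLevel: a.Severity))
        .OrderByDescending(x => x.AlarmLastUpdateDateTime)
        .ToList())
    .Where(list => list.Count > 1);

foreach (var batch in batches)
    BusinessLogic(batch);

Console.WriteLine($"{calls} batches, {rowsSeen} rows");
```

Each row is touched once by the filter and once by the grouping, so no per-group rescan of the full list is needed.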
Try it with AsNoTracking, and if you have OrderBy or OrderByDescending on the records, apply it at the end.
var iq_filteralarms = alarmsViewModelList.Where(c => c.AlarmSeverityLevel != AlarmSeverityLevel.Unknown).AsNoTracking(); /* IQueryable */
foreach (var item in alarmsViewModelList.Select(c => new
{
c.AssetNumber,
c.AlarmName
}).Distinct())
{
var iq_alarmlist = iq_filteralarms.Where(c => c.AlarmName == item.AlarmName && c.AssetNumber == item.AssetNumber).Select(c=> new {
c.ID,
c.AlarmLastUpdatedTime,
c.AlarmSeverityLevel
});
if (iq_alarmlist.Count() > 1)
{
businessLogicFunction(iq_alarmlist.AsEnumerable().Select(c => new
{
HealthIssueID = c.ID,
AlarmLastUpdateDateTime = DateTimeHelpers.FromEpochSecondsUTC(c.AlarmLastUpdatedTime),
AlarmSeverityLevel = c.AlarmSeverityLevel,
}).OrderByDescending(c => c.AlarmLastUpdateDateTime));
}
}
A simple logical performance gain would be to remove the alarms you have already handled as you go:
...
var alarm = groupedData[j];
//The line is actually slowing the code.
var matchingAlarms = filteralarms.Where(c => c.AlarmName == alarm.AlarmName && c.AssetNumber == alarm.AssetNumber);
var alarmlist = filteralarms.Except(matchingAlarms
).Select
(c => new
{
...
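You should be able to do better by populating a dictionary whose key is the concatenated predicate values and whose value is the list of selected alarm properties. I typed this on my phone, so please forgive small typos.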
var alarms = from healthIssue in _context.HealthIssues.AsNoTracking()
join asset in _context.Assets.AsNoTracking() on healthIssue.AssetNumber equals asset.SerialNumber into joinedTable
from data in joinedTable.DefaultIfEmpty()
select new
{
ID = healthIssue.ID,
AssetNumber = healthIssue.AssetNumber,
AlarmName = healthIssue.AlarmName,
Crew = data.Crew,
};
//alarmsViewModelList count is 30k
var alarmsViewModelList = await alarms.ToListAsync();
//groupedData count = 12k
var groupedData = alarmsViewModelList
.Select(c => new { c.AssetNumber,c.AlarmName})
.Distinct()
.ToList();
// filteralarms' count = 20k
var filteralarms = alarmsViewModelList
.Where(c => c.AlarmSeverityLevel != AlarmSeverityLevel.Unknown).ToList();
//populate an in memory dictionary with a key that is the where clause predicate
var alarmDict = new Dictionary<string, List<Alarm>>();
foreach (var c in filteralarms) {
var key = c.AlarmName+"|"+c.AssetNumber;
if (!alarmDict.TryGetValue(key, out var list)) {
list = new List<Alarm>();
alarmDict[key] = list;
}
// Alarm is assumed to be a small class exposing these three properties
list.Add(new Alarm {
HealthIssueID = c.ID,
AlarmLastUpdateDateTime = DateTimeHelpers.FromEpochSecondsUTC(c.AlarmLastUpdatedTime),
AlarmSeverityLevel = c.AlarmSeverityLevel});
}
for (int j = 0; j < groupedData.Count; j++)
{
var alarm = groupedData[j];
//use dictionary for faster results. building the dictionary is now the more expensive operation
var key = alarm.AlarmName+"|"+alarm.AssetNumber;
if (alarmDict.TryGetValue(key, out var matches)) { // renamed: "alarms" is already declared above
var alarmlist = matches
.OrderByDescending(c => c.AlarmLastUpdateDateTime)
.ToList();
int alarmCount = alarmlist.Count;
if (alarmCount > 1)
{
businessLogicFunction(alarmlist);
}
}
}
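As a variation on the same idea, Enumerable.ToLookup with a tuple key builds the index in one call and avoids the concatenated string key. The sketch below uses made-up rows (the tuple shape and values are assumptions) chosen so that two different (AssetNumber, AlarmName) pairs produce the same "name|asset" string.

```csharp
using System;
using System.Linq;

// Made-up rows: rows 1 and 3 share a key, row 2 is a different pair
// that happens to concatenate to the same "name|asset" string.
var filtered = new[]
{
    (Id: 1, AssetNumber: "A|1", AlarmName: "Temp",   UpdatedEpoch: 100L),
    (Id: 2, AssetNumber: "1",   AlarmName: "Temp|A", UpdatedEpoch: 200L),
    (Id: 3, AssetNumber: "A|1", AlarmName: "Temp",   UpdatedEpoch: 300L),
};

// One pass builds the index; each later lookup is a hash probe.
var byKey = filtered.ToLookup(a => (a.AssetNumber, a.AlarmName));

// For comparison: how many distinct keys string concatenation would see.
var stringKeys = filtered
    .Select(a => a.AlarmName + "|" + a.AssetNumber)
    .Distinct()
    .Count();

// A missing key yields an empty sequence instead of throwing.
var batch = byKey[("A|1", "Temp")]
    .OrderByDescending(a => a.UpdatedEpoch)
    .Select(a => a.Id)
    .ToList();

Console.WriteLine($"tuple keys: {byKey.Count}, string keys: {stringKeys}");
Console.WriteLine(string.Join(",", batch));
```

Whether such a collision can occur in practice depends on whether the separator character can appear in the real asset numbers or alarm names.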
I would go with these three optimizations in your code.
// alarmsViewModelList is already in memory, so plain ToList() is used here
var groupedData = alarmsViewModelList.GroupBy(c => new { c.AssetNumber,c.AlarmName }).ToList();
var filteralarms = alarmsViewModelList.Where(c => c.AlarmSeverityLevel != AlarmSeverityLevel.Unknown).ToList();
foreach (var alarm in groupedData)
{
var alarmlist = filteralarms.Where(c => c.AlarmName == alarm.Key.AlarmName && c.AssetNumber == alarm.Key.AssetNumber)
.Select(c => new
{
HealthIssueID = c.ID,
AlarmLastUpdateDateTime = DateTimeHelpers.FromEpochSecondsUTC(c.AlarmLastUpdatedTime),
AlarmSeverityLevel = c.AlarmSeverityLevel,
}).OrderByDescending(c => c.AlarmLastUpdateDateTime).ToList();
}
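You are basically grouping the data by AlarmName + AssetNumber, filtering out alarms whose severity level is Unknown, and then running the business function on the grouped batches (after a minor adjustment). A more efficient way to do that looks like this: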
var grouped = alarmsViewModelList
// throw away unknown, you are not using them anywhere
.Where(c => c.AlarmSeverityLevel != AlarmSeverityLevel.Unknown)
// group by AssetNumber + AlarmName
.GroupBy(c => new { c.AssetNumber, c.AlarmName })
.Select(gr => new
{
gr.Key.AlarmName,
gr.Key.AssetNumber,
// convert batch of this group to the desired form
Items = gr.Select(c => new
{
HealthIssueID = c.ID,
AlarmLastUpdateDateTime = DateTimeHelpers.FromEpochSecondsUTC(c.AlarmLastUpdatedTime),
AlarmSeverityLevel = c.AlarmSeverityLevel,
}).OrderByDescending(c => c.AlarmLastUpdateDateTime).ToList()
});
foreach (var data in grouped) {
if (data.Items.Count > 1) {
businessLogicFunction(data.Items);
}
}
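I don't see any answer that optimizes the query that fetches alarmsViewModelList. You fetch 30K records from the database and then try to filter the alarms in application memory, so why not filter them on the database side?
It is worth mentioning that AsNoTracking() is useless here, because you project the result into an anonymous type, and EF only tracks entities.
I think the OP has not made clear which table AlarmSeverityLevel comes from, but I assume it comes from the first table and that there is also a relationship between the two tables. One more point: you select Crew but never use it afterwards. Why? Can we remove it?
So the statement could look like this: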
var alarms = Context.HealthIssues
.Where(x => x.AlarmSeverityLevel != AlarmSeverityLevel.Unknown)
.Select(x => new { HealthIssueID= x.Id, x.AssetNumber , x.AlarmName ,x.AlarmSeverityLevel ,x.AlarmLastUpdateDateTime })
.GroupBy(x => new { x.AssetNumber, x.AlarmName })
.Select(x => new
{
ItemsCount = x.Count(),
Items = x.Select(s => new
{
AlarmSeverityLevel = s.AlarmSeverityLevel,
AlarmLastUpdatedTime = s.AlarmLastUpdateDateTime,//for not generating translate error, can change later,
HealthIssueID = s.HealthIssueID
}).OrderByDescending(o => o.AlarmLastUpdatedTime)
}).ToList();
for (int i = 0; i < alarms.Count; i++)
{
var alarm = alarms[i];
//you can update AlarmLastUpdatedTime here for alarm.Items
if(alarm.ItemsCount > 1)
businessLogicFunction(alarm.Items);
}
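I could not run this query, so let me know if there is any problem.
Avoid unnecessary use of ToList() and AsEnumerable(), because these operations can be expensive. Try to use the IQueryable interface to filter and sort the data so that the database can handle those operations for you, for example: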
var alarms = from healthIssue in _context.HealthIssues.AsNoTracking()
join asset in _context.Assets.AsNoTracking() on healthIssue.AssetNumber equals asset.SerialNumber into joinedTable
from data in joinedTable.DefaultIfEmpty()
select new
{
ID = healthIssue.ID,
AssetNumber = healthIssue.AssetNumber,
AlarmName = healthIssue.AlarmName,
Crew = data.Crew,
};
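Instead of iterating over groupedData with a for loop, you can use foreach, which is usually more efficient because it avoids the overhead.
If possible, try to move the businessLogicFunction call outside the loop so that it is called only once for each distinct combination of alarm.AlarmName and alarm.AssetNumber.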
// filteralarms count = 20k
var filteralarms = alarms.Where(c => c.AlarmSeverityLevel != AlarmSeverityLevel.Unknown);
// groupedData count = 12k
var groupedData = filteralarms.Select(c => new { c.AssetNumber, c.AlarmName }).Distinct();
foreach (var alarm in groupedData)
{
// alarmlist count = ?
var alarmlist = filteralarms.Where(c => c.AlarmName == alarm.AlarmName && c.AssetNumber == alarm.AssetNumber
).Select
(c => new
{
HealthIssueID = c.ID,
AlarmLastUpdateDateTime = DateTimeHelpers.FromEpochSecondsUTC(c.AlarmLastUpdatedTime),
AlarmSeverityLevel = c.AlarmSeverityLevel,
}).OrderByDescending(c => c.AlarmLastUpdateDateTime);
if (alarmlist.Any())
{
businessLogicFunction(alarmlist);
}
}
I think the problem is that you run the Where clause and ToList 12k times in a loop, and that is what slows you down. What if you changed this block:
for (int j = 0; j < groupedData.Count; j++)
{
var alarm = groupedData[j];
//The line is actually slowing the code.
var alarmlist = filteralarms.AsEnumerable().Where(c => c.AlarmName == alarm.AlarmName && c.AssetNumber == alarm.AssetNumber
).Select
(c => new
{
HealthIssueID = c.ID,
AlarmLastUpdateDateTime = DateTimeHelpers.FromEpochSecondsUTC(c.AlarmLastUpdatedTime),
AlarmSeverityLevel = c.AlarmSeverityLevel,
}).OrderByDescending(c =>c.AlarmLastUpdateDateTime).ToList();
int alarmCount = alarmlist.Count;
if (alarmCount > 1)
{
businessLogicFunction(alarmlist);
}
}
into a join? That would improve your performance significantly. Since you can use query syntax, try a join, which looks more natural, like this:
var alarmlist = from g in groupedData
join f in filteralarms on g.AssetNumber equals f.AssetNumber
where g.AlarmName == f.AlarmName
select new
{
HealthIssueID = f.ID,
AlarmLastUpdateDateTime = DateTimeHelpers.FromEpochSecondsUTC(f.AlarmLastUpdatedTime),
AlarmSeverityLevel = f.AlarmSeverityLevel,
};
int alarmCount = alarmlist.Count();
if (alarmCount > 1)
{
businessLogicFunction(alarmlist.OrderByDescending(o => o.AlarmLastUpdateDateTime).ToList());
}
Hope that helps!
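One point to watch with a plain join is that it yields a single flat sequence for all groups, whereas the original loop handed each (AssetNumber, AlarmName) group to businessLogicFunction separately. A group join (join ... into) on a composite key keeps the per-group batches. The sketch below is an illustration with made-up rows and tuple shapes, not the answerer's code.

```csharp
using System;
using System.Linq;

// Made-up rows standing in for filteralarms; the tuple shape is an assumption.
var filteralarms = new[]
{
    (ID: 1, AssetNumber: "A1", AlarmName: "Temp", AlarmLastUpdatedTime: 100L),
    (ID: 2, AssetNumber: "A1", AlarmName: "Temp", AlarmLastUpdatedTime: 300L),
    (ID: 3, AssetNumber: "A1", AlarmName: "Fuel", AlarmLastUpdatedTime: 200L),
    (ID: 4, AssetNumber: "A2", AlarmName: "Temp", AlarmLastUpdatedTime: 400L),
    (ID: 5, AssetNumber: "A2", AlarmName: "Temp", AlarmLastUpdatedTime: 50L),
};

var groupedData = filteralarms
    .Select(f => (f.AssetNumber, f.AlarmName))
    .Distinct()
    .ToList();

// Group join on both key parts: "matches" holds the rows for one group only.
var batches =
    (from g in groupedData
     join f in filteralarms
         on new { g.AssetNumber, g.AlarmName } equals new { f.AssetNumber, f.AlarmName }
         into matches
     let ordered = matches.OrderByDescending(m => m.AlarmLastUpdatedTime).ToList()
     where ordered.Count > 1
     select new { g.AssetNumber, g.AlarmName, Ids = ordered.Select(m => m.ID).ToList() })
    .ToList();

foreach (var b in batches)
    Console.WriteLine($"{b.AssetNumber}/{b.AlarmName}: {string.Join(",", b.Ids)}");
```

LINQ to Objects implements the join with a hash lookup over the inner sequence, so the inner list is scanned once rather than once per group.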
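I would say you are asking the wrong question. Writing efficient business logic to process data is a good thing, and an efficient algorithm can often improve performance by several orders of magnitude. (Sorting a list is a great example.)
However, if you want this to go faster, you might consider optimizing your data at the point where you pull it from the database. Modern databases are highly tuned to filter and join data very quickly, by people who think about nothing else. If you have some decent indexes on your tables/blobs/graph/whatever, you can include clauses in the database query to filter out records that do not need to be processed.
Pulling 30k records and sending them over the wire takes a lot of effort (on a db time scale). I hope you can get them all in one query, because retrieving them in multiple pulls will take even longer.
I don't have data from your query run logs, or on the load time for transferring and deserializing the data. Still, I would be willing to bet that if you filter out the dead records in the query and, say, halve your payload, you will see a huge performance gain. Don't load records that don't need to be processed. Also, if you can filter right at the metal, your LINQ may turn into a list that can be processed in O(n) time. If you can tune the data it runs on, why tune your LINQ?
Good luck.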