UPDATE: Thanks to @usr, I have got this down to ~3 seconds simply by changing
.Select(
    log => log.OrderByDescending(
        d => d.DateTimeUtc
    ).FirstOrDefault()
)
to
.Select(
    log => log.OrderByDescending(
        d => d.Id
    ).FirstOrDefault()
)
I have a database with two tables, Logs and Collectors, which I am reading with Entity Framework. There are 86 collector records, and each one has 50,000+ corresponding Log records.
I want to get the most recent log record for each collector, which is easily done with this SQL:
SELECT CollectorLogModels_1.Status, CollectorLogModels_1.NumericValue,
CollectorLogModels_1.StringValue, CollectorLogModels_1.DateTimeUTC,
CollectorSettingsModels.Target, CollectorSettingsModels.TypeName
FROM
(SELECT CollectorId, MAX(Id) AS Id
FROM CollectorLogModels GROUP BY CollectorId) AS RecentLogs
INNER JOIN CollectorLogModels AS CollectorLogModels_1
ON RecentLogs.Id = CollectorLogModels_1.Id
INNER JOIN CollectorSettingsModels
ON CollectorLogModels_1.CollectorId = CollectorSettingsModels.Id
This takes ~2 seconds to execute.
The closest I have been able to get with LINQ is the following:
var logs = context.Logs.Include(co => co.Collector)
    .GroupBy(
        log => log.CollectorId, log => log
    )
    .Select(
        log => log.OrderByDescending(
            d => d.DateTimeUtc
        ).FirstOrDefault()
    )
    .Join(
        context.Collectors,
        (l => l.CollectorId),
        (c => c.Id),
        (l, c) => new
        {
            c.Target,
            DateTimeUTC = l.DateTimeUtc,
            l.Status,
            l.StringValue,
            CollectorName = c.TypeName
        }
    ).OrderBy(
        o => o.Target
    ).ThenBy(
        o => o.CollectorName
    )
;
This produces the results I want but takes ~35 seconds to execute.
It is translated into the following SQL:
SELECT
[Distinct1].[CollectorId] AS [CollectorId],
[Extent3].[Target] AS [Target],
[Limit1].[DateTimeUtc] AS [DateTimeUtc],
[Limit1].[Status] AS [Status],
[Limit1].[StringValue] AS [StringValue],
[Extent3].[TypeName] AS [TypeName]
FROM (SELECT DISTINCT
[Extent1].[CollectorId] AS [CollectorId]
FROM [dbo].[CollectorLogModels] AS [Extent1] ) AS [Distinct1]
OUTER APPLY (SELECT TOP (1) [Project2].[Status] AS [Status], [Project2].[StringValue] AS [StringValue], [Project2].[DateTimeUtc] AS [DateTimeUtc], [Project2].[CollectorId] AS [CollectorId]
FROM ( SELECT
[Extent2].[Status] AS [Status],
[Extent2].[StringValue] AS [StringValue],
[Extent2].[DateTimeUtc] AS [DateTimeUtc],
[Extent2].[CollectorId] AS [CollectorId]
FROM [dbo].[CollectorLogModels] AS [Extent2]
WHERE [Distinct1].[CollectorId] = [Extent2].[CollectorId]
) AS [Project2]
ORDER BY [Project2].[DateTimeUtc] DESC ) AS [Limit1]
INNER JOIN [dbo].[CollectorSettingsModels] AS [Extent3] ON [Limit1].[CollectorId] = [Extent3].[Id]
ORDER BY [Extent3].[Target] ASC, [Extent3].[TypeName] ASC
How can I get performance closer to what is achievable with SQL alone?
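In your original SQL, the row is picked by MAX(Id), so the DateTimeUTC you select can come from a different row than the one with the latest DateTimeUTC. That's probably a bug. The EF query does not have that problem, because it orders by DateTimeUtc. The two queries are not semantically identical: the EF one is a harder query.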
If you rewrite the EF query to be structurally the same as the SQL query you'll get identical performance. I see nothing here that EF would not support.
Compute the max(id) with EF as well and join on that.
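As a rough sketch of what "compute the max(id) and join on that" could look like in LINQ, the query below mirrors the shape of the hand-written SQL: first one MAX(Id) per CollectorId, then a join back to the logs, then a join to the collectors. The classes Log, Collector and LatestLogRow and the helper LatestLogQuery.Build are hypothetical names, with property types guessed from the column names in the question. The method takes IQueryable inputs so that context.Logs and context.Collectors could be passed in; the SQL that EF generates for it has not been checked here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity shapes, inferred from the column names in the question.
public class Log
{
    public int Id { get; set; }
    public int CollectorId { get; set; }
    public DateTime DateTimeUtc { get; set; }
    public string Status { get; set; }      // actual type is not shown in the question
    public string StringValue { get; set; }
}

public class Collector
{
    public int Id { get; set; }
    public string Target { get; set; }
    public string TypeName { get; set; }
}

// Hypothetical result type, used instead of an anonymous type so the query can be returned.
public class LatestLogRow
{
    public string Target { get; set; }
    public string CollectorName { get; set; }
    public DateTime DateTimeUtc { get; set; }
    public string Status { get; set; }
    public string StringValue { get; set; }
}

public static class LatestLogQuery
{
    public static IQueryable<LatestLogRow> Build(
        IQueryable<Log> logs, IQueryable<Collector> collectors)
    {
        // Step 1: one row per collector holding only the highest log Id
        // (the RecentLogs subquery in the SQL version).
        var recentIds = logs
            .GroupBy(l => l.CollectorId)
            .Select(g => g.Max(l => l.Id));

        // Step 2: join those Ids back to the logs, then to the collectors.
        return recentIds
            .Join(logs, id => id, l => l.Id, (id, l) => l)
            .Join(collectors, l => l.CollectorId, c => c.Id,
                (l, c) => new LatestLogRow
                {
                    Target = c.Target,
                    CollectorName = c.TypeName,
                    DateTimeUtc = l.DateTimeUtc,
                    Status = l.Status,
                    StringValue = l.StringValue
                })
            .OrderBy(r => r.Target)
            .ThenBy(r => r.CollectorName);
    }
}
```

With EF this would be called as LatestLogQuery.Build(context.Logs, context.Collectors).ToList(). Like the SQL version, it treats the highest Id as the most recent log, which only matches "latest DateTimeUtc" if Ids are assigned in time order.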
I had the exact same issue; I solved it by adding indexes.
A query of mine took 45 seconds to complete, and I managed to get it to complete in less than a second.