
Efficient query for only the first N rows for each unique ID

This is a follow-up to this question.

TLDR:

The question:

I want to filter a query so that it keeps only the first n rows for each unique ID.

The answer:

query = query.GroupBy(q => q.ID).SelectMany(g => g.Take(n));
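To make the semantics of that answer concrete, here is a minimal in-memory sketch in Python (not the asker's C#) of what GroupBy(q => q.ID).SelectMany(g => g.Take(n)) computes. The function name first_n_per_id and the (id, value) tuple shape are hypothetical. Note that without an OrderBy inside each group, "first" simply means whatever order the rows arrive in, and a SQL database does not guarantee that order.

```python
from collections import defaultdict


def first_n_per_id(rows, n):
    """Mirror GroupBy(q => q.ID).SelectMany(g => g.Take(n)) in memory.

    rows: iterable of (id, value) tuples (hypothetical row shape).
    Groups keep first-seen order, as LINQ to Objects' GroupBy does.
    """
    groups = defaultdict(list)  # dict subclass, so insertion order is preserved
    for row in rows:
        groups[row[0]].append(row)
    result = []
    for group in groups.values():
        result.extend(group[:n])  # Take(n): at most n rows from each group
    return result


rows = [(1, "a"), (2, "b"), (1, "c"), (1, "d"), (2, "e")]
print(first_n_per_id(rows, 2))  # → [(1, 'a'), (1, 'c'), (2, 'b'), (2, 'e')]
```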

The problem with this answer is that, for 80,000+ rows, evaluating the query takes much longer than filtering by iteration (foreach): at least twice as long. Looking at the SQL generated by this answer, a CROSS APPLY is used, most likely for the SelectMany().

This link describes what CROSS APPLY does:

The APPLY operator allows you to join two table expressions; the right table expression is processed every time for each row from the left table expression.
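A rough mental model of CROSS APPLY, sketched in Python under the assumption of a simple nested loop: the right-hand expression (here a hypothetical top_n_rows_for helper that scans the whole table) runs once per left row. This is only a conceptual model; SQL Server's optimizer may execute an APPLY with a different physical plan. The table data is made up.

```python
# Toy table of (pk, id, ts) rows; the data is made up for illustration.
TABLE = [(1, "A", 10), (2, "A", 30), (3, "B", 20), (4, "A", 15), (5, "B", 5), (6, "B", 25)]

right_side_calls = 0  # counts how often the right-hand expression is evaluated


def top_n_rows_for(id_value, n=2):
    """Right-hand table expression: the n newest rows for one ID (hypothetical helper)."""
    global right_side_calls
    right_side_calls += 1
    matches = [row for row in TABLE if row[1] == id_value]  # full scan on every call
    matches.sort(key=lambda row: row[2], reverse=True)       # newest first
    return matches[:n]


def cross_apply(left_rows, right_fn):
    """Conceptual CROSS APPLY: evaluate right_fn once for each left row.

    Left rows for which right_fn yields nothing are dropped
    (OUTER APPLY would keep them, paired with NULLs).
    """
    for left in left_rows:
        for right in right_fn(left):
            yield left, right


distinct_ids = list(dict.fromkeys(row[1] for row in TABLE))  # first-seen order: A, B
pairs = list(cross_apply(distinct_ids, top_n_rows_for))
print(right_side_calls)                   # → 2, one evaluation per distinct ID
print([right[0] for _, right in pairs])   # → [2, 4, 6, 3]
```

In this toy model each evaluation scans the whole table, so the cost grows with the number of distinct IDs times the table size; a real database can use an index on the ID column to avoid the full scan.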

In short, I'm looking for a filtering query that efficiently gathers the top N rows for each unique ID.

A LINQ solution, with an explanation of the generated SQL, would be ideal.

I found my answer in SQL here (the SQL 2000 solution at the bottom) and managed to implement a Queryable/LINQ version:

query = tableQueryable.Where(a =>
          tableQueryable.Where(b => b.ID == a.ID)
            .OrderByDescending(o => o.Timestamp)
            .Take(N)
            .Select(s => s.PK)
          .Contains(a.PK)
        ).OrderByDescending(d => d.Timestamp);

A fairly standard "sub-query" pattern. It's much faster on a large table.
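The LINQ query above should correspond roughly to the following correlated sub-query. This sketch runs it against an in-memory SQLite database from Python so the result can be checked. The table t and the columns pk, id and ts are hypothetical stand-ins for the asker's PK, ID and Timestamp, and SQLite's LIMIT stands in for the TOP that SQL Server would use; the exact SQL that LINQ to SQL generates may differ.

```python
import sqlite3

N = 2  # rows to keep per ID

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (pk INTEGER PRIMARY KEY, id TEXT, ts INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "A", 10), (2, "A", 30), (3, "B", 20), (4, "A", 15), (5, "B", 5), (6, "B", 25)],
)

# Keep row a only if its pk is among the N newest pks for the same id
# (the inner query is re-evaluated per outer row because it references a.id).
SQL = """
SELECT a.pk, a.id, a.ts
FROM t AS a
WHERE a.pk IN (
    SELECT b.pk
    FROM t AS b
    WHERE b.id = a.id
    ORDER BY b.ts DESC
    LIMIT ?
)
ORDER BY a.ts DESC
"""
rows = conn.execute(SQL, (N,)).fetchall()
print(rows)  # → [(2, 'A', 30), (6, 'B', 25), (3, 'B', 20), (4, 'A', 15)]
```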

LINQ to SQL (L2S) has no row-number function, so Martin's trick cannot be used. I have run into this problem as well, and as far as I have found, this is the optimal L2S solution that does not use native SQL in any way.
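For comparison, here is a sketch of the row-number approach that L2S cannot express. The answer does not spell out what Martin's trick is; this assumes it is the usual ROW_NUMBER() OVER (PARTITION BY ...) pattern. It uses Python's sqlite3 with a hypothetical table t(pk, id, ts) as a stand-in for SQL Server, and needs SQLite 3.25 or later for window functions.

```python
import sqlite3

N = 2  # rows to keep per ID

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (pk INTEGER PRIMARY KEY, id TEXT, ts INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "A", 10), (2, "A", 30), (3, "B", 20), (4, "A", 15), (5, "B", 5), (6, "B", 25)],
)

# Number the rows within each id, newest first, then keep rows numbered 1..N.
SQL = """
SELECT pk, id, ts
FROM (
    SELECT pk, id, ts,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY ts DESC) AS rn
    FROM t
)
WHERE rn <= ?
ORDER BY ts DESC
"""
rows = conn.execute(SQL, (N,)).fetchall()
print(rows)  # → [(2, 'A', 30), (6, 'B', 25), (3, 'B', 20), (4, 'A', 15)]
```

Unlike the correlated sub-query, this numbers every row in a single pass over the partitioned data instead of re-running an inner query per row.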

You can try pulling all results down into the application and doing the row numbering there. This can either hurt or improve performance; which one depends on the concrete case.
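A minimal sketch of doing the row numbering in the application, in Python rather than C#: sort all fetched rows newest first, count rows per ID as they stream past, and keep a row only while its per-ID count is at most n. The (pk, id, ts) tuple shape and the function name top_n_in_app are hypothetical.

```python
from collections import Counter
from operator import itemgetter


def top_n_in_app(rows, n):
    """Keep the n newest rows per ID after fetching every row.

    rows: (pk, id, ts) tuples (hypothetical shape). The result is ordered
    newest first, matching the final OrderByDescending in the LINQ query.
    """
    ordered = sorted(rows, key=itemgetter(2), reverse=True)  # newest first (stable sort)
    row_number = Counter()  # running row number per ID
    kept = []
    for row in ordered:
        row_number[row[1]] += 1
        if row_number[row[1]] <= n:
            kept.append(row)
    return kept


fetched = [(1, "A", 10), (2, "A", 30), (3, "B", 20), (4, "A", 15), (5, "B", 5), (6, "B", 25)]
print(top_n_in_app(fetched, 2))  # → [(2, 'A', 30), (6, 'B', 25), (3, 'B', 20), (4, 'A', 15)]
```

The filtering itself is a single linear pass after the sort, but every row has to be transferred to the client first, which is the trade-off described above.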

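Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.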
