
Selecting the shortest column per group in T-SQL

My table has an NVARCHAR(MAX) column where each entry (a path) is prefixed with an identifier. For example, with N identifiers:

id-1\path\to\stuff
id-1\different\path
...
id-2\path\to\stuff
id-2\different\path
...
id-N\path\to\stuff
id-N\different\path
...

I'm having trouble coming up with a query that selects one row per identifier, where each row is the one with the shortest path in its identifier group, and that also returns the other columns of that row.

So the result of the query would be N rows total, one row per identifier, with the rows being chosen on the basis of the shortest overall path length.

I feel like I'm missing something obvious, but I'm not sure what it is.

You need to count the backslashes. You can do that with len() and replace(). So:

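select t.*
from  (select t.*, row_number() over (partition by id order by len) as seqnum
       -- id is assumed to be a column holding the identifier
       from t cross apply
            -- backslash count = full length minus length with backslashes removed
            ( values (len(path) - len(replace(path, '\', ''))) ) v(len)
      ) t
where seqnum = 1;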

The cross apply is just a convenient way of naming an expression for the query. It calculates the number of backslashes (the "length of the path") by comparing the lengths of the path with and without them. The row_number() then orders the rows for each id based on len, with the shortest len getting a value of 1.
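The question describes the identifier as a prefix of the path rather than a separate column, so the following is a minimal sketch of how the same approach might look when the identifier has to be derived first. The table variable @t, the column other_col, the sample rows, the alias names (k, v, s, n_slashes) and the tie-breaker on total length are all assumptions made for illustration; none of them come from the original post.

```sql
-- Hypothetical sample data (table variable and other_col are assumptions).
declare @t table (path nvarchar(max), other_col int);

insert into @t (path, other_col) values
    (N'id-1\path\to\stuff',  1),
    (N'id-1\different\path', 2),
    (N'id-2\a\b\c\d',        3),
    (N'id-2\short',          4);

select s.path, s.other_col
from  (select t.path, t.other_col,
              row_number() over (partition by k.id
                                 -- fewest backslashes first; total length as an assumed tie-breaker
                                 order by v.n_slashes, len(t.path)) as seqnum
       from @t t cross apply
            -- identifier = text before the first backslash;
            -- appending '\' keeps charindex() from returning 0 when a path has none
            ( values (left(t.path, charindex(N'\', t.path + N'\') - 1)) ) k(id) cross apply
            -- number of backslashes = length with them minus length without them
            ( values (len(t.path) - len(replace(t.path, N'\', N''))) ) v(n_slashes)
      ) s
where s.seqnum = 1;
```

With these sample rows, the query should keep id-1\different\path (2 backslashes versus 3) and id-2\short (1 backslash versus 4). If "shortest" is meant as character count rather than path depth, ordering by len(t.path) alone would be the simpler choice.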

