Equivalent to Oracle NTH_VALUE function
Suppose we have the following data:
CREATE TABLE [dbo].[LogTable] ([DateSent] [datetime] NULL)
GO
CREATE CLUSTERED INDEX [IX_LogTable_DateSent] ON [dbo].[LogTable] ([DateSent] DESC)
GO
INSERT INTO [LogTable] ([DateSent])
SELECT TOP 500000 DATEADD(day, ABS(CHECKSUM(NEWID())) % 65530, 0)
FROM sys.sysobjects
CROSS JOIN sys.all_columns
I want to find the second-lowest DateSent value in each year. Oracle provides the NTH_VALUE function for this, but there is no such thing in SQL Server. I wrote the following query:
SELECT YEAR(datesent),
(
SELECT datesent
FROM
(
SELECT datesent, ROW_NUMBER() OVER (ORDER BY datesent) r
FROM logtable
WHERE YEAR(datesent) = YEAR(lt.datesent)
) logtable_ranked
WHERE logtable_ranked.r = 2
) second_lowest_in_year,
(
SELECT datesent
FROM
(
SELECT datesent, ROW_NUMBER() OVER (ORDER BY datesent) r
FROM logtable
WHERE YEAR(datesent) = YEAR(lt.datesent)
) logtable_ranked
WHERE logtable_ranked.r = 3
) third_lowest_in_year
FROM logtable lt
GROUP BY YEAR(datesent)
It returns the correct results, but it takes more than 7 seconds of CPU time on my server. Worse, its cost grows linearly with the number of Nth values I need. Is there a better (faster, and perhaps more elegant) way to compute NTH_VALUE in SQL Server?
Use row_number() and conditional aggregation:
SELECT YEAR(datesent),
MAX(CASE WHEN seqnum = 1 THEN datesent END) AS datesent_1,
MAX(CASE WHEN seqnum = 2 THEN datesent END) AS datesent_2,
MAX(CASE WHEN seqnum = 3 THEN datesent END) AS datesent_3
FROM (SELECT datesent,
ROW_NUMBER() OVER (PARTITION BY YEAR(datesent) ORDER BY datesent) AS seqnum
FROM LogTable lt
) lt
GROUP BY YEAR(datesent)
ORDER BY YEAR(datesent);
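The same technique can be verified on a small sample outside SQL Server. The sketch below (an assumption for illustration, not part of the original answer) runs the ROW_NUMBER() + conditional-aggregation pattern against an in-memory SQLite database (SQLite 3.25+ supports window functions), using `strftime('%Y', ...)` in place of T-SQL's `YEAR()`:

```python
# Sketch: nth-lowest value per year via ROW_NUMBER() + conditional aggregation.
# Uses a tiny hypothetical sample instead of the 500,000-row LogTable above.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE LogTable (DateSent TEXT)")
conn.executemany(
    "INSERT INTO LogTable VALUES (?)",
    [("2020-01-05",), ("2020-03-01",), ("2020-02-10",),
     ("2021-06-15",), ("2021-01-20",), ("2021-04-02",)],
)

rows = conn.execute("""
    SELECT strftime('%Y', DateSent) AS yr,
           -- MAX ignores NULLs, so each CASE picks out exactly one rank
           MAX(CASE WHEN seqnum = 2 THEN DateSent END) AS second_lowest,
           MAX(CASE WHEN seqnum = 3 THEN DateSent END) AS third_lowest
    FROM (SELECT DateSent,
                 ROW_NUMBER() OVER (PARTITION BY strftime('%Y', DateSent)
                                    ORDER BY DateSent) AS seqnum
          FROM LogTable)
    GROUP BY yr
    ORDER BY yr
""").fetchall()
print(rows)
# [('2020', '2020-02-10', '2020-03-01'), ('2021', '2021-04-02', '2021-06-15')]
```

Note that each row is numbered only once per partition, so adding more Nth columns adds only another `MAX(CASE ...)` expression, not another scan — which is why this scales better than the correlated-subquery version in the question.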