How to get the most recently inserted record for each value efficiently

I have a table, TableA, with data like the following:

PostID   PostComments   PostTransDate                    UserID
-----------------------------------------------------------------
10000    VRDFHFGFTR     2013-10-26 21:08:19.817          43434
10000    GFDGDFSDFF     2013-10-26 21:12:32.323          67576
10000    HGFHGFBNBF     2013-10-26 21:43:43.545          3232
10000    JNFNGHFGHG     2013-10-26 21:45:46.656          768
10000    MJHJNGJHGH     2013-10-26 21:56:32.767          9897
10001    XCVGFDGDFG     2013-10-26 22:54:54.868          3424
10001    YTUGFGHHGF     2013-10-26 13:32:54.132          12313
10001    HGFHFGHGHF     2013-10-26 18:08:32.878          6565

For each PostID, I want the UserID and PostComments of the row with the maximum PostTransDate.

Required output:

PostID   PostComments   PostTransDate                    UserID
-----------------------------------------------------------------
10000    MJHJNGJHGH     2013-10-26 21:56:32.767          9897
10001    XCVGFDGDFG     2013-10-26 22:54:54.868          3424

I already have two queries that return this.

Query 1:

SELECT  TT.PostID,TT.PostComments,TT.UserID, TT.PostTransDate
FROM tableA TT WITH(NOLOCK) 
INNER JOIN
(
    SELECT PostID,MAX(PostTransDate)  PostTransDate
    FROM tableA T WITH(NOLOCK)
    GROUP BY PostID 
) T ON T.PostID = TT.PostID AND T.PostTransDate = TT.PostTransDate 

Query 2:

SELECT *
FROM
(
SELECT PostID,UserID,PostTransDate,T.PostComments,
        ROW_NUMBER() OVER(PARTITION BY PostID ORDER BY PostTransDate DESC) RNO
FROM tableA T

) N WHERE RNO = 1

I can't run these queries in production because they are very heavy. If anyone has a simpler query than these, please post it.

Without knowing your underlying index structure, or whether you can even change it, I would suggest this index:

CREATE INDEX x ON dbo.TableA(PostID, PostTransDate DESC) 
  INCLUDE (UserID, PostComments);

This will still require a scan to satisfy the existing query, but it will at least scan this index, which will be more efficient than scanning the entire table (assuming the table has other columns that this query doesn't reference).

;WITH x AS 
(
  SELECT PostID, UserID, PostTransDate, PostComments,
    rn = ROW_NUMBER() OVER (PARTITION BY PostID ORDER BY PostTransDate DESC)
  FROM dbo.TableA
)
SELECT PostID, UserID, PostTransDate, PostComments
  FROM x WHERE rn = 1;
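
One way to check whether the index actually reduces the work is to compare the I/O statistics for the query before and after creating it. The following is only a sketch, and it assumes you have a test copy of the table to run it against rather than production:

```sql
-- Sketch only: run on a test copy, once before and once after creating the index.
SET STATISTICS IO ON;    -- reports logical/physical reads per table
SET STATISTICS TIME ON;  -- reports CPU and elapsed time

;WITH x AS
(
  SELECT PostID, UserID, PostTransDate, PostComments,
    rn = ROW_NUMBER() OVER (PARTITION BY PostID ORDER BY PostTransDate DESC)
  FROM dbo.TableA
)
SELECT PostID, UserID, PostTransDate, PostComments
  FROM x WHERE rn = 1;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The "logical reads" figure reported for TableA in the messages output is the number to compare between the two runs.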

If you want to include ties (multiple comments on the same post by different users at the same time), just change ROW_NUMBER() to DENSE_RANK(). Actually, since you are only ever after the latest date, RANK() works just as well here; I'm not sure they perform any differently, but it will save you 6 characters. And if you don't want to include ties, you can break them predictably by adding another column to the ORDER BY inside OVER(). For example, if you wanted the user with the longest tenure, you could order by UserID after the descending post date.
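
The two variations described above might look like the following sketch. The only change from the query above is the ranking expression; treating a lower UserID as "longer tenure" is an assumption about how UserID values are assigned.

```sql
-- Variant 1: keep ties. Every row sharing the latest PostTransDate
-- within a PostID gets rank 1, so all of them are returned.
;WITH x AS
(
  SELECT PostID, UserID, PostTransDate, PostComments,
    rk = RANK() OVER (PARTITION BY PostID ORDER BY PostTransDate DESC)
  FROM dbo.TableA
)
SELECT PostID, UserID, PostTransDate, PostComments
  FROM x WHERE rk = 1;

-- Variant 2: break ties predictably. Among rows with the same latest
-- PostTransDate, the lowest UserID wins (assumed to mean longest tenure).
;WITH x AS
(
  SELECT PostID, UserID, PostTransDate, PostComments,
    rn = ROW_NUMBER() OVER (PARTITION BY PostID
                            ORDER BY PostTransDate DESC, UserID)
  FROM dbo.TableA
)
SELECT PostID, UserID, PostTransDate, PostComments
  FROM x WHERE rn = 1;
```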

Another idea, if you can't change the indexing or it doesn't provide enough of a boost, is to materialize the results in another table. You can handle this pretty easily with a trigger, but it will affect your DML workload, so it's certainly not something you should do just to fix this one query. It might actually make your application's overall performance worse. Of course, unless you materialize all the data for this query in that table (which would be quite redundant), it might not work so well: to retrieve the remaining columns you'll still have to join to the main table, and you'll still likely need a scan on the larger table to do so. If the main table has an IDENTITY column or some other primary key, that might make things both easier and more efficient, but I'm not going to start coding up solutions until I fully understand the underlying structure.
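
Purely as an illustration of the trigger idea, and not a recommendation, here is a minimal sketch. The table name dbo.TableA_Latest, the trigger name, and all column types are hypothetical; it copies all four columns (the redundant option mentioned above) and handles inserts only, so updates and deletes on dbo.TableA would need separate handling.

```sql
-- Hypothetical summary table: one row per PostID holding its latest comment.
-- Column types are guesses; match them to the real dbo.TableA.
CREATE TABLE dbo.TableA_Latest
(
  PostID        int          NOT NULL PRIMARY KEY,
  PostComments  varchar(255) NULL,
  PostTransDate datetime     NOT NULL,
  UserID        int          NOT NULL
);
GO

CREATE TRIGGER dbo.TableA_KeepLatest
ON dbo.TableA
AFTER INSERT
AS
BEGIN
  SET NOCOUNT ON;

  -- A single INSERT can carry many rows, so first reduce the inserted
  -- rows to the newest one per PostID, then upsert into the summary table.
  ;WITH newest AS
  (
    SELECT PostID, PostComments, PostTransDate, UserID,
      rn = ROW_NUMBER() OVER (PARTITION BY PostID ORDER BY PostTransDate DESC)
    FROM inserted
  )
  MERGE dbo.TableA_Latest WITH (HOLDLOCK) AS tgt
  USING (SELECT PostID, PostComments, PostTransDate, UserID
         FROM newest WHERE rn = 1) AS src
    ON tgt.PostID = src.PostID
  -- Only overwrite when the incoming row is strictly newer.
  WHEN MATCHED AND src.PostTransDate > tgt.PostTransDate THEN
    UPDATE SET PostComments  = src.PostComments,
               PostTransDate = src.PostTransDate,
               UserID        = src.UserID
  WHEN NOT MATCHED BY TARGET THEN
    INSERT (PostID, PostComments, PostTransDate, UserID)
    VALUES (src.PostID, src.PostComments, src.PostTransDate, src.UserID);
END
GO
```

With something like this in place, the report becomes a plain SELECT from dbo.TableA_Latest, at the cost of extra work on every insert into dbo.TableA.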

Try this out:

SELECT ta1.* FROM tableA ta1
LEFT JOIN tableA ta2
ON ta1.postId = ta2.postId AND ta1.postTransDate < ta2.postTransDate
WHERE ta2.postTransDate IS NULL

Output:

| POSTID | POSTCOMMENTS |                  POSTTRANSDATE | USERID |
|--------|--------------|--------------------------------|--------|
|  10000 |   MJHJNGJHGH | October, 26 2013 21:56:32+0000 |   9897 |
|  10001 |   XCVGFDGDFG | October, 26 2013 22:54:54+0000 |   3424 |

