
SQL Server logging row visits best practice

I currently have a database for articles that keeps track of the most-read articles over a given period by incrementing a "visits" counter on page_load. The current "visits" counter is a column in the articles table (see below):

id | title  | description | visits | creation_date
---+--------+-------------+--------+-----------------
1  | test1  | test test.. | 10     | 2019-01-01
2  | test2  | test test.. | 20     | 2019-01-01

Sometimes I experience connection timeouts, and I suspect a deadlock in the "visits" write procedure (the database locks if concurrent users increment the same row at once). I thought of the following scenario as an enhancement:

  1. Remove the visits counter from the Articles table
  2. Create a new table article_visits with two columns: article_id and visit_date

Articles

id | title | desc | creation_date
---+-------+------+---------------
1  | test1 | desd | 2019-01-01
2  | test1 | desd | 2019-01-01

article_visits

article_id | visit_date
-----------+----------------------
1          | 2019-01-01
1          | 2019-01-01
1          | 2019-01-01
1          | 2019-01-01
1          | 2019-01-01
1          | 2019-01-01
2          | 2019-01-01
2          | 2019-01-01
2          | 2019-01-01

As an alternative, whenever a new visit is triggered, I insert a new row into the article_visits table to avoid any deadlocks on the articles table. This solution will make the article_visits table grow very quickly, but I don't think table size is a problem.

I would like to know whether this is the proper way to log article visits, and whether this optimization is a better option than the original solution.

This is a fine way to record article visits. It is much less (or not at all) prone to deadlocks, because you are basically just appending new rows.

It is more flexible. You can get the number of visits between two dates, for instance, and the range can be defined at query time. You can also store the exact time, and so determine whether there are time-of-day preferences for views.
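
As a rough illustration of that flexibility, the count for an arbitrary date range can be computed at query time. This sketch assumes the article_visits table from the question, with visit_date stored as a date; the variable names and the sample range are made up for the example.

```sql
-- Assumes dbo.article_visits(article_id int, visit_date date), as in the question.
DECLARE @from date = '2019-01-01';  -- hypothetical range start (inclusive)
DECLARE @to   date = '2019-02-01';  -- hypothetical range end (exclusive)

-- Most-read articles within the chosen range.
SELECT article_id,
       COUNT(*) AS visits
FROM dbo.article_visits
WHERE visit_date >= @from
  AND visit_date <  @to
GROUP BY article_id
ORDER BY visits DESC;
```

Using a half-open range (inclusive start, exclusive end) keeps the query correct if visit_date is later widened to carry a time component.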

The downside is query performance. If you frequently need the counts, the calculation can be expensive.

If this is an issue, there are multiple possible approaches:

  • A process that summarizes all the data periodically (say daily).
  • A process that summarizes the data for each period separately (say a daily summary).
  • A materialized/indexed view, which lets the database keep the summarized data up to date.
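
The indexed-view option might look roughly like the following. It is only a sketch: it assumes the dbo.article_visits table from the question with a non-nullable visit_date column of type date, and the view and index names are invented for the example.

```sql
-- Indexed views require SCHEMABINDING, two-part object names,
-- and COUNT_BIG(*) when the view uses GROUP BY.
CREATE VIEW dbo.v_article_daily_visits
WITH SCHEMABINDING
AS
SELECT article_id,
       visit_date,
       COUNT_BIG(*) AS visit_count
FROM dbo.article_visits
GROUP BY article_id, visit_date;
GO

-- The unique clustered index is what materializes the view.
CREATE UNIQUE CLUSTERED INDEX CIX_v_article_daily_visits
    ON dbo.v_article_daily_visits (article_id, visit_date);
GO
```

One trade-off to keep in mind: every insert into the base table now also maintains the matching aggregate row in the view, so concurrent visits to the same article on the same day contend for that row again. A periodic summary job avoids this at the cost of slightly stale counts.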

This is certainly valid, though you may want to do some scoping on how much additional storage and memory load this will require on your database server.

Additionally, I might add a full datetime or datetime2 column for the actual timestamp (in addition to the current date column rather than instead of it, since you'll want to aggregate by date only, and having that value pre-computed can improve performance), and perhaps a few other columns such as IP address and referrer. Then you can use this data for additional purposes, such as auditing, tracking referrer/advertiser ROI, etc.
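
One possible shape for such a table is sketched below. The column names, sizes and the constraint name are assumptions, and the date is derived here with a persisted computed column, which is just one way of keeping a pre-computed date next to the full timestamp.

```sql
CREATE TABLE dbo.article_visits
(
    article_id  int            NOT NULL,
    -- Full timestamp of the visit (UTC), filled in by default.
    visited_at  datetime2(3)   NOT NULL
        CONSTRAINT DF_article_visits_visited_at DEFAULT (SYSUTCDATETIME()),
    -- Pre-computed date part, stored on disk so it can be indexed and grouped on.
    visit_date  AS CAST(visited_at AS date) PERSISTED,
    -- Optional extras for auditing and referrer tracking (hypothetical sizes).
    ip_address  varchar(45)    NULL,   -- long enough for a textual IPv6 address
    referrer    nvarchar(2048) NULL
);
```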

I'm interested to understand why you are getting a deadlock. A database platform should be able to handle update tablename set field = field + 1 concurrently just fine. The table or row will lock and then release, but the lock should not be held long enough to cause a deadlock error.
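
For reference, the single-statement increment would look something like this against the articles table from the question (the variable is only there for illustration, and id is assumed to be the primary key so that the update touches exactly one row):

```sql
DECLARE @article_id int = 1;  -- hypothetical article being viewed

-- Read and write happen in one statement, so no separate SELECT
-- is needed and the row lock is held only for the duration of the update.
UPDATE dbo.articles
SET visits = visits + 1
WHERE id = @article_id;
```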

You could get a deadlock error if you are updating or locking more than one table within a single transaction, especially if different transactions touch the tables in a different order.

So the question is: in your original code, are you linking to multiple tables when you run the update statement? The solution could be as simple as making your update atomic to one table.

However, I do agree that the table you describe is a more functional design.

The current Articles table is not in normalized form.

I would say that putting the visits column in the Articles table is not a proper way to denormalize.

The current Articles table not only gives you deadlock issues, it also prevents you from producing many other kinds of reports, such as a daily visit report or a weekly visit report.

Creating the Article_visits table is a very good move. It will be updated very frequently.

My Article_visits design

article_visit_id |   article_id | visit_date           | visit_count
-----------------+--------------+----------------------+----------------------
1                |    1         | 2019-01-01           | 6
2                |    2         | 2019-01-01           | 3

Here Article_Visit_id is int identity(1,1), which is also the clustered index.

Create NonClustered Index NCI_Articleid_date ON Article_visits(article_id,visit_date)
GO

In short, creating the clustered index on (article_id, visit_date) would be an expensive affair.

If no record exists for that article on that date, insert one with visit_count 1; if one exists, update visit_count, i.e. increase it by 1.
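
The insert-or-increment step could be sketched as follows. It uses the Article_Visit column names from the table definition given further down; the variables, the locking hints and the transaction wrapper are assumptions about how one might make the check-then-write safe under concurrency, not something stated in this answer.

```sql
DECLARE @Articleid int  = 1;                               -- hypothetical article
DECLARE @Today     date = CAST(SYSUTCDATETIME() AS date);  -- one row per article per day

SET XACT_ABORT ON;
BEGIN TRANSACTION;

-- Try to bump today's counter. UPDLOCK + SERIALIZABLE hold the key range
-- so that two sessions cannot both decide that the row is missing.
UPDATE dbo.Article_Visit WITH (UPDLOCK, SERIALIZABLE)
SET Visit_Count = Visit_Count + 1
WHERE Articleid = @Articleid
  AND Visit_Date = @Today;

-- No row yet for this article and date: create it with a count of 1.
IF @@ROWCOUNT = 0
BEGIN
    INSERT INTO dbo.Article_Visit (Articleid, Visit_Date, Visit_Count)
    VALUES (@Articleid, @Today, 1);
END;

COMMIT TRANSACTION;
```

Note that with this design concurrent visits to the same article on the same day still update one shared row, much like the original visits column did.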

  1. It is normalized.
  2. You can create any kind of report, for current requirements and any future ones.
  3. You can show article-wise counts. The query is easy and performant.
  4. You can get weekly reports, and even a yearly report is easy, without an indexed view.
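
As an example of such a report, a yearly count per article could be written roughly like this, using the Article_Visit column names from the table definition below (the output column aliases are made up):

```sql
-- Visits per article per calendar year, summed from the daily rows.
SELECT Articleid,
       YEAR(Visit_Date) AS visit_year,
       SUM(Visit_Count) AS visits
FROM dbo.Article_Visit
GROUP BY Articleid, YEAR(Visit_Date)
ORDER BY visit_year, visits DESC;
```

A weekly or monthly variant would group by a different date part of Visit_Date in the same way.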

Actual table design:

    Create Table Article
    (
        Articleid int identity(1,1) primary key,
        title varchar(100) not null,
        Descriptions varchar(max) not null,
        CreationDate Datetime2(0)
    )
    GO

    Create Table Article_Visit
    (
        Article_VisitID int identity(1,1) primary key,
        Articleid int not null,
        Visit_Date datetime2(0) not null,
        Visit_Count int not null
    )
    GO

    --Create Trusted FK
    --WITH CHECK validates existing rows; WITH NOCHECK or NOT FOR REPLICATION
    --would leave the constraint marked as not trusted.
    ALTER TABLE Article_Visit
    WITH CHECK
    ADD CONSTRAINT FK_Articleid FOREIGN KEY (Articleid)
    REFERENCES Article (Articleid);
    GO

    --Create NonClustered Index NCI_Articleid_Date on
    -- Article_Visit(Articleid,Visit_Date)
    --Go

    Create NonClustered Index NCI_Articleid_Date1 on
        Article_Visit (Visit_Date) include (Articleid)
    GO

Create a trusted FK to get the index seek benefit (in short). I think NCI_Articleid_Date is no longer required because Articleid is a trusted FK.
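
Whether a foreign key is actually trusted can be checked in the catalog. The query below is a small sketch that assumes the table lives in the dbo schema; is_not_trusted = 0 means the constraint is trusted. Constraints added WITH NOCHECK or declared NOT FOR REPLICATION are reported as not trusted.

```sql
-- Lists the foreign keys defined on Article_Visit and their trust status.
SELECT name,
       is_not_trusted,
       is_disabled,
       is_not_for_replication
FROM sys.foreign_keys
WHERE parent_object_id = OBJECT_ID(N'dbo.Article_Visit');
```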

Deadlock issue: the trusted FK was also created to help overcome the deadlock issue. Deadlocks often occur because of bad application code, an unoptimized SQL query, or bad table design. Besides these there are several other valid causes, such as the handling of race conditions. This is very much a DBA matter. If deadlocks still hurt too much after you have addressed the causes above, you may have to change the isolation level.

Many deadlock issues are handled automatically by SQL Server itself.

There are many articles online about the causes of deadlocks.

"I don't think table size is a problem"

Table size is a big issue. The chances of a deadlock in both designs are very small, but you will always face the other drawbacks of a very large table.

I suggest you read a few more articles.

I hope this is exactly your real table, with the same data types?

How frequently will both tables be inserted into or updated?

Which table will be queried more frequently?

What is the concurrent usage of each table?

Deadlocks can only be minimized, so that they cause no performance or transaction issues.

What is the relation between Visitorid and Articleid?
