
SQL Server logging row visits best practice

I currently have a database of articles and keep track of the most-read article over a given period by incrementing a "visits" counter on page_load. The "visits" counter is currently a column in the articles table (see below):

id | title  | description | visits | creation_date
---+--------+-------------+--------+-----------------
1  | test1  | test test.. | 10     | 2019-01-01
2  | test2  | test test.. | 20     | 2019-01-01

Sometimes I experienced connection timeouts, and I suspected a deadlock in the "visits" write procedure (the database locks when concurrent users increment the same row at once). I thought of the following scenario as an enhancement:

  1. Remove the Visits counter from the table Articles
  2. Create a new table article_visits with two columns: article_id and date

Articles

id | title | desc | creation_date
---+-------+------+---------------
1  | test1 | desd | 2019-01-01
2  | test2 | desd | 2019-01-01

article_visits

article_id | visit_date
-----------+----------------------
1          | 2019-01-01
1          | 2019-01-01
1          | 2019-01-01
1          | 2019-01-01
1          | 2019-01-01
1          | 2019-01-01
2          | 2019-01-01
2          | 2019-01-01
2          | 2019-01-01

With this alternative, each new visit inserts a row into the article_visits table, avoiding any deadlocks on the articles table. This solution will make the article_visits table grow big very quickly, but I don't think table size is a problem.

I would like to know if this is the proper way to log article visits, and whether this optimization is a better option than the original design.

This is a fine way to record article visits. It is much less (or not at all) prone to deadlocks, because you are basically just appending new rows.

It is also more flexible. You can get the number of visits between two dates, for instance, and that range can be defined at query time. If you store the exact time, you can also determine whether views cluster at particular times of day.
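For example, counting visits in a date range becomes a single aggregate query (a sketch against the article_visits table from the question):

```sql
-- Visits for article 1 during January 2019; the range is chosen at query time.
SELECT COUNT(*) AS visit_count
FROM article_visits
WHERE article_id = 1
  AND visit_date >= '2019-01-01'
  AND visit_date <  '2019-02-01';
```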

The downside is query performance. If you frequently need the counts, the calculation can be expensive.

If this is an issue, there are multiple possible approaches:

  • A process that periodically summarizes all the data.
  • A process that summarizes the data for each period as it closes (say, a daily summary).
  • A materialized/indexed view, which lets the database keep the summary up to date.
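The indexed-view option could look like this sketch (view and index names are illustrative; SQL Server requires SCHEMABINDING and COUNT_BIG(*) for an indexed view):

```sql
-- Per-article, per-day counts that SQL Server keeps up to date automatically.
CREATE VIEW dbo.v_article_daily_visits
WITH SCHEMABINDING
AS
SELECT article_id,
       CONVERT(date, visit_date) AS visit_day,
       COUNT_BIG(*)              AS visit_count
FROM dbo.article_visits
GROUP BY article_id, CONVERT(date, visit_date);
GO

-- The unique clustered index is what materializes the view.
CREATE UNIQUE CLUSTERED INDEX IX_v_article_daily_visits
    ON dbo.v_article_daily_visits (article_id, visit_day);
```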

This is certainly valid, though you may want to do some scoping on how much additional storage and memory load this will require for your database server.

Additionally, I might add a full datetime or datetime2 column for the actual timestamp (in addition to the current date column rather than instead of it, since you'll want to do aggregation by date only and having that value pre-computed can improve performance), and perhaps a few other columns such as IP Address and Referrer. Then you can use this data for additional purposes, such as auditing, tracking referrer/advertiser ROI, etc.
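Putting those suggestions together, the visits table might look like this (column names and sizes are illustrative, not prescribed by the question):

```sql
-- visit_date stays as a pre-computed date for cheap daily aggregation;
-- visit_time, ip_address and referrer are the suggested extra columns.
CREATE TABLE dbo.article_visits (
    article_id  int           NOT NULL,
    visit_date  date          NOT NULL,
    visit_time  datetime2(0)  NOT NULL
        CONSTRAINT DF_article_visits_time DEFAULT (SYSUTCDATETIME()),
    ip_address  varchar(45)   NULL,   -- 45 chars is enough for IPv6 text form
    referrer    nvarchar(400) NULL
);
```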

I'm interested to understand why you are getting a deadlock. A database platform should be able to handle an update tablename set field = field + 1 concurrently just fine. The table or row will lock and then release, but the hold time should not be long enough to cause a deadlock error.

You could get a deadlock error if a transaction updates or locks more than one table, especially if different transactions touch the tables in different orders.

So the question is: in your original code, are you joining to multiple tables in the update statement? The solution could be as simple as making your update atomic to one table.
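A single-table atomic increment, using the column names from the question, would be a statement along these lines:

```sql
-- One row, one table: the row lock is held only for the duration of this
-- statement, so concurrent increments queue briefly rather than deadlock.
UPDATE articles
SET visits = visits + 1
WHERE id = @article_id;
```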

However, I do agree -- the table you describe is a more functional design.

The current Articles table is not in normalized form.

Putting a visits column in the Articles table is not a proper way to denormalize.

The current design not only gives you the deadlock issue, it also prevents many other kinds of reports: daily visit reports, weekly visit reports, and so on.

Creating the article_visits table is a very good move. It will be updated very frequently.

My Article_visits design

article_visit_id |   article_id | visit_date           | visit_count
-----------------+--------------+----------------------+----------------------
1                |    1         | 2019-01-01           | 6
2                |    2         | 2019-01-01           | 3

Here article_visit_id is an INT IDENTITY(1,1) column, which is also the clustered index.

Create NonClustered Index NCI_Articleid_date ON Article_visits(article_id,visit_date)
GO

In short, creating the clustered index on (article_id, visit_date) would be an expensive affair.

If no record exists for that article on that date, insert one with visit_count = 1; if it exists, increment visit_count by 1.
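That insert-or-increment step can be sketched as an upsert (using the column names from this answer; the lock hints guard against two sessions inserting the same article/date row at once):

```sql
-- Upsert for the daily-counter design: try the UPDATE first, INSERT on miss.
BEGIN TRANSACTION;

UPDATE Article_Visit WITH (UPDLOCK, HOLDLOCK)
SET Visit_Count = Visit_Count + 1
WHERE Articleid = @Articleid
  AND Visit_Date = @Visit_Date;

IF @@ROWCOUNT = 0
    INSERT INTO Article_Visit (Articleid, Visit_Date, Visit_Count)
    VALUES (@Articleid, @Visit_Date, 1);

COMMIT TRANSACTION;
```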

  1. It is normalized.
  2. You can create any kind of report: the current requirement plus any future requirement.
  3. You can show per-article counts; the query is easy and performant.
  4. You can get weekly or even yearly reports easily, without an indexed view.
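For instance, a weekly report against this design is a plain GROUP BY (a sketch; no indexed view involved):

```sql
-- Visits per article per ISO week, summing the pre-aggregated daily counts.
SELECT Articleid,
       DATEPART(year, Visit_Date) AS visit_year,
       DATEPART(week, Visit_Date) AS visit_week,
       SUM(Visit_Count)           AS visits
FROM Article_Visit
GROUP BY Articleid, DATEPART(year, Visit_Date), DATEPART(week, Visit_Date)
ORDER BY visits DESC;
```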

Actual table design:

    CREATE TABLE Article (
        Articleid    int identity(1,1) PRIMARY KEY,
        title        varchar(100) NOT NULL,
        Descriptions varchar(max) NOT NULL,
        CreationDate datetime2(0)
    )
    GO

    CREATE TABLE Article_Visit (
        Article_VisitID int identity(1,1) PRIMARY KEY,
        Articleid       int NOT NULL,
        Visit_Date      datetime2(0) NOT NULL,
        Visit_Count     int NOT NULL
    )
    GO

    --Create trusted FK (WITH CHECK is required for the constraint to be trusted;
    --WITH NOCHECK or NOT FOR REPLICATION would leave it untrusted)
    ALTER TABLE Article_Visit
    WITH CHECK
    ADD CONSTRAINT FK_Articleid FOREIGN KEY (Articleid)
    REFERENCES Article (Articleid);
    GO


    --CREATE NONCLUSTERED INDEX NCI_Articleid_Date
    --    ON Article_Visit (Articleid, Visit_Date)
    --GO

    CREATE NONCLUSTERED INDEX NCI_Articleid_Date1
        ON Article_Visit (Visit_Date) INCLUDE (Articleid)
    GO

Create a trusted FK to get the index-seek benefit (in short). I think NCI_Articleid_Date is no longer required, because Articleid is backed by a trusted FK.
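You can verify whether the foreign key actually ended up trusted by checking the catalog view (is_not_trusted = 0 means the optimizer can rely on it):

```sql
-- Check trust status of the FK created above.
SELECT name, is_not_trusted, is_disabled
FROM sys.foreign_keys
WHERE name = 'FK_Articleid';
```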

Deadlock issue: the trusted FK was also created to help with the deadlock issue. Deadlocks often occur because of bad application code, un-optimized SQL queries, or bad table design; besides these there are several other valid causes, such as race conditions. It is quite a DBA topic. If deadlocks are hurting too much, then after addressing the causes above you may have to adjust the isolation level.

Many deadlocks are resolved automatically by SQL Server itself.

There are many articles online about the causes of deadlocks.

"I don't think table size is a problem"

Table size is a big issue. The chances of a deadlock in either design are very, very low, but you will always face the other drawbacks of a big table.

I suggest you read a few more articles on this.

Is this exactly your real table, with the same data types?

How frequently will each table be inserted into or updated?

Which table will be queried more frequently?

What is the concurrent use of each table?

Deadlocks can only be minimized, to the point where there is no performance or transaction issue.

What is the relation between Visitorid and Articleid?
