
MySQL database design - capturing video usage metrics

Original post: 2018-10-31 13:21:15. Tags: mysql, sql, database-design

I was recently handed this existing application. It has a MySQL table that tracks how far into each video a user last watched.

A simplified, bare-bones version of the table is:

id -> Primary_key
user_id
video_id
last_watched_time
last_viewed_time (DateTime)

The last_watched_time column stores the position, in seconds, from which to resume playing the video.

Here is how it works: when the user starts playing a video, a new record is inserted into the table (if no record exists for that user and video). Then, while the user is watching, that record is updated every 20 seconds to capture the last watched time. This way, if the user clicks on the next video without pressing the stop button, we still know where he stopped.

So if a user watches a new video for 1 minute and closes the browser, the last_watched_time column is updated 3 times: 20, 40, 60. The last value stored is 60, so when he comes back, the video starts playing from the 1-minute mark.

id  |  user_id  |  video_id  |  last_watched_time  |  last_viewed_time
-------------------------------------------------------------------------
1   |     10    |     6      |       60            | 2018-10-01 10:10:10
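
The insert-once, update-every-20-seconds flow described above can be sketched roughly as follows. This is only an illustration under assumptions: it uses Python's built-in sqlite3 as a stand-in for MySQL, the table name `video_progress`, the helper `record_heartbeat`, and the unique constraint on (user_id, video_id) are all hypothetical, and the intermediate timestamps are made up so that the final row matches the one shown above.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL table (assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE video_progress (          -- hypothetical table name
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        video_id INTEGER NOT NULL,
        last_watched_time INTEGER NOT NULL,  -- resume position, in seconds
        last_viewed_time TEXT NOT NULL,      -- wall-clock time of the last update
        UNIQUE (user_id, video_id)           -- assumed: one row per user and video
    )
""")

def record_heartbeat(conn, user_id, video_id, position, now):
    """Update the row for this user and video, inserting it on first play."""
    cur = conn.execute(
        "UPDATE video_progress "
        "SET last_watched_time = ?, last_viewed_time = ? "
        "WHERE user_id = ? AND video_id = ?",
        (position, now, user_id, video_id),
    )
    if cur.rowcount == 0:  # no existing row, so this is the first play
        conn.execute(
            "INSERT INTO video_progress "
            "(user_id, video_id, last_watched_time, last_viewed_time) "
            "VALUES (?, ?, ?, ?)",
            (user_id, video_id, position, now),
        )

# One minute of viewing: play starts, then three updates 20 seconds apart.
events = [
    (0, "2018-10-01 10:09:10"),
    (20, "2018-10-01 10:09:30"),
    (40, "2018-10-01 10:09:50"),
    (60, "2018-10-01 10:10:10"),
]
for position, now in events:
    record_heartbeat(conn, 10, 6, position, now)

rows = conn.execute(
    "SELECT user_id, video_id, last_watched_time, last_viewed_time "
    "FROM video_progress"
).fetchall()
print(rows)
```

However many updates arrive, the table keeps a single row per user and video, which is why it cannot answer questions about when the watching happened.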

So this is an existing table with live data.

Now they want to start measuring detailed metrics of each user's usage, like:

- In the last 7 days, how many hours of video the user has watched, broken down by day
- In the last 6 hours, how many hours of video the user has watched, broken down by hour

So my first thought was to do the following:

- add another column called view_time to this table
- change the every-20-second updates to insert statements

So for the same scenario above, the data in the table would be:

id  |  user_id  |  video_id  |  last_watched_time  |  view_time  |  last_viewed_time
-----------------------------------------------------------------------------------------
1   |     10    |     6      |         0           |      0      |   2018-10-01 13:10:10
2   |     10    |     6      |        20           |     20      |   2018-10-01 13:10:30
3   |     10    |     6      |        40           |     20      |   2018-10-01 13:10:50
4   |     10    |     6      |        60           |     20      |   2018-10-01 13:11:10

Now if the same user comes back after 2 hours, skips the video forward by 10 minutes and watches for 25 seconds:

5   |     10    |     6      |       660           |    600      |   2018-10-01 15:11:10
6   |     10    |     6      |       680           |     20      |   2018-10-01 15:11:30

With this, I am still tracking last_watched_time, and if they want daily or hourly metrics, I can group by day or hour and sum view_time to find out how long the user watched in that day or hour.

The obvious issue I see with this approach is that if the user watches video for 4 hours, the 20-second insert statements that capture last_watched_time will add approximately 720 rows to this table. And with 100 users, that number multiplies accordingly.

Is this approach even right? How should I go about it?
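
As a rough sketch of the grouping described here, the six sample rows above can be loaded into a table and summed per hour and per day. The assumptions are the same kind as before: Python's sqlite3 stands in for MySQL, and the table name `video_views` is hypothetical. In MySQL the bucket expression would more likely be written with `DATE_FORMAT(last_viewed_time, '%Y-%m-%d %H:00')` or `DATE(last_viewed_time)` rather than SQLite's `strftime`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE video_views (             -- hypothetical table name
        id INTEGER PRIMARY KEY,
        user_id INTEGER,
        video_id INTEGER,
        last_watched_time INTEGER,         -- resume position, in seconds
        view_time INTEGER,                 -- seconds credited by this row
        last_viewed_time TEXT
    )
""")

# The six sample rows from the tables above.
sample_rows = [
    (1, 10, 6, 0, 0, "2018-10-01 13:10:10"),
    (2, 10, 6, 20, 20, "2018-10-01 13:10:30"),
    (3, 10, 6, 40, 20, "2018-10-01 13:10:50"),
    (4, 10, 6, 60, 20, "2018-10-01 13:11:10"),
    (5, 10, 6, 660, 600, "2018-10-01 15:11:10"),
    (6, 10, 6, 680, 20, "2018-10-01 15:11:30"),
]
conn.executemany("INSERT INTO video_views VALUES (?, ?, ?, ?, ?, ?)", sample_rows)

# Seconds watched per hour for one user.
hourly = conn.execute("""
    SELECT strftime('%Y-%m-%d %H:00', last_viewed_time) AS hour_bucket,
           SUM(view_time) AS seconds_watched
    FROM video_views
    WHERE user_id = ?
    GROUP BY hour_bucket
    ORDER BY hour_bucket
""", (10,)).fetchall()

# Seconds watched per day for the same user.
daily = conn.execute("""
    SELECT strftime('%Y-%m-%d', last_viewed_time) AS day_bucket,
           SUM(view_time) AS seconds_watched
    FROM video_views
    WHERE user_id = ?
    GROUP BY day_bucket
    ORDER BY day_bucket
""", (10,)).fetchall()

print(hourly)
print(daily)
```

With the sample data as given, the 15:00 bucket sums to 620 seconds, because row 5 carries a view_time of 600 for the 10-minute skip. Whether a skip should count as watch time is something the insert logic would have to decide.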

1 answer

Do the work on INSERT instead of on SELECT. This way much less data needs to be stored, and the SELECTs are much faster.

If all you want is the aggregate watch times, then store only that. That is, when a new record comes in, augment the existing record if that record was updated 20 seconds ago.

Since you want daily and hourly watch times, whenever a new hour starts, don't add to the existing record; start a new record instead. At the extreme, this will shrink the number of rows by 180x (60 minutes at 20-second intervals). For 'surfing', there may be no shrinkage.
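
One way to picture the hourly-record idea is the sketch below. It is a simplified illustration, not the answerer's exact scheme: it uses Python's sqlite3 in place of MySQL, the table `watch_time_hourly` and the helper `add_watch_time` are hypothetical, and it simply credits each 20-second update to the hour in which it arrives, without checking how long ago the previous update was.

```python
import sqlite3
from datetime import datetime, timedelta

HEARTBEAT_SECONDS = 20  # update interval taken from the question

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE watch_time_hourly (       -- hypothetical table name
        user_id INTEGER NOT NULL,
        video_id INTEGER NOT NULL,
        hour_start TEXT NOT NULL,          -- start of the hour bucket
        seconds_watched INTEGER NOT NULL,
        PRIMARY KEY (user_id, video_id, hour_start)
    )
""")

def add_watch_time(conn, user_id, video_id, now, seconds=HEARTBEAT_SECONDS):
    """Add watch time to the current hour's record, creating it for a new hour."""
    hour_start = now.strftime("%Y-%m-%d %H:00:00")
    cur = conn.execute(
        "UPDATE watch_time_hourly "
        "SET seconds_watched = seconds_watched + ? "
        "WHERE user_id = ? AND video_id = ? AND hour_start = ?",
        (seconds, user_id, video_id, hour_start),
    )
    if cur.rowcount == 0:  # first update in this hour: start a new record
        conn.execute(
            "INSERT INTO watch_time_hourly "
            "(user_id, video_id, hour_start, seconds_watched) "
            "VALUES (?, ?, ?, ?)",
            (user_id, video_id, hour_start, seconds),
        )

# Simulate 4 hours of continuous viewing, one update every 20 seconds.
start = datetime(2018, 10, 1, 13, 0, 0)
heartbeats = 4 * 3600 // HEARTBEAT_SECONDS
for i in range(heartbeats):
    add_watch_time(conn, 10, 6, start + timedelta(seconds=i * HEARTBEAT_SECONDS))

rows = conn.execute(
    "SELECT hour_start, seconds_watched FROM watch_time_hourly ORDER BY hour_start"
).fetchall()
print(heartbeats, len(rows))
print(rows)
```

In this simulation the 720 updates from the question's 4-hour example end up in 4 rows, which is the 180x reduction mentioned above. Daily totals can then be obtained by summing the hourly rows. In MySQL, the update-or-insert step could probably be collapsed into a single `INSERT ... ON DUPLICATE KEY UPDATE` statement, given a unique key on (user_id, video_id, hour_start).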