
Database Design: Is this a good practice for location logging?

In the diagram below you can see a simplified version of what I'm trying to do. I have to track the location of certain items, but I also have to efficiently retrieve the latest location of any given item. The simplest way of doing this would be to query ItemLocationLog and search for the latest date for that item. Since this table is bound to be very large, though, I'm wondering whether that would be efficient. I guess indexing the dateTime field would help, but I don't have the experience to judge how much.

Another approach I thought about would be to add a foreign key to the log table on Item (shown in the diagram as the field "lastLocation"). It would always point to the latest log entry and would thus spare me the search. Yet another option would be to add a foreign key to Location on Item and update it every time a log entry is added for any given item.

I'm sure this is a common problem with a simple solution, but since I've had no experience with it, I'm skeptical about my own approaches. What are the best practices for this type of scenario? Is it OK to add references to the Item table in order to avoid a costly query, or is the query trivial enough that I should just obtain this information from the log table itself?

(Diagram: database model)

As a matter of principle, only include redundancies in your model after you have done three things: measured the performance, found the actual bottleneck, and concluded that the denormalization would really help (enough to offset the risk of data corruption).
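Suppose measurements do show that a denormalized pointer is worth it. One way to limit the corruption risk is to let the database maintain the pointer itself.

The sketch below makes several assumptions:
- SQLite (through Python's standard `sqlite3` module) stands in for your DBMS.
- The trigger name `trg_log_last_location` and the `idLocation` column are hypothetical.
- The layout follows the question's description, with `Item.lastLocation` pointing at `ItemLocationLog.idItemLocationLog`.

The trigger only moves the pointer forward, so a late-arriving older entry does not overwrite a newer one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Item (
        idItem       INTEGER PRIMARY KEY,
        lastLocation INTEGER REFERENCES ItemLocationLog (idItemLocationLog)
    );
    CREATE TABLE ItemLocationLog (
        idItemLocationLog INTEGER PRIMARY KEY,
        idItem            INTEGER NOT NULL REFERENCES Item (idItem),
        idLocation        INTEGER NOT NULL,  -- hypothetical column name
        dateTime          TEXT    NOT NULL   -- ISO-8601 text compares chronologically
    );

    -- Hypothetical trigger: after each log insert, repoint Item.lastLocation
    -- at the new row, but only if it is at least as recent as the current one.
    CREATE TRIGGER trg_log_last_location
    AFTER INSERT ON ItemLocationLog
    BEGIN
        UPDATE Item
           SET lastLocation = NEW.idItemLocationLog
         WHERE idItem = NEW.idItem
           AND (lastLocation IS NULL
                OR NEW.dateTime >= (SELECT l.dateTime
                                      FROM ItemLocationLog AS l
                                     WHERE l.idItemLocationLog = Item.lastLocation));
    END;
""")


def log_location(item_id, location_id, when):
    """Append a log entry; the trigger keeps Item.lastLocation in sync."""
    conn.execute(
        "INSERT INTO ItemLocationLog (idItem, idLocation, dateTime) VALUES (?, ?, ?)",
        (item_id, location_id, when),
    )


def current_location(item_id):
    """Read the latest location through the denormalized pointer (no MAX search)."""
    row = conn.execute(
        """SELECT l.idLocation
             FROM Item AS i
             JOIN ItemLocationLog AS l ON l.idItemLocationLog = i.lastLocation
            WHERE i.idItem = ?""",
        (item_id,),
    ).fetchone()
    return row[0] if row else None


conn.execute("INSERT INTO Item (idItem) VALUES (1)")
log_location(1, 10, "2012-05-01 09:00:00")
log_location(1, 11, "2012-05-01 10:30:00")
log_location(1, 99, "2012-04-30 08:00:00")  # late, older entry: pointer stays put
```

With the trigger in place, the pointer is maintained on every insert. The trade-off is extra write work per log entry. The sketch also only handles INSERT, so updates or deletes on the log would need their own handling.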

Curiously enough, in your case it won't. One peculiarity of how B-Tree indexes work is that searching for MAX is essentially as fast as searching for an exact value: both are a single descent from the root to a leaf. You might get a small boost from better caching if INT is smaller than DATETIME on your DBMS, but not much.
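To see why, picture the leaf level of a composite index on (idItem, dateTime) as one sorted sequence of keys. The toy model below swaps the real B-tree for a sorted Python list searched with the standard `bisect` module. A real B-tree has wide nodes and a few levels rather than one flat array, and the data here is made up. Still, the model captures the point: finding one exact key and finding the largest dateTime for an item are each a single binary search, so both cost O(log n).

```python
import bisect

# A sorted list of (idItem, dateTime) keys stands in for the leaf level of a
# composite B-tree index on ItemLocationLog {idItem, dateTime}.
index = sorted([
    (1, "2012-05-01 09:00:00"),
    (1, "2012-05-01 10:30:00"),
    (2, "2012-05-01 08:15:00"),
    (2, "2012-05-02 12:00:00"),
    (2, "2012-05-03 07:45:00"),
    (3, "2012-04-30 23:59:59"),
])


def exact_lookup(item_id, when):
    """One binary search: is (item_id, when) in the index?"""
    i = bisect.bisect_left(index, (item_id, when))
    return i < len(index) and index[i] == (item_id, when)


def latest_for_item(item_id):
    """Also one binary search: find where the next item's keys would start,
    then step back one entry. That entry holds MAX(dateTime) for item_id."""
    # (item_id + 1,) sorts before every key of item_id + 1 and after every key of item_id.
    i = bisect.bisect_left(index, (item_id + 1,))
    if i == 0 or index[i - 1][0] != item_id:
        return None  # no log entries for this item
    return index[i - 1][1]
```

For example, `latest_for_item(2)` lands just before the first key of item 3 and returns `"2012-05-03 07:45:00"` without looking at item 2's older entries.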

Indexing is very powerful, if done right. An index on ItemLocationLog {idItem, dateTime} should make SELECT MAX(dateTime) FROM ItemLocationLog WHERE idItem = ? lightning fast.
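You can check this on your own DBMS by asking for the query plan. Below is a minimal sketch using SQLite through Python's `sqlite3` module. The index name `ix_log_item_time`, the `idLocation` column and the sample rows are assumptions. PostgreSQL's counterpart is `EXPLAIN`, whose output format differs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ItemLocationLog (
        idItemLocationLog INTEGER PRIMARY KEY,
        idItem            INTEGER NOT NULL,
        idLocation        INTEGER NOT NULL,
        dateTime          TEXT    NOT NULL   -- ISO-8601 text sorts chronologically
    );
    -- Composite index: idItem first (equality filter), dateTime second (MAX).
    CREATE INDEX ix_log_item_time ON ItemLocationLog (idItem, dateTime);
""")
conn.executemany(
    "INSERT INTO ItemLocationLog (idItem, idLocation, dateTime) VALUES (?, ?, ?)",
    [
        (1, 10, "2012-05-01 09:00:00"),
        (1, 11, "2012-05-01 10:30:00"),
        (2, 10, "2012-05-02 12:00:00"),
    ],
)

LATEST_SQL = "SELECT MAX(dateTime) FROM ItemLocationLog WHERE idItem = ?"


def latest_time(item_id):
    """MAX(dateTime) for one item, or None if it has no entries."""
    return conn.execute(LATEST_SQL, (item_id,)).fetchone()[0]


def query_plan(item_id):
    """Join the 'detail' column (last field) of each EXPLAIN QUERY PLAN row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + LATEST_SQL, (item_id,)).fetchall()
    return " | ".join(str(row[-1]) for row in rows)
```

If the plan names the index, the MAX is answered from the index rather than by scanning the item's history. SQLite typically reports a SEARCH using a covering index with `idItem=?`.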

Take a look at Use The Index, Luke! for a nice introduction to the topic.

Don't pre-optimize for a problem that you don't know you have.

Start with an index on the ItemLocationLog table covering idItem. Then run SELECT TOP 1 idItemLocationLog FROM ItemLocationLog WHERE idItem = ? ORDER BY idItemLocationLog DESC. This assumes that your PK is an autoincrement column and that entries are logged in chronological order. TOP 1 is SQL Server syntax; PostgreSQL, MySQL and SQLite use LIMIT 1. If this isn't fast enough, try an index on idItem plus dateTime. If that still isn't fast enough, you could start considering drastic denormalization, such as keeping the last known location reference on Item.
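The first step of that progression might look like this in SQLite (via Python's `sqlite3`). SQLite has no TOP, so LIMIT 1 does the same job. The index name `ix_log_item`, the `idLocation` column and the sample rows are assumptions. In SQLite an INTEGER PRIMARY KEY is the rowid, and every index entry also carries the rowid, so the single-column index on idItem already keeps each item's entries in id order. The result is the latest entry only under the answer's assumption that ids are handed out chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ItemLocationLog (
        idItemLocationLog INTEGER PRIMARY KEY AUTOINCREMENT,
        idItem            INTEGER NOT NULL,
        idLocation        INTEGER NOT NULL,
        dateTime          TEXT    NOT NULL
    );
    CREATE INDEX ix_log_item ON ItemLocationLog (idItem);
""")

# Entries are logged in chronological order, so a higher id means a later entry.
conn.executemany(
    "INSERT INTO ItemLocationLog (idItem, idLocation, dateTime) VALUES (?, ?, ?)",
    [
        (1, 10, "2012-05-01 09:00:00"),  # id 1
        (2, 20, "2012-05-01 09:05:00"),  # id 2
        (1, 12, "2012-05-01 10:00:00"),  # id 3
        (2, 21, "2012-05-01 11:00:00"),  # id 4
        (1, 13, "2012-05-01 12:00:00"),  # id 5
    ],
)


def latest_entry(item_id):
    """Return (idItemLocationLog, idLocation) of the item's newest entry, or None."""
    return conn.execute(
        """SELECT idItemLocationLog, idLocation
             FROM ItemLocationLog
            WHERE idItem = ?
            ORDER BY idItemLocationLog DESC
            LIMIT 1""",
        (item_id,),
    ).fetchone()
```

The `WHERE idItem = ?` filter matters. Without it, the query returns the newest log entry across all items.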

Some people are really surprised at how good an RDBMS is at retrieving data. You shouldn't be!

Try this first (examples are for PostgreSQL).


-- Latest location of ItemID = 75
select
      a.ItemID
    , b.LocationID
    , b.ValidFrom
from Item         as a
join ItemLocation as b on b.ItemID     = a.ItemID
                      and b.ValidFrom  = (select max(x.ValidFrom) from ItemLocation as x
                                                                  where x.ItemID = a.ItemID) 
join Location     as c on b.LocationID = c.LocationID
where a.ItemID = 75 ;


-- Earliest location of ItemID = 75
select
      a.ItemID
    , b.LocationID
    , b.ValidFrom
from Item         as a
join ItemLocation as b on b.ItemID     = a.ItemID
                      and b.ValidFrom  = (select min(x.ValidFrom) from ItemLocation as x
                                                                  where x.ItemID = a.ItemID) 
join Location     as c on b.LocationID = c.LocationID
where a.ItemID = 75 ;

This may look scary, but it is quite fast: ItemID is part of the primary keys, so the lookups, including the max()/min() subqueries, can be served from indexes.
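The queries assume a schema along the following lines. This DDL is inferred from the queries, not taken from the answer. ItemLocation gets a composite primary key (ItemID, ValidFrom), which matches the remark that ItemID is part of the primary keys. Column types and sample data are assumptions, and SQLite (via Python's `sqlite3`) stands in for PostgreSQL. The query text is the answer's, with the item id bound as a parameter and max()/min() selectable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Item     (ItemID     INTEGER PRIMARY KEY);
    CREATE TABLE Location (LocationID INTEGER PRIMARY KEY);
    CREATE TABLE ItemLocation (
        ItemID     INTEGER NOT NULL REFERENCES Item (ItemID),
        ValidFrom  TEXT    NOT NULL,  -- timestamp stored as ISO-8601 text in SQLite
        LocationID INTEGER NOT NULL REFERENCES Location (LocationID),
        PRIMARY KEY (ItemID, ValidFrom)  -- serves max()/min() per item
    );
    INSERT INTO Item VALUES (75);
    INSERT INTO Location VALUES (1), (2), (3);
    INSERT INTO ItemLocation VALUES
        (75, '2012-04-30 08:00:00', 1),
        (75, '2012-05-01 10:00:00', 2),
        (75, '2012-05-02 14:30:00', 3);
""")

# The answer's latest/earliest query; {agg} is max or min.
QUERY = """
select a.ItemID, b.LocationID, b.ValidFrom
from Item         as a
join ItemLocation as b on b.ItemID    = a.ItemID
                      and b.ValidFrom = (select {agg}(x.ValidFrom) from ItemLocation as x
                                          where x.ItemID = a.ItemID)
join Location     as c on b.LocationID = c.LocationID
where a.ItemID = ?
"""


def location_of(item_id, agg="max"):
    """Latest (agg='max') or earliest (agg='min') location row for an item."""
    if agg not in ("max", "min"):  # only these two are interpolated into the SQL
        raise ValueError("agg must be 'max' or 'min'")
    return conn.execute(QUERY.format(agg=agg), (item_id,)).fetchall()
```
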


And if you need the location of all items at a given point in time:

-- Location of all items for point in time ('2012-05-01 11:00:00') 
select
      a.ItemID
    , b.LocationID
    , b.ValidFrom
from Item         as a
join ItemLocation as b on b.ItemID     = a.ItemID
                      and b.ValidFrom  = (select max(x.ValidFrom)
                                            from ItemLocation as x
                                           where x.ItemID = a.ItemID
                                             and x.ValidFrom <= '2012-05-01 11:00:00') 
join Location     as c on c.LocationID = b.LocationID
;
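The point-in-time query can be run against a schema inferred from the queries (again SQLite via Python as a stand-in for PostgreSQL, with made-up data). Doing so shows one behavior worth knowing: because the query uses inner joins, an item with no entry at or before the chosen instant is left out of the result rather than returned with a NULL location. The trailing `order by` is an addition, there only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Item     (ItemID     INTEGER PRIMARY KEY);
    CREATE TABLE Location (LocationID INTEGER PRIMARY KEY);
    CREATE TABLE ItemLocation (
        ItemID     INTEGER NOT NULL REFERENCES Item (ItemID),
        ValidFrom  TEXT    NOT NULL,
        LocationID INTEGER NOT NULL REFERENCES Location (LocationID),
        PRIMARY KEY (ItemID, ValidFrom)
    );
    INSERT INTO Item VALUES (75), (76), (77);
    INSERT INTO Location VALUES (1), (2), (3);
    INSERT INTO ItemLocation VALUES
        (75, '2012-04-30 08:00:00', 1),
        (75, '2012-05-01 10:00:00', 2),
        (75, '2012-05-01 12:00:00', 3),  -- after the chosen instant, ignored
        (76, '2012-05-01 11:00:00', 3),  -- exactly at the instant, included (<=)
        (77, '2012-05-02 09:00:00', 1);  -- only later entries: item 77 drops out
""")


def locations_at(instant):
    """Location of every item as of `instant` (ISO-8601 text)."""
    return conn.execute(
        """select a.ItemID, b.LocationID, b.ValidFrom
             from Item         as a
             join ItemLocation as b on b.ItemID    = a.ItemID
                                   and b.ValidFrom = (select max(x.ValidFrom)
                                                        from ItemLocation as x
                                                       where x.ItemID = a.ItemID
                                                         and x.ValidFrom <= ?)
             join Location     as c on c.LocationID = b.LocationID
            order by a.ItemID""",
        (instant,),
    ).fetchall()
```

Items with no known location could still be listed by turning both joins into LEFT JOINs. They would then come back with NULLs instead of being dropped.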

