
Is it practical to dynamically normalize a table?

Let's say my database tracks bird sightings (Note: I'm really scraping the bottom of the barrel for examples).

The fields are:

sighting_id | common_name | park_name | location | time | etc....

Although I'm assuming that a park will always be in the same location, the website is like a spreadsheet. The user enters park_name and location for every entry. Also please note that my actual schema has other fields that depend on the analogous "park name" as well (e.g. state).

I do not have a way for the user to predefine parks, so I can't know them ahead of time. Should I even attempt to dynamically normalize this data? For example, should my program automatically populate a parks table, replacing the park_name and location columns in the bird sighting table with a park_id?

I'm worried about performance, mostly. Listing every sighting would require a join to populate park and location. Also, dynamically managing this would almost certainly require more resources than it would save. I would probably need a cron job to eliminate orphaned parks, since a park may be referenced by multiple sightings and can't simply be deleted along with one of them.
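As a rough sketch of what that would involve, assuming SQLite through Python's sqlite3 module: the table names parks and sightings, the sample rows, and the helper names list_sightings and delete_orphaned_parks are all hypothetical, and the time column is left out for brevity.

```python
import sqlite3

# In-memory database; the schema is an assumed normalized version of the
# spreadsheet-like table from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parks (
        park_id   INTEGER PRIMARY KEY,
        park_name TEXT NOT NULL UNIQUE,
        location  TEXT NOT NULL
    );
    CREATE TABLE sightings (
        sighting_id INTEGER PRIMARY KEY,
        common_name TEXT NOT NULL,
        park_id     INTEGER NOT NULL REFERENCES parks(park_id)
    );
""")
conn.executemany(
    "INSERT INTO parks (park_id, park_name, location) VALUES (?, ?, ?)",
    [(1, "Elm Park", "Springfield"), (2, "Oak Park", "Shelbyville")],
)
conn.execute(
    "INSERT INTO sightings (common_name, park_id) VALUES (?, ?)",
    ("Blue Jay", 1),
)


def list_sightings(conn):
    """Return every sighting with its park name and location (one join)."""
    return conn.execute("""
        SELECT s.sighting_id, s.common_name, p.park_name, p.location
        FROM sightings AS s
        JOIN parks AS p ON p.park_id = s.park_id
        ORDER BY s.sighting_id
    """).fetchall()


def delete_orphaned_parks(conn):
    """Delete parks that no sighting references; return how many were removed."""
    cur = conn.execute("""
        DELETE FROM parks
        WHERE NOT EXISTS (
            SELECT 1 FROM sightings WHERE sightings.park_id = parks.park_id
        )
    """)
    return cur.rowcount


rows = list_sightings(conn)            # "Oak Park" has no sightings yet
removed = delete_orphaned_parks(conn)  # so it is the only orphan
```

The cleanup is a single statement, so whether it runs from a cron job or right after each delete is a scheduling choice rather than extra logic.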

It depends a bit on your usage. The normalized approach (park is a table) will make the following queries easier:

  • How many bird sightings have there been for each park
  • At which park are you most likely to see bird XYZ
  • There are probably quite a few more queries like this
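The first two queries could look roughly like this with a separate parks table. This is only a sketch, assuming SQLite through Python's sqlite3 module; the schema, the sample data, and the function names sightings_per_park and best_park_for are made up for illustration, and "most likely" is simplified to "most recorded sightings".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parks (
        park_id   INTEGER PRIMARY KEY,
        park_name TEXT NOT NULL UNIQUE,
        location  TEXT NOT NULL
    );
    CREATE TABLE sightings (
        sighting_id INTEGER PRIMARY KEY,
        common_name TEXT NOT NULL,
        park_id     INTEGER NOT NULL REFERENCES parks(park_id)
    );
""")
conn.executemany(
    "INSERT INTO parks (park_id, park_name, location) VALUES (?, ?, ?)",
    [(1, "Elm Park", "Springfield"), (2, "Oak Park", "Shelbyville")],
)
conn.executemany(
    "INSERT INTO sightings (common_name, park_id) VALUES (?, ?)",
    [("Blue Jay", 1), ("Blue Jay", 1), ("Robin", 1), ("Blue Jay", 2)],
)


def sightings_per_park(conn):
    """Count sightings for each park, including parks with none."""
    return conn.execute("""
        SELECT p.park_name, COUNT(s.sighting_id) AS n
        FROM parks AS p
        LEFT JOIN sightings AS s ON s.park_id = p.park_id
        GROUP BY p.park_id
        ORDER BY n DESC, p.park_name
    """).fetchall()


def best_park_for(conn, common_name):
    """Return the park with the most sightings of a bird, or None."""
    row = conn.execute("""
        SELECT p.park_name, COUNT(*) AS n
        FROM sightings AS s
        JOIN parks AS p ON p.park_id = s.park_id
        WHERE s.common_name = ?
        GROUP BY p.park_id
        ORDER BY n DESC, p.park_name
        LIMIT 1
    """, (common_name,)).fetchone()
    return row[0] if row else None


counts = sightings_per_park(conn)      # [("Elm Park", 3), ("Oak Park", 1)]
best = best_park_for(conn, "Blue Jay")  # "Elm Park"
```

With the flat table you can still group by park_name, but then two spellings of the same park count as two parks, which is the real cost of not normalizing.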

But yes, you do run into some sticky issues. The pattern "if park XYZ doesn't exist then insert it into the parks table" suffers from a race condition that you'll have to deal with.
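One common way to deal with that race is to let a UNIQUE constraint arbitrate instead of checking first and inserting second. Here is a minimal sketch, assuming SQLite (whose INSERT OR IGNORE skips rows that would violate a constraint) through Python's sqlite3 module; the parks schema and the name get_or_create_park are assumptions, and other databases spell the same idea differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE parks (
        park_id   INTEGER PRIMARY KEY,
        park_name TEXT NOT NULL UNIQUE,  -- the constraint does the arbitration
        location  TEXT NOT NULL
    )
""")


def get_or_create_park(conn, park_name, location):
    """Return the park_id for park_name, inserting the park if it is new.

    The insert is attempted unconditionally; if another writer got there
    first, the UNIQUE constraint makes this insert a no-op and the SELECT
    picks up the existing row.
    """
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO parks (park_name, location) VALUES (?, ?)",
            (park_name, location),
        )
        row = conn.execute(
            "SELECT park_id FROM parks WHERE park_name = ?",
            (park_name,),
        ).fetchone()
    return row[0]


first = get_or_create_park(conn, "Elm Park", "Springfield")
second = get_or_create_park(conn, "Elm Park", "Springfield")
park_count = conn.execute("SELECT COUNT(*) FROM parks").fetchone()[0]
```

Note that in this sketch a second call with a different location silently keeps the first location; whether that is acceptable depends on how much you trust the assumption that a park never moves.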

Now, how about some arguments against normalization here... Most customer databases probably store my street address as "123 Foo Street", without dynamically normalizing the street name (we could have a street table, put "Foo Street" there, then reference it from other tables). Why do I bring this up? To show that even the guys who hate any repeated data will probably acknowledge that there is some line you don't necessarily have to cross.

Another silly example would be that we might share last names. Do we really need a table of unique last names and then a foreign key to it from other tables? There might be some applications where this is helpful, but for 99% of applications out there, this goes too far. It's just more work and worse performance for little to no gain.

So I'd consider how I want to be able to query data back out of the table. Honestly, in this case I'd probably do a separate table for parks. But in other cases I've chosen not to.

That's my two cents, one cent after taxes.

My two cents on the original "parks" example (as opposed to the OP's actual problem):

The decisive argument against trying to automatically normalize the park and location columns is usability: when data is presented to the user in an editable spreadsheet-like format, they will naturally assume that each row can be edited independently. So it's deceptive (and likely to lead to confusion in the end) if some columns, such as "location", are actually associated with the park rather than with the row.

A typical pattern for handling this sort of situation is to prompt the user for the park's details, and create a row in the "parks" table, only when a new park is entered. For example, if the park column contains a drop-down box, then the last option could be "add new park". Alternatively, add a new park when the user enters an unrecognized park name, but still make it clear to the user that a new park is being created.
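On the server side, the "unrecognized park name" variant might be sketched as follows. This assumes SQLite through Python's sqlite3 module; the schema, the UnknownPark exception, and add_sighting are hypothetical names, and the except branch stands in for whatever prompt the real UI would show.

```python
import sqlite3


class UnknownPark(Exception):
    """Signals the UI to ask the user to confirm and describe a new park."""


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parks (
        park_id   INTEGER PRIMARY KEY,
        park_name TEXT NOT NULL UNIQUE,
        location  TEXT NOT NULL
    );
    CREATE TABLE sightings (
        sighting_id INTEGER PRIMARY KEY,
        common_name TEXT NOT NULL,
        park_id     INTEGER NOT NULL REFERENCES parks(park_id)
    );
""")


def add_sighting(conn, common_name, park_name, new_park_location=None):
    """Record a sighting; create the park only when its details are supplied."""
    with conn:  # one transaction for the optional park insert plus the sighting
        row = conn.execute(
            "SELECT park_id FROM parks WHERE park_name = ?", (park_name,)
        ).fetchone()
        if row is not None:
            park_id = row[0]
        elif new_park_location is None:
            # Unrecognized name and no details yet: let the caller prompt.
            raise UnknownPark(park_name)
        else:
            park_id = conn.execute(
                "INSERT INTO parks (park_name, location) VALUES (?, ?)",
                (park_name, new_park_location),
            ).lastrowid
        conn.execute(
            "INSERT INTO sightings (common_name, park_id) VALUES (?, ?)",
            (common_name, park_id),
        )
    return park_id


try:
    add_sighting(conn, "Blue Jay", "Elm Park")
    prompted = False
except UnknownPark:
    # A real UI would show an "add new park" dialog here.
    prompted = True
    add_sighting(conn, "Blue Jay", "Elm Park", new_park_location="Springfield")

add_sighting(conn, "Robin", "Elm Park")  # known park now, no prompt needed
```

The point of raising instead of silently inserting is that the user sees the location once, attached to the park, and is never led to believe it can be edited per row.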
