
Should I normalize this table?

I have a table Items which stores book data fetched from Amazon. This Amazon data is inserted into Items as users browse the site, so any INSERT that occurs needs to be efficient.

Here's the table:

CREATE TABLE IF NOT EXISTS `items` (
  `Item_ID` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `Item_ISBN` char(13) DEFAULT NULL,
  `Title` varchar(255) NOT NULL,
  `Edition` varchar(20) DEFAULT NULL,
  `Authors` varchar(255) DEFAULT NULL,
  `Year` char(4) DEFAULT NULL,
  `Publisher` varchar(50) DEFAULT NULL,
  PRIMARY KEY (`Item_ID`),
  UNIQUE KEY `Item_Data` (`Item_ISBN`,`Title`,`Edition`,`Authors`,`Year`,`Publisher`),
  KEY `ISBN` (`Item_ISBN`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT AUTO_INCREMENT=1 ;

Normalizing this table would presumably mean creating tables for Titles, Authors, and Publishers. My concern is that the insert would become too complex. To insert a single Item, I'd have to:

  1. Check for the Publisher in Publishers to SELECT Publisher_ID, otherwise insert it and use mysql_insert_id() to get Publisher_ID.
  2. Check for the Authors in Authors to SELECT Authors_ID, otherwise insert it and use mysql_insert_id() to get Authors_ID.
  3. Check for the Title in Titles to SELECT Title_ID, otherwise insert it and use mysql_insert_id() to get Title_ID.
  4. Use those IDs to finally insert the Item (which may in fact be a duplicate, so this whole process would have been a waste).
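
For illustration, the four steps above can be sketched roughly as follows. This is a minimal sketch, not the asker's actual code: Python's built-in sqlite3 stands in for MySQL, the lookup tables and the get_or_create / insert_item helpers are hypothetical names, and SQLite's INSERT OR IGNORE plays the role that INSERT IGNORE would play in MySQL.

```python
import sqlite3

# SQLite stands in for MySQL; the three lookup tables are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE publishers (Publisher_ID INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
CREATE TABLE authors    (Authors_ID   INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
CREATE TABLE titles     (Title_ID     INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
CREATE TABLE items (
    Item_ID      INTEGER PRIMARY KEY,
    Item_ISBN    TEXT NOT NULL,
    Title_ID     INTEGER NOT NULL REFERENCES titles(Title_ID),
    Authors_ID   INTEGER NOT NULL REFERENCES authors(Authors_ID),
    Publisher_ID INTEGER NOT NULL REFERENCES publishers(Publisher_ID),
    UNIQUE (Item_ISBN, Title_ID, Authors_ID, Publisher_ID)
);
""")


def get_or_create(table, id_col, name):
    """Steps 1-3: insert the value if it is missing, then fetch its ID.

    table and id_col are trusted constants from this script, never user input.
    """
    conn.execute(f"INSERT OR IGNORE INTO {table} (Name) VALUES (?)", (name,))
    row = conn.execute(
        f"SELECT {id_col} FROM {table} WHERE Name = ?", (name,)
    ).fetchone()
    return row[0]


def insert_item(isbn, title, authors, publisher):
    """Step 4: resolve the three IDs, then insert the item itself."""
    publisher_id = get_or_create("publishers", "Publisher_ID", publisher)
    authors_id = get_or_create("authors", "Authors_ID", authors)
    title_id = get_or_create("titles", "Title_ID", title)
    cur = conn.execute(
        "INSERT OR IGNORE INTO items (Item_ISBN, Title_ID, Authors_ID, Publisher_ID) "
        "VALUES (?, ?, ?, ?)",
        (isbn, title_id, authors_id, publisher_id),
    )
    return cur.rowcount == 1  # False when the item was already stored


first = insert_item("9780000000001", "Example Title", "A. Author", "Example Press")
second = insert_item("9780000000001", "Example Title", "A. Author", "Example Press")
item_count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

Each call issues seven statements in this sketch (three insert-or-ignore plus three lookups, then the item insert), and the second call does all of that work only to find that the item already exists.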

Does that argue against normalization for this table?

Note: The goal of Items is not to build a comprehensive database of books that a user could query with "Show me all the books by Publisher X." The Items table is only used to cache Items for my users' search results.
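
For comparison, caching a fetched book in the flat table takes a single statement. The sketch below is an assumption-laden illustration, not the asker's code: sqlite3 again stands in for MySQL, the cache_item helper and the sample book are hypothetical, and INSERT OR IGNORE is SQLite's counterpart to MySQL's INSERT IGNORE, relying on the UNIQUE key to skip rows that are already cached.

```python
import sqlite3

# SQLite stand-in for the flat `items` table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE items (
    Item_ID   INTEGER PRIMARY KEY,
    Item_ISBN TEXT,
    Title     TEXT NOT NULL,
    Edition   TEXT,
    Authors   TEXT,
    Year      TEXT,
    Publisher TEXT,
    UNIQUE (Item_ISBN, Title, Edition, Authors, Year, Publisher)
)""")


def cache_item(book):
    """Insert one fetched book; the UNIQUE key silently skips exact repeats."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO items "
        "(Item_ISBN, Title, Edition, Authors, Year, Publisher) "
        "VALUES (:isbn, :title, :edition, :authors, :year, :publisher)",
        book,
    )
    return cur.rowcount == 1  # True only when a new row was stored


# Hypothetical sample data, with every column filled in.
book = {
    "isbn": "9780000000001",
    "title": "Example Title",
    "edition": "1st",
    "authors": "A. Author",
    "year": "2010",
    "publisher": "Example Press",
}
stored_first = cache_item(book)
stored_again = cache_item(book)
cached_rows = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```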

Considering your goal, I definitely would not normalize this.

You've answered your own question: don't normalize it!

YES you should normalize it if you don't think it is already. However, as far as I can tell it's already in 5th Normal Form anyway - at least it seems to be, based on the "obvious" interpretation of those column names and if you ignore the nullable columns. Why do you doubt it? Not sure why you want to allow nulls for some of those columns though.
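
One practical consequence of the nullable columns is worth a quick illustration. In MySQL, as in SQLite, a UNIQUE index treats NULL values as distinct from one another, so two otherwise identical rows that both have a NULL column are not seen as duplicates. The sketch below uses sqlite3 as a stand-in and a trimmed-down, hypothetical version of the table; the cache and count helpers are illustrative names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE items (
    Item_ID   INTEGER PRIMARY KEY,
    Item_ISBN TEXT,
    Title     TEXT NOT NULL,
    Edition   TEXT,  -- nullable, as in the question
    UNIQUE (Item_ISBN, Title, Edition)
)""")


def cache(isbn, title, edition):
    """Insert a row unless the UNIQUE key reports it as a duplicate."""
    conn.execute(
        "INSERT OR IGNORE INTO items (Item_ISBN, Title, Edition) VALUES (?, ?, ?)",
        (isbn, title, edition),
    )


def count():
    return conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]


# Same book twice with Edition = NULL: the key does not catch the repeat.
cache("9780000000001", "Example Title", None)
cache("9780000000001", "Example Title", None)
rows_with_null = count()

# Same book twice with Edition = '' (an empty string): the repeat is skipped.
conn.execute("DELETE FROM items")
cache("9780000000001", "Example Title", "")
cache("9780000000001", "Example Title", "")
rows_with_empty = count()
```

If duplicate detection matters for the cache, declaring these columns NOT NULL with an empty-string default is one way to make the UNIQUE key behave as intended.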

1. Check for the Publisher in Publishers to SELECT Publisher_ID, otherwise insert it and use mysql_insert_id() to get Publisher_ID

There is no "Publisher_ID" in your table. Normalization has nothing to do with inventing a new "Publisher_ID" attribute. Substituting a "Publisher_ID" in place of Publisher certainly wouldn't make it any more normalized than it already is.

The only place where I can see normalization being useful in your case is if you want to store information about each author.
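
As a rough sketch of what per-author storage could look like, the example below splits the combined Authors string into individual author rows linked to items through a junction table. Everything here is hypothetical: the item_authors table, the Bio column, the add_item helper and the sample names are illustrative only, and sqlite3 stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items   (Item_ID INTEGER PRIMARY KEY, Title TEXT NOT NULL);
CREATE TABLE authors (Author_ID INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE, Bio TEXT);
CREATE TABLE item_authors (
    Item_ID   INTEGER NOT NULL REFERENCES items(Item_ID),
    Author_ID INTEGER NOT NULL REFERENCES authors(Author_ID),
    PRIMARY KEY (Item_ID, Author_ID)
);
""")


def add_item(title, author_names):
    """Store an item and link it to one row per individual author."""
    item_id = conn.execute(
        "INSERT INTO items (Title) VALUES (?)", (title,)
    ).lastrowid
    for name in author_names:
        conn.execute("INSERT OR IGNORE INTO authors (Name) VALUES (?)", (name,))
        author_id = conn.execute(
            "SELECT Author_ID FROM authors WHERE Name = ?", (name,)
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO item_authors (Item_ID, Author_ID) VALUES (?, ?)",
            (item_id, author_id),
        )
    return item_id


add_item("Book One", ["Ann Example", "Bob Sample"])
add_item("Book Two", ["Ann Example"])

# Per-author information now lives in exactly one place.
conn.execute(
    "UPDATE authors SET Bio = ? WHERE Name = ?", ("Hypothetical bio", "Ann Example")
)

titles_by_ann = [
    row[0]
    for row in conn.execute(
        "SELECT i.Title FROM items i "
        "JOIN item_authors ia ON ia.Item_ID = i.Item_ID "
        "JOIN authors a ON a.Author_ID = ia.Author_ID "
        "WHERE a.Name = ? ORDER BY i.Title",
        ("Ann Example",),
    )
]
author_rows = conn.execute("SELECT COUNT(*) FROM authors").fetchone()[0]
```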

However, there is one place where normalization could help you: saving space! Especially if there is a lot of repetition among publishers and authors (that is, if you normalize into a table of individual authors).

So if you are dealing with tens of millions of rows, normalization will show an impact in terms of space (and even performance). If you don't face that situation (which I believe is the case), you don't need to normalize.
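
To put a rough number on the space argument, here is a back-of-the-envelope calculation for the Publisher column alone. Every figure is an assumption chosen for illustration (row count, number of distinct publishers, average name length), and index and row overhead are ignored; the only fixed fact is that a MySQL INT UNSIGNED occupies 4 bytes.

```python
# All figures below are assumptions for illustration, not measurements.
ROWS = 10_000_000             # items in the table
DISTINCT_PUBLISHERS = 50_000  # unique publisher names
AVG_PUBLISHER_BYTES = 20      # average stored length of a publisher name
ID_BYTES = 4                  # a MySQL INT UNSIGNED occupies 4 bytes

# Flat design: the publisher name is repeated on every row.
flat_bytes = ROWS * AVG_PUBLISHER_BYTES

# Normalized design: a 4-byte ID per row, plus one lookup row per publisher.
normalized_bytes = (
    ROWS * ID_BYTES + DISTINCT_PUBLISHERS * (ID_BYTES + AVG_PUBLISHER_BYTES)
)

saved_bytes = flat_bytes - normalized_bytes
print(
    f"flat: {flat_bytes / 1e6:.1f} MB, "
    f"normalized: {normalized_bytes / 1e6:.1f} MB, "
    f"saved: {saved_bytes / 1e6:.1f} MB"
)
```

Under these assumptions the saving is on the order of 160 MB for one column. With the same ratios at 100,000 rows it shrinks to under 2 MB, which is why the row count decides whether the extra insert complexity is worth it.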

P.S. Also think of the future: will there ever be a need? Databases are long-term infrastructure; never design them with only the present in mind.


 