How to correctly structure schema
I'm enrolled in a DBM/BI certificate program (more like a crash course), and I decided to embark on an independent project to implement everything I'm learning in real time. Long story short, I'll be analyzing data (from boxofficemojo.com) about the 130 top-grossing movies of the last 13 years, using MySQL Server/Workbench. First I'd like to map out a schema and then do some data mining/visualization. Here's how I've split it up so far:
"Movies"
Movie_ID (Primary )
Dom_Revenue
Int_Revenue
OpWe_Revenue
Budget
"Rating"
Rating_ID (P)
Rating
"Release"
Release_ID (P)
Year
Month
Day
Movie_ID (F)
"Cast"
Director_Gender (P)
Lead_Gender (P)
Director_Name
Director_Name
Movie_ID (F)
"Studio"
Studio_ID (P)
Studio_Name
And these are my relationships so far:
rating to movies - one to many ( many movies can be rated R , a movie can only have 1 rating )
release to movies - one to many ( many movies can be released on the same weekend, a movie can only be released once)
cast to movies - one to many (directors/actors can make many movies, a movie can only have one cast)
studio to movies - many to many (movies can be attached to more than one studio, a studio can make more than one movie)
I know the schema is most likely not 100% correct, so should I include the primary keys from all the other tables as foreign keys in the "Movies" table? And how are my relationships?
Thanks in advance.
It looks OK to me.
I just think the "Release" entity may be a bit overkill (what's the use of knowing which movies were released at the same time?), so I think it could just be a set of movie attributes.
Also, your "Cast" entity has two directors. Maybe you could normalize that and keep only one director (since movie 1<-->N director, it's just a matter of adding relationships).
About FKs: yes, you should add them. Your relationships look fine.
Good luck.
This is related to the first answer by Leo, but I'll be more specific and add more observations.
First, the Release attributes are functionally dependent on Movie_ID (or on Movies in general), so Release should not be a separate entity.
Second, and in relation to the first, you have Year, Month and Day in your Release entity. Why not make it a single Release_Date, which contains the year, month and day anyway? Then you could make your Release attributes part of your Movies entity.
Third, and in relation to the first, why not add a Movie_Title field?
So, all in all, you could have the following schema:
"Movies"
Movie_ID (Primary )
Movie_Title
Dom_Revenue
Int_Revenue
OpWe_Revenue
Budget
Release_Date
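As a rough sketch, the table above might be declared in MySQL like this. The column types, sizes and the AUTO_INCREMENT key are assumptions for illustration; the post only names the columns.

```sql
-- Sketch only: column types and sizes are assumptions, not from the post.
CREATE TABLE Movies (
    Movie_ID      INT UNSIGNED NOT NULL AUTO_INCREMENT,
    Movie_Title   VARCHAR(255) NOT NULL,
    Dom_Revenue   DECIMAL(15,2),   -- domestic gross
    Int_Revenue   DECIMAL(15,2),   -- international gross
    OpWe_Revenue  DECIMAL(15,2),   -- opening-weekend gross
    Budget        DECIMAL(15,2),
    Release_Date  DATE,            -- replaces the separate Year/Month/Day columns
    PRIMARY KEY (Movie_ID)
);
```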
You could easily query movies that were released in a certain year, like:
SELECT Movie_Title, Year(Release_Date) as Release_Year
FROM Movies
WHERE Year(Release_Date) = 2011
Or you could also count them by year (or by month):
SELECT Year(Release_Date) as Release_Year, COUNT(*) Number_of_Movies_in_a_Year
FROM Movies
GROUP BY Year(Release_Date)
ORDER BY Year(Release_Date)
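Two possible variations on the queries above, as a sketch. The first filters with a date range instead of wrapping the column in YEAR(), which lets MySQL use an ordinary index on Release_Date; with only 130 rows this hardly matters, but it is a useful habit. The index name is made up for the example. The second spells out the "by month" count mentioned above.

```sql
-- Hypothetical index name; optional for a table this small.
CREATE INDEX idx_movies_release_date ON Movies (Release_Date);

-- Same rows as WHERE YEAR(Release_Date) = 2011, written as a half-open range
SELECT Movie_Title, Release_Date
FROM Movies
WHERE Release_Date >= '2011-01-01'
  AND Release_Date <  '2012-01-01';

-- Count movies per year and month
SELECT YEAR(Release_Date)  AS Release_Year,
       MONTH(Release_Date) AS Release_Month,
       COUNT(*)            AS Number_of_Movies
FROM Movies
GROUP BY YEAR(Release_Date), MONTH(Release_Date)
ORDER BY Release_Year, Release_Month;
```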
Fourth, in your Cast entity you said "directors/actors can make many movies, a movie can only have one cast". But looking at your Cast, it has a Movie_ID attribute that is an FK (foreign key) from Movies, which means a movie could have many Cast rows, because the FK always sits on the "many" side. Besides, this entity comes close to violating 4NF (Fourth Normal Form). So probably the best way to do this is to specialize your Cast table and relate it to the Movies table in a one-to-many relationship, so that a cast member or director can have many movies. It would look like this:
"Cast"
Cast_ID (PK)
Cast_Name
Cast_Gender
Cast_Type (values here could either be Director or Lead or could be simply letters like D or L)
And your Movies table could now be changed to look like this:
"Movies"
Movie_ID (Primary )
Movie_Title
Dom_Revenue
Int_Revenue
OpWe_Revenue
Budget
Release_Date
Lead_ID (FK)
Cast_ID (FK)
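A minimal MySQL sketch of the Cast and Movies tables described above. The types are assumptions, Movies is trimmed to the columns that matter here, and Cast_ID in Movies is assumed to hold the director (the post does not say so explicitly). The table name is backtick-quoted because CAST is also a built-in function name in MySQL.

```sql
-- Sketch only: types are assumptions; Movies is trimmed to the relevant columns.
CREATE TABLE `Cast` (
    Cast_ID     INT UNSIGNED NOT NULL AUTO_INCREMENT,
    Cast_Name   VARCHAR(255) NOT NULL,
    Cast_Gender CHAR(1),
    Cast_Type   ENUM('D', 'L') NOT NULL,  -- D = Director, L = Lead
    PRIMARY KEY (Cast_ID)
);

CREATE TABLE Movies (
    Movie_ID    INT UNSIGNED NOT NULL AUTO_INCREMENT,
    Movie_Title VARCHAR(255) NOT NULL,
    Lead_ID     INT UNSIGNED,
    Cast_ID     INT UNSIGNED,             -- assumed here to hold the director
    PRIMARY KEY (Movie_ID),
    FOREIGN KEY (Lead_ID) REFERENCES `Cast` (Cast_ID),
    FOREIGN KEY (Cast_ID) REFERENCES `Cast` (Cast_ID)
);

-- Join Cast twice, once per role, to list each movie with its people
SELECT m.Movie_Title,
       d.Cast_Name AS Director,
       l.Cast_Name AS Lead_Actor
FROM Movies AS m
LEFT JOIN `Cast` AS d ON d.Cast_ID = m.Cast_ID
LEFT JOIN `Cast` AS l ON l.Cast_ID = m.Lead_ID;
```

With the foreign keys on the Movies side, one person can appear in many movies, while each movie points to exactly one lead and one director.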
Lastly, you said "movies can be attached to more than one studio, a studio can make more than one movie". A many-to-many relationship is usually implemented with a bridge table between the two entities. So, let's say you have a Studio_Movie entity/table as your bridge table; then you would have something like this:
"Studio_Movie"
Studio_ID (PK, FK1)
Movie_ID (PK, FK2)
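The bridge table could be sketched in MySQL roughly as follows. The types are assumptions, and the sketch presumes a Movies table already exists whose Movie_ID is an INT UNSIGNED primary key.

```sql
-- Sketch only: assumes Movies (Movie_ID INT UNSIGNED, primary key) already exists.
CREATE TABLE Studio (
    Studio_ID   INT UNSIGNED NOT NULL AUTO_INCREMENT,
    Studio_Name VARCHAR(255) NOT NULL,
    PRIMARY KEY (Studio_ID)
);

CREATE TABLE Studio_Movie (
    Studio_ID INT UNSIGNED NOT NULL,
    Movie_ID  INT UNSIGNED NOT NULL,
    PRIMARY KEY (Studio_ID, Movie_ID),   -- each studio/movie pair appears once
    FOREIGN KEY (Studio_ID) REFERENCES Studio (Studio_ID),
    FOREIGN KEY (Movie_ID)  REFERENCES Movies (Movie_ID)
);

-- Number of movies attached to each studio
SELECT s.Studio_Name, COUNT(*) AS Number_of_Movies
FROM Studio AS s
JOIN Studio_Movie AS sm ON sm.Studio_ID = s.Studio_ID
GROUP BY s.Studio_ID, s.Studio_Name;
```

The composite primary key is what stops the same studio from being linked to the same movie twice.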