简体   繁体   中英

Incremental MySQL database design where future needs are unknown

I am using MySQL, InnoDB, and running it on Ubuntu 13.04.

My general question is: If I don't know how my database is going to evolve or what my needs will eventually be, should I not worry about redundancy and relationships now?

Here is my situation:

I'm currently building a baseball database from scratch, but I am unsure how I should proceed. Right now, I'm approaching the design in a modular fashion. For example, I am currently writing a python script to parse the XML feed of a sports betting website which tells me the money line and the over/under. Since I need to start recording the information, I am wondering if I should just go ahead and populate the tables and worry about keys and such later.

So for example, my python sports odds scraping script would populate three tables (Game,Money Line, Over/Under) like so:

DateTime = Date and time of observation
          Game
+-----------+-----------+--------------+
| Home Team | Away Team | Date of Game |
+-----------+-----------+--------------+
         Money Line
+-----------+-----------+--------------+-----------+-----------+----------+
| Home Team | Away Team | Date of Game | Home Line | Away Line | DateTime |
+-----------+-----------+--------------+-----------+-----------+----------+
         Over/Under
+-----------+-----------+--------------+-----------+-----------+----------+----------+
| Home Team | Away Team | Date of Game |   Total   |    Over   |  Under   | DateTime |
+-----------+-----------+--------------+-----------+-----------+----------+----------+

I feel like I should be doing something with the redundant (home team, away team, date of game) columns of information, but I don't really know how my database is going to expand, and in what ways I will be linking everything together. I'm basically building a database so I can answer complicated questions such as:

How does weather in Detroit affect the betting lines when Justin Verlander is pitching against teams who have averaged 5 or fewer runs per game for 20 games prior to the appearance against Verlander? (As you can see, complex questions create complex relationships and queries.)

So is it alright if I go ahead and start collecting data as shown above, or is this going to create a big headache for me down the road?

The topic of future proofing a database is a large one. In general, the more successful a database is, the more likely it is to be subjected to mission creep, and therefore to have new requirements.

One very basic question is this: who will be providing the new requirements? From the way you wrote the question, it sounds like you have built the database to fit your own requirements, and you will also be inventing or discovering the new requirements down the road. If this is not true, then you need to study the evolving pattern of your client(s) needs, so as to at least guess where mission creep is likely to lead you.

Normalization is part of the answer, and this aspect has been dealt with in a prior answer. In general, a partially denormalized database is less future proofed than a fully normalized database. A denormalized database has been adapted to present needs, and the more adapted something is, the less adaptable it is. But normalization is far from the whole answer. There are other aspects of future proofing as well.

Here's what I would do. Learn the difference between analysis and design, especially with regard to databases. Learn how to use ER modeling to capture the present requirements WITHOUT including the present design. Warning: not all experts in ER modeling use it to express requirements analysis. In particular, you omit foreign keys from an analysis model because foreign keys are a feature of the solution, not a feature of the problem.

In parallel, maintain a relational model that conforms to the requirements of your ER model and also conforms to rules of normalization, and other rules of simple sound design.

When a change comes along, first see if your ER model needs to be updated. Sometimes the answer is no. If the answer is yes, first update your ER model, then update your relational model, then update your database definitions.

This is a lot of work. But it can save you a lot of work, if the new requirements are truly crucial.

Try normalizing your data (so that you do not have redundant info) like:

          Game
+---+-----------+-----------+--------------+
|ID | Home Team | Away Team | Date of Game |
+---+-----------+-----------+--------------+
         Money Line
+-----------+-----------+--------------+-----------+
| Game_ID   | Home Line | Away Line    | DateTime  |
+-----------+-----------+--------------+-----------+
         Over/Under
+-----------+-----------+--------------+-----------+-----------+
| Game_ID   |   Total   |    Over      |  Under    | DateTime  |
+-----------+-----------+--------------+-----------+-----------+

You can read more on NORMALIZATION here

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM