简体   繁体   中英

Database Design: Confused with data

I have weather data in Excel for 3 countries: UK, France and Sweden. The data is for each day through an year so that is 366 rows per country.

The unique identifier is that the date. For example, all the countries share the same date as "01/04/2012" just different information for it.

First I created 3 different tables for each country, but I figured this way I still repeat the same date and I will have to search through different tables late implementing in the web system.

Now I figured to put all the data in one table and adding id and country columns, but that generates even more data and its 1100 rows long which might slow down my system later at implementation.

Can you advise me how can I design it so each date is referenced to its own country and data but not repeating the date all the time? Maybe some foreign key ... ?

IMHO, the way for better design should be a single table with country and date as a composite key.

This will not slow down your system, as you can build index on the Country and Date columns.

From a normalized system point of view, a better idea would be to have a table Countries with an [CountryId] column and [Country Name] column.

This [CountryId] will be put in as a FK reference in your main data table.

you would have a country table

country
--------
country_id
name

and a weather table

weather
----------
observation_date
country_id
observation

A table like this

iso_country_code  observation_date  observation
--
GBR               2014-02-01        Clear
FRA               2014-02-01        Light rain
SWE               2014-02-01        Scattered clouds
GBR               2014-02-02        Snow
FRA               2014-02-02        Overcast
SWE               2014-02-02        Mostly cloudy
GBR               2014-02-03        Clear
FRA               2014-02-03        Snow
SWE               2014-02-03        Snow

has only one non-trivial functional dependency.

{iso_country_code, observation_date}->observation

The set of columns {iso_country_code, observation_date} is the only candidate key. This table is in 6NF. No guideline of normalization can raise it to a higher normal form.

An example of improvement

As it stands, the SQL statement INSERT INTO weather VALUES ('SWF', '2014-02-04', 'Snow'); will commit successfully, even though 'SWF' is clearly a typo. (No country has the ISO country code 'SWF'.)

You can increase data integrity for the column iso_country_code by creating a table of valid country codes, and setting a foreign key reference to it. If you were using SQL, for example, you might do it like this.

create table iso_country_codes (
  iso_country_code char(3) primary key,
  iso_country_name varchar(30) not null unique
);

insert into iso_country_codes values
('GBR', 'United Kingdom'),
('FRA', 'France'),
('SWE', 'Sweden');

alter table weather
add constraint 
foreign key (iso_country_code) references iso_country_codes (iso_country_code);

This has nothing to do with normalization. There are lots of aspects of database design that have nothing to do with normalization.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM