Here is my scenario with a SQL Server 2008 R2 database table.
(Update: migration to SQL Server 2014 SP1 is in progress, so SQL Server 2014 features can be used here.)
A. Maintain daily history in the table (which is a fact table)
B. Create Tableau graphs using the fact and dimension tables
A few steps we follow to populate the table:
a. On the 1st day, we get 120,000 records.
[Screenshot of sample source-system data omitted; modified or new records were highlighted in yellow.]
b. On the 2nd day, we get, say, 122,000 records (2,000 newly inserted, 1,000 modified versions of the previous day's data, and 119,000 unchanged from the previous day).
c. On the 3rd day, we get, say, 123,000 records (1,000 newly inserted, 1,000 modified versions of the 2nd day's data, and 121,000 unchanged from the 2nd day).
Since each day's full snapshot is kept, the table grows roughly as follows:
for 2 weeks - ~2 million rows
for 1 month - ~5 million rows
for 1 year - ~65-70 million rows
for 12 years - ~1 billion rows (1,000 million)
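The daily load described above could be sketched like this. All table and column names here are assumptions for illustration; the real schema is not shown in the post:

```sql
-- Hypothetical names throughout (FactSampleDaily, DailyExtract, etc.).
-- Each day the full extract (~120,000+ rows) is appended with that day's
-- date key, so history is kept by snapshot rather than by updating rows.
DECLARE @DateKey int = 20160824;  -- e.g. yyyymmdd of the load day

INSERT INTO dbo.FactSampleDaily (DateKey, SampleID, SampleValue, Quality)
SELECT @DateKey, SampleID, SampleValue, Quality
FROM   staging.DailyExtract;
```

Appending full snapshots keeps the load trivial at the cost of storage, which matches the row-count growth listed above.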
What would be the right strategy for storing data in this table to handle this scenario, one that also provides sufficient performance when generating reports?
Fact Table Approaches:
Tableau graphs have to be created using the fact and dimension tables for scenarios like:
Weekly Bar graph for Sample Count
Weekly (week no. on X-axis) plotter graph for average Sample values (on Y-axis)
Weekly (week no. on x-axis) average sample values (on Y-axis) by quality
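The weekly aggregates above would translate into queries along these lines, again using hypothetical fact and date-dimension names:

```sql
-- Hypothetical schema: FactSampleDaily joined to a DimDate dimension.
-- Feeds the weekly sample-count and average-value graphs, split by quality.
SELECT DATEPART(ISO_WEEK, d.CalendarDate) AS WeekNo,
       f.Quality,
       COUNT(*)           AS SampleCount,
       AVG(f.SampleValue) AS AvgSampleValue
FROM dbo.FactSampleDaily AS f
JOIN dbo.DimDate AS d
    ON d.DateKey = f.DateKey
GROUP BY DATEPART(ISO_WEEK, d.CalendarDate), f.Quality
ORDER BY WeekNo, f.Quality;
```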
How should we handle this scenario?
Please provide references on the approach to follow.
Should we create any indexes on the fact table?
A data warehouse can handle millions of rows these days without much difficulty. Many hold tens of billions of rows, and at that point things get a little difficult. You should look at both table partitioning over time and at columnstore and page compression to see what is available; large warehouses often use both. 2008 R2 is quite old at this point, and note that huge progress has been made in this area in current versions of SQL Server.
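A minimal sketch of combining the two techniques, assuming the hypothetical fact table from the question and made-up partition boundaries (a clustered columnstore index requires SQL Server 2014 or later, which the questioner is migrating to):

```sql
-- Sketch only; names, types, and boundary values are assumptions.
-- Partition by date key so old snapshots can be switched out cheaply.
CREATE PARTITION FUNCTION pfDateKey (int)
    AS RANGE RIGHT FOR VALUES (20150101, 20160101, 20170101);

CREATE PARTITION SCHEME psDateKey
    AS PARTITION pfDateKey ALL TO ([PRIMARY]);

CREATE TABLE dbo.FactSampleDaily
(
    DateKey     int            NOT NULL,
    SampleID    int            NOT NULL,
    SampleValue decimal(18, 4) NOT NULL,
    Quality     tinyint        NOT NULL
) ON psDateKey (DateKey);

-- Columnstore compresses the ~1 billion projected rows heavily and
-- speeds up the aggregate queries the Tableau reports will issue.
CREATE CLUSTERED COLUMNSTORE INDEX ccx_FactSampleDaily
    ON dbo.FactSampleDaily;
```

Partition switching then makes loading and archiving a metadata operation rather than a mass delete.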
Use a standard dimensional (fact/dimension) design, and try to avoid tweaking the actual schema with workarounds just to conserve space - that generally bites you in the long run.
For proven, time-tested designs in warehousing I like the Kimball Group's patterns, e.g. The Data Warehouse Lifecycle Toolkit book.
There are a few different requirements in your case. Because of that, I suggest splitting the requirements according to the standard data warehouse three-tier model.
DWH model
Basically, you have four different approaches here, all with their pros and cons.
3NF (normalized model): Can become cumbersome down the road. Is highly flexible if used right. Time-to-market is long (depending on complexity). Historization can become complicated.
Star schema: Has a very, very fast time-to-market. Becomes extremely complicated to maintain when business rules or business structures change. Helpful for a very small business, but not for businesses that want to expand their Business Intelligence infrastructure. Historization can become a mess if the star schema is the main DWH model.
Data Vault: Has a medium time-to-market. Is easier to understand than 3NF but can be puzzling for people used to a star schema. Automatically historized, parallelizable and very flexible for changing business needs, because the business rules are implemented downstream. Scales quickly.
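To make "automatically historized" concrete, here is a minimal Data Vault sketch for the sample data (all names are illustrative, not from the original post): a hub holds the business key, and a satellite holds the attributes, so a changed attribute simply becomes a new satellite row with a new load date.

```sql
-- Illustrative Data Vault structures; names and types are assumptions.
CREATE TABLE dv.HubSample
(
    SampleHashKey binary(16)  NOT NULL PRIMARY KEY,  -- hash of business key
    SampleID      int         NOT NULL,              -- business key
    LoadDate      datetime2   NOT NULL,
    RecordSource  varchar(50) NOT NULL
);

CREATE TABLE dv.SatSample
(
    SampleHashKey binary(16)     NOT NULL,
    LoadDate      datetime2      NOT NULL,  -- each change = a new row
    SampleValue   decimal(18, 4) NULL,
    Quality       tinyint        NULL,
    RecordSource  varchar(50)    NOT NULL,
    CONSTRAINT PK_SatSample PRIMARY KEY (SampleHashKey, LoadDate)
);
```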
Anchor Modeling: Another highly flexible approach which I haven't used yet. It is in some ways the same approach as Data Vault, but with some differences.
Presentation model
Now, to present the never-touched-again data in the DWH layer, nothing fits better than a star schema. Also, while creating the star schema, you can implement business logic.
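As a sketch of that layering, the presentation-layer star schema can be built as views (or loaded tables) on top of the DWH layer, applying business rules on the way. This assumes a Data Vault-style DWH layer with hypothetical dv.HubSample and dv.SatSample tables; every name and rule below is made up for illustration:

```sql
-- Illustrative only: derive the presentation fact from the DWH layer,
-- implementing a (made-up) business rule for quality banding downstream.
CREATE VIEW star.FactSample AS
SELECT h.SampleID,
       s.LoadDate,
       s.SampleValue,
       CASE WHEN s.Quality >= 3 THEN 'Good' ELSE 'Poor' END AS QualityBand
FROM dv.HubSample AS h
JOIN dv.SatSample AS s
    ON s.SampleHashKey = h.SampleHashKey;
```

If the business rule changes, only this view is rebuilt; the historized DWH layer underneath stays untouched.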
Front end
Shouldn't matter - take whichever tool you like.
In your case, it would be smart to implement a DWH (using one of those models) and put the presentation model on top of it. If any problems show up in the star schema, you can always regenerate it with the new changes.
NOTE: If you use a star schema as the DWH model, you cannot re-create the star schema in the presentation layer without some complex transformation logic to begin with.
NOTE: Also, sometimes a star schema is treated as the DWH itself. I don't think that is a good use of it for any requirement that could become more complex.
EDIT
To clarify my last note, see this blog post: http://www.tobiasmaasland.de/2016/08/24/why-your-data-warehouse-is-not-a-data-warehouse/