Data warehouse or silo data marts

Currently we have 12 different databases, and 7 of them are dimensional. We are a non-profit, knowledge-based organization, and our databases are organized by the kind of disease a person has.

For example, our databases look like:

  1. HIV
  2. Hepatitis C
  3. Meningitis

and so on...

Each of these databases has tables such as:

Patient

Sample (blood samples)

Location

Diagnosis

Gender

Provider

We don't keep track of how much money was spent; we only track positive (+ve) and negative (-ve) samples.

Now upper management has raised the question of whether we should build a data warehouse from the siloed data marts.

However, business users have never asked a question that would require data from two different databases. Do we still need a data warehouse (DW) if users have not even thought about it?

Some more questions that came to mind:

  1. What granularity should each of those data marts have?
  2. Which dimensions could act as conformed dimensions?
  3. How would the ETL flow?
  4. How do we achieve a single version of the truth across all the data marts?

I am taking the initiative to understand what a solution to our situation could be. Any help is appreciated.

Thanks

One reason I can think of for building a data warehouse here is to "archive" old data that is no longer needed on a regular basis in the data marts.

The other reason, already mentioned in the comments, is a need for enterprise-wide reporting (for example, an audit by an external party). You don't mention how big your enterprise is, but I get the impression it's not huge, so I would not treat this as a driving factor in your decision.

The main reason to keep all your data marts in the same location, a data warehouse, is to be able to track the same dimensions across different data marts.

In your example I see at least the patient, provider, and disease/diagnosis dimensions, which could be fed by the different data sets while keeping a single version of each element.
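To illustrate what "a single version of each element" could mean for the patient dimension, here is a minimal Python sketch. It assumes that every mart carries some identifier shared across marts (called `natural_key` here); the question does not say whether such an identifier exists, and all names and sample rows below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientRow:
    """A patient record as it might appear in one disease-specific mart (hypothetical)."""
    mart: str         # e.g. "HIV" or "Hepatitis C"
    local_id: int     # the mart's own patient key
    natural_key: str  # ASSUMPTION: an identifier shared by all marts
    gender: str


def build_conformed_patient_dim(rows):
    """Merge per-mart patient rows into one conformed dimension.

    Returns (dimension, key_map):
      dimension: natural_key -> {"patient_key": surrogate key, "gender": ...}
      key_map:   (mart, local_id) -> surrogate patient_key
    """
    dimension = {}
    key_map = {}
    for row in rows:
        entry = dimension.get(row.natural_key)
        if entry is None:
            # First time this patient is seen: assign the next surrogate key.
            entry = {"patient_key": len(dimension) + 1, "gender": row.gender}
            dimension[row.natural_key] = entry
        # Every mart-local key points at the single conformed row.
        key_map[(row.mart, row.local_id)] = entry["patient_key"]
    return dimension, key_map


# Made-up sample rows: the same person appears in two marts.
rows = [
    PatientRow("HIV", 10, "P-001", "F"),
    PatientRow("Hepatitis C", 77, "P-001", "F"),
    PatientRow("Hepatitis C", 78, "P-002", "M"),
]
dimension, key_map = build_conformed_patient_dim(rows)
```

The `key_map` is what an ETL step would use to rewrite each mart's fact rows so they reference the shared `patient_key`. If no reliable shared identifier exists, matching patients across marts becomes a record-linkage problem, which this sketch does not address.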

Your data integration routines will need to be updated so that every dimension is updated correctly. You will also need to set up the data warehouse itself (if your data is small, a single-node Postgres server should be more than enough). If those costs are acceptable in exchange for consistent data across all data marts and the ability to query across them, then go for it.
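As a rough sketch of what "cross query" could look like once the marts share a patient dimension, the example below uses Python's built-in `sqlite3` module purely so that it runs self-contained; it stands in for the Postgres server mentioned above. The table names (`dim_patient`, `fact_sample`), columns, and rows are all assumptions made up for illustration.

```python
import sqlite3

# In-memory database standing in for the warehouse (assumption: real
# deployments would use a server such as Postgres).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_patient (patient_key INTEGER PRIMARY KEY, gender TEXT);
CREATE TABLE fact_sample (
    patient_key INTEGER REFERENCES dim_patient(patient_key),
    disease TEXT,   -- which mart the sample came from
    result TEXT     -- '+ve' or '-ve'
);
INSERT INTO dim_patient VALUES (1, 'F'), (2, 'M'), (3, 'F');
INSERT INTO fact_sample VALUES
    (1, 'HIV', '+ve'),
    (1, 'Hepatitis C', '+ve'),
    (2, 'Hepatitis C', '+ve'),
    (3, 'HIV', '-ve'),
    (3, 'Hepatitis C', '+ve');
""")

# Patients with a positive sample in BOTH the HIV and Hepatitis C data,
# a question no single silo mart can answer on its own.
co_positive = conn.execute("""
    SELECT DISTINCT p.patient_key
    FROM dim_patient AS p
    JOIN fact_sample AS h ON h.patient_key = p.patient_key
    JOIN fact_sample AS c ON c.patient_key = p.patient_key
    WHERE h.disease = 'HIV' AND h.result = '+ve'
      AND c.disease = 'Hepatitis C' AND c.result = '+ve'
    ORDER BY p.patient_key
""").fetchall()
conn.close()
```

With the made-up rows above, only patient 1 is positive in both data sets. Whether questions of this kind are ever asked is exactly the business case to weigh against the integration cost.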

But, as you say, you don't see the business case anywhere. So aren't you trying to fix something that isn't broken? Maybe leave it as is until the need arises, and then evaluate the cost/benefit ratio of such a move.
