
Use a columnar database like MonetDB to avoid dimensional modeling?

I was wondering whether you think it is reasonable, when using MonetDB (or another columnar database), to put all of your data in one big, flat table rather than breaking it up into several related tables.

For example, a flat database of used cars might look like:

Make    Model   Year   Color    Mileage
Chevy   Malibu  2009   orange   102100   
Chevy   Malibu  2009   orange   98112
Chevy   Malibu  2008   orange   210232
Chevy   Malibu  2009   pink     150100

Noticing the redundancy in Make-Model-Year-Color, in a SQL database, an Excel spreadsheet, or whatever, you might instead have two tables like:

mId   Make   Model   Year  Color
1     Chevy  Malibu  2009  orange
2     Chevy  Malibu  2008  orange
3     Chevy  Malibu  2009  pink

mId   Mileage
1     102100   
1     98112
2     210232
3     150100

This reduces the redundancy, at the expense of more complex queries and of having to think about how to decompose (break up) the tables.
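To make the trade-off concrete, here is a minimal sketch of both layouts with one query against each. It uses Python's built-in sqlite3 module purely as a convenient stand-in (SQLite is a row store, not a columnar engine like MonetDB), and the table names cars_flat, models and mileages are made up for illustration.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in to show the two layouts.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Flat layout: every row repeats Make/Model/Year/Color.
cur.execute(
    "CREATE TABLE cars_flat "
    "(make TEXT, model TEXT, year INTEGER, color TEXT, mileage INTEGER)"
)
cur.executemany(
    "INSERT INTO cars_flat VALUES (?, ?, ?, ?, ?)",
    [
        ("Chevy", "Malibu", 2009, "orange", 102100),
        ("Chevy", "Malibu", 2009, "orange", 98112),
        ("Chevy", "Malibu", 2008, "orange", 210232),
        ("Chevy", "Malibu", 2009, "pink", 150100),
    ],
)

# Decomposed layout: the repeated attributes move to their own table, keyed by mId.
cur.execute(
    "CREATE TABLE models "
    "(mId INTEGER PRIMARY KEY, make TEXT, model TEXT, year INTEGER, color TEXT)"
)
cur.execute("CREATE TABLE mileages (mId INTEGER REFERENCES models(mId), mileage INTEGER)")
cur.executemany(
    "INSERT INTO models VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Chevy", "Malibu", 2009, "orange"),
        (2, "Chevy", "Malibu", 2008, "orange"),
        (3, "Chevy", "Malibu", 2009, "pink"),
    ],
)
cur.executemany(
    "INSERT INTO mileages VALUES (?, ?)",
    [(1, 102100), (1, 98112), (2, 210232), (3, 150100)],
)

# Same question against each layout: average mileage of orange 2009 cars.
flat_avg = cur.execute(
    "SELECT AVG(mileage) FROM cars_flat WHERE year = 2009 AND color = 'orange'"
).fetchone()[0]

joined_avg = cur.execute(
    """
    SELECT AVG(r.mileage)
    FROM mileages AS r
    JOIN models AS m ON m.mId = r.mId
    WHERE m.year = 2009 AND m.color = 'orange'
    """
).fetchone()[0]

print(flat_avg, joined_avg)
```

Both queries return the same answer; the decomposed version just needs the extra join, which is the "more complex queries" cost I mean.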

I was reading about columnar databases, and MonetDB in particular. It seems that, since MonetDB compresses columns individually, the redundancy doesn't matter, and you could just use the flat table and expect the same or better performance (query time, disk usage) than a well-decomposed set of relational tables would provide. This saves design effort and, even better, lets you completely automate schema design by avoiding it.

What do you think? Is there some hidden cost that I'm not seeing?

Seems like you got it right. In my experience, columnar databases in general, and MonetDB in particular, deliver extremely fast query times with a data structure like the one you have described. For the example you described, a columnar database will encode and compress each column (which naturally contains data of the same type, with many repetitions).
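As a rough illustration of what "encode and compress each column" can mean, here is a minimal sketch of dictionary encoding followed by run-length encoding, applied to the Color column from the question. This is a generic textbook technique, not a description of MonetDB's actual storage format, and the function names are hypothetical.

```python
def dictionary_encode(column):
    """Replace each value with a small integer code.

    Returns (dictionary, codes), where dictionary[code] gives back the value.
    """
    dictionary = []  # distinct values, in order of first appearance
    index = {}       # value -> code lookup
    codes = []
    for value in column:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


def run_length_encode(codes):
    """Collapse consecutive repeats into (code, run_length) pairs."""
    runs = []
    for code in codes:
        if runs and runs[-1][0] == code:
            runs[-1][1] += 1
        else:
            runs.append([code, 1])
    return [tuple(run) for run in runs]


# The Color column from the flat used-car table in the question.
color = ["orange", "orange", "orange", "pink"]

dictionary, codes = dictionary_encode(color)
runs = run_length_encode(codes)

print(dictionary)  # distinct values stored once
print(codes)       # one small integer per row
print(runs)        # repeated neighbours collapsed
```

The repeated strings are stored once in the dictionary, and each row only carries a small integer, which is why repetition within a single column can be cheap in a column store. Whether a given engine actually does this, and for which column types, is worth checking before relying on it.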

That said, if your workload includes lots of updates, benchmark the solution before deciding.

Personally, I've seen MonetDB perform much better than most commercial column-oriented databases, and much, much better than row-oriented or NoSQL systems, but the bottom line to keep in mind is that every case has its own behavior.

What you are describing is (afaik) called the "unified table approach". Very smart people tried implementing systems around this idea and gave up on it. The latest (unsuccessful) attempt was the IBM DB2 Blink project (read page 3 of http://homepages.cwi.nl/~idreos/BlinkDebull2012.pdf ). The essence: from a query processing point of view, you will generally be better off with normalized schemas than with having the system figure out your schema for you.

To answer your specific question: MonetDB does not compress data other than strings (and even those only if there are few unique strings). I'd advise you to spend the effort to define a relational schema, or to switch to a schemaless DBMS if you really cannot. This will, naturally, come at a performance penalty.
