Database Question: Change Simple Relational Tables to Non-Relational?

I have a web application running on a MySQL database (in development). I'm considering migrating my application to Google App Engine, and would like to better understand how my simple relational database model can be transformed to the non-relational approach.

I'm a long-time relational database person, and I have no experience with column-based DBs such as BigTable. In case Google also supports small deployments of relational databases, I would like to state that my question is general and not specific to Google: I would like to understand how simple relational models can be represented in non-relational DBs.

My database (simplified) is as follows:

Items Table
------------

ItemID  ItemName  ItemPriority
1       "Car"     7
2       "Table"   2
3       "Desk"    7

ItemProperties Table
---------------------

ItemID  Property        Importance 
1       "Blue"          1
1       "Four Wheels"   2
1       "Sedan"         0
2       "Rectangular"   1
2       "One Leg"       1

I have many items, each with a name and ID. Each item has multiple properties, and each property has several parameters (I only listed the name and "importance" of each property, but there are more). I have tens of millions of items, each with hundreds of properties.

The usage scenario: I receive an ItemName as input, look up its ID in the Items table, and fetch all the properties by that ID. I then perform some analysis on the list of properties (in memory) and return a result.
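For reference, the relational version of this scenario can be sketched roughly as follows. This is only an illustration: it uses Python's built-in `sqlite3` instead of MySQL, and the index names and the `fetch_properties` helper are assumptions made up for the example.

```python
import sqlite3

# In-memory database holding the sample rows shown above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Items (ItemID INTEGER PRIMARY KEY, ItemName TEXT, ItemPriority INTEGER);
    CREATE TABLE ItemProperties (ItemID INTEGER, Property TEXT, Importance INTEGER);
    CREATE INDEX idx_items_name ON Items (ItemName);
    CREATE INDEX idx_props_item ON ItemProperties (ItemID);
""")
conn.executemany("INSERT INTO Items VALUES (?, ?, ?)",
                 [(1, "Car", 7), (2, "Table", 2), (3, "Desk", 7)])
conn.executemany("INSERT INTO ItemProperties VALUES (?, ?, ?)",
                 [(1, "Blue", 1), (1, "Four Wheels", 2), (1, "Sedan", 0),
                  (2, "Rectangular", 1), (2, "One Leg", 1)])


def fetch_properties(item_name):
    """Look up the item's ID by name, then fetch all of its properties."""
    row = conn.execute("SELECT ItemID FROM Items WHERE ItemName = ?",
                       (item_name,)).fetchone()
    if row is None:
        return []
    return conn.execute(
        "SELECT Property, Importance FROM ItemProperties "
        "WHERE ItemID = ? ORDER BY rowid",
        (row[0],)).fetchall()
```

The in-memory analysis would then run on the list returned by `fetch_properties("Car")`.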

90% of the work is lookup based on a parameter, which (if I understand correctly) is the pain point of non-relational DBs.

What is the recommended approach?

Speaking as someone who has been working with non-relational DBs for a while: your two tables should be really easy to translate to a non-relational DB.

Take the two tables and turn them into a single object.

Item:
- Id
- Name
- Properties
  - prop1
  - prop2
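As a rough illustration of that shape, here is the "Car" item from the question written as a single nested record. The field names (`id`, `name`, `priority`, `properties`, `importance`) are assumptions chosen for the example, and JSON is used only as one plausible serialized form.

```python
import json

# One self-contained record per item: the ItemProperties rows become a nested list.
car = {
    "id": 1,
    "name": "Car",
    "priority": 7,
    "properties": [
        {"name": "Blue", "importance": 1},
        {"name": "Four Wheels", "importance": 2},
        {"name": "Sedan", "importance": 0},
    ],
}

# Round-trip through JSON to show the record survives serialization unchanged.
serialized = json.dumps(car)
restored = json.loads(serialized)
```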

Store the whole thing in whatever structure your data store uses: columns (BigTable), documents (CouchDB), or something else.

You can look up items by any of the IDs, names, or properties. There are no joins, which are one of the bigger pain points of non-relational DBs. Parameter lookups aren't really a pain point, unless I'm misunderstanding what you mean by that. You may have to do multiple lookups, but most of the time that is not an issue, and it scales far better than an RDBMS does.

In your example I actually consider the non-relational model to be simpler and easier to implement and understand.

Each non-relational data store has different conventions and constraints, though, so it's hard to give guidance in the general sense. CouchDB, for instance, can create an index on any part of the object with its views. With BigTable you may have to store multiple copies of the denormalized data to get fast indexed lookups. Other stores will have different things to consider when you decide how to store the data. There is quite a lot of differentiation out there once you leave the world of SQL.
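The "multiple copies" idea can be sketched like this. It is not BigTable code: two plain dicts stand in for two lookup paths in a key-value store, and `put_item` and `get_by_name` are hypothetical helper names.

```python
import copy

# Each dict plays the role of one lookup path in the store.
items_by_id = {}
items_by_name = {}


def put_item(item):
    """Write a separate copy of the denormalized record under each lookup key."""
    items_by_id[item["id"]] = copy.deepcopy(item)
    items_by_name[item["name"]] = copy.deepcopy(item)


def get_by_name(name):
    """A single key lookup returns the item together with all its properties."""
    return items_by_name.get(name)


put_item({
    "id": 2,
    "name": "Table",
    "priority": 2,
    "properties": [
        {"name": "Rectangular", "importance": 1},
        {"name": "One Leg", "importance": 1},
    ],
})
```

The trade-off is that every update has to rewrite all copies, which is the price paid for getting each lookup in a single read.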

GQL does not support joins. You can work around this in two ways:

  • Do the join yourself

Just fetch the Item, check its ItemID, and query for ItemProperties with that ItemID. Your tables would look exactly as you specified them. Sure, this is two queries, but both queries are simple.

  • Use Expando Models

In an Expando model, you can create new fields at runtime. They will not be indexed, so if you want to search on them it may be slower, but simply fetching them is just fine. You can use complex types like ListProperty, too. With this sort of flexibility, you may be able to think of a way to put everything in the ItemProperties table into the Items table, and save yourself a query. Be creative.
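Both workarounds can be sketched in plain Python. This does not use the App Engine API: the two lists are in-memory stand-ins for the two kinds, `SimpleNamespace` merely imitates an object that accepts new fields at runtime, and the helper names and the field-naming rule are assumptions for the example.

```python
from types import SimpleNamespace

# Two separate "kinds", mirroring the original tables.
items = [
    {"ItemID": 1, "ItemName": "Car", "ItemPriority": 7},
    {"ItemID": 2, "ItemName": "Table", "ItemPriority": 2},
]
item_properties = [
    {"ItemID": 1, "Property": "Blue", "Importance": 1},
    {"ItemID": 1, "Property": "Four Wheels", "Importance": 2},
    {"ItemID": 1, "Property": "Sedan", "Importance": 0},
    {"ItemID": 2, "Property": "Rectangular", "Importance": 1},
    {"ItemID": 2, "Property": "One Leg", "Importance": 1},
]


def find_item(item_name):
    """First query: find the item by name (None if it does not exist)."""
    return next((i for i in items if i["ItemName"] == item_name), None)


def properties_for(item_name):
    """Workaround 1: do the join in application code with two simple queries."""
    item = find_item(item_name)
    if item is None:
        return []
    # Second query: filter the properties by the ItemID found above.
    return [p for p in item_properties if p["ItemID"] == item["ItemID"]]


def as_dynamic_entity(item_name):
    """Workaround 2: fold the properties into the item as fields created at runtime."""
    item = find_item(item_name)
    if item is None:
        return None
    entity = SimpleNamespace(**item)
    for p in properties_for(item_name):
        # Field names are derived from the data, e.g. "Four Wheels" -> "four_wheels".
        setattr(entity, p["Property"].lower().replace(" ", "_"), p["Importance"])
    return entity
```

With the second form, one fetch of the entity returns the item and all its property values, at the cost of field names that vary from item to item.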

I have a very similar database structure (our "records" and "recordEntries" tables mirror your "items" and "itemProperties") and am considering a similar migration to a non-relational database. We'll probably go to CouchDB or memcachedb or something like that, rather than Google.

Like you, I have no experience working with non-relational databases (nor do my developers). However, we have tossed a couple of ideas around. Our current thoughts are (using your schema):

  • First: Collapse each item plus its item properties into one object with fields (essentially an XML document) and stuff it into the database keyed by identifier. Every time you retrieve an item, you get back all the itemProperties too.

Note that the difference in our case is that we index our content outside the database (with Solr), and therefore don't need to do lookups on the database itself using the "name" property, so YMMV.

  • Second: We're making a list of all the "relational" operations we're doing that can't be supported by the model above. This includes a couple of "grouping" operations where we query items based on a special field in the item table, and a query where we try to detect all the items that have been recently modified (previously accomplished by a query on a date column in the item table). We're inventing alternative implementations for each of these cases (there are only a few, luckily).
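The collapsing step in the first point might look roughly like this when the rows are exported from the two tables. The sketch uses Python dicts rather than XML, and the `collapse` function and its field names are assumptions for illustration only.

```python
from collections import defaultdict


def collapse(item_rows, property_rows):
    """Fold (ItemID, ItemName, ItemPriority) and (ItemID, Property, Importance)
    rows into one document per item, keyed by ItemID."""
    props = defaultdict(list)
    for item_id, prop, importance in property_rows:
        props[item_id].append({"name": prop, "importance": importance})
    return {
        item_id: {
            "name": name,
            "priority": priority,
            # Items without any property rows get an empty list.
            "properties": props.get(item_id, []),
        }
        for item_id, name, priority in item_rows
    }


docs = collapse(
    [(1, "Car", 7), (2, "Table", 2), (3, "Desk", 7)],
    [(1, "Blue", 1), (1, "Four Wheels", 2), (1, "Sedan", 0),
     (2, "Rectangular", 1), (2, "One Leg", 1)],
)
```

Each value in `docs` is what would be stored under the item's identifier, so one retrieval brings back the item and all of its properties.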

If this proves too hard, we'll try the same exercise with another model. Luckily, we have time to plan.

One key point for us is that we're doing all our indexing externally with Solr, so (for example) we don't need to do database lookups on the itemProperties values, or to do lookups by name on the item table.

Anyway, that's probably not much help, but I'll be keen to see what sorts of solutions more experienced people can come up with.

PS: I infer your properties table must have billions of rows. How many exactly, and what hardware are you running the MySQL server on? Are you having scalability problems with MySQL yet?

You need to flatten it all out. I think App Engine allows structures like:

ID=1, ItemName=Car, ItemPriority=7, Property=(Blue,1), Property=(Four Wheels,2), Property=(Sedan,0)
ID=2, ItemName=Table, ItemPriority=2, Property=(Rectangular,1), Property=(One Leg,1)
ID=3, ItemName=Desk, ItemPriority=7

Notice that the same "field" can have multiple values, and that each value can contain multiple items.

Your sample data would be 3 rows in one table.
