How does data denormalization work with the Microservice Pattern?

I just read an article on Microservices and PaaS Architecture. In that article, about a third of the way down, the author states (under Denormalize like Crazy):

Refactor database schemas, and de-normalize everything, to allow complete separation and partitioning of data. That is, do not use underlying tables that serve multiple microservices. There should be no sharing of underlying tables that span multiple microservices, and no sharing of data. Instead, if several services need access to the same data, it should be shared via a service API (such as a published REST or a message service interface).

While this sounds great in theory, in practice it has some serious hurdles to overcome. The biggest is that databases are often tightly coupled, and every table has a foreign key relationship with at least one other table. Because of this, it could be impossible to partition a database into n sub-databases controlled by n microservices.

So I ask: Given a database that consists entirely of related tables, how does one denormalize this into smaller fragments (groups of tables) so that the fragments can be controlled by separate microservices?

For instance, given the following (rather small, but exemplary) database:

[users] table
=============
user_id
user_first_name
user_last_name
user_email

[products] table
================
product_id
product_name
product_description
product_unit_price

[orders] table
==============
order_id
order_datetime
user_id

[products_x_orders] table (for line items in the order)
=======================================================
products_x_orders_id
product_id
order_id
quantity_ordered

Don't spend too much time critiquing my design; I did this on the fly. The point is that, to me, it makes logical sense to split this database into 3 microservices:

  1. UserService - for CRUDding users in the system; should ultimately manage the [users] table; and
  2. ProductService - for CRUDding products in the system; should ultimately manage the [products] table; and
  3. OrderService - for CRUDding orders in the system; should ultimately manage the [orders] and [products_x_orders] tables

However, all of these tables have foreign key relationships with each other. If we denormalize them and treat them as monoliths, they lose all their semantic meaning:

[users] table
=============
user_id
user_first_name
user_last_name
user_email

[products] table
================
product_id
product_name
product_description
product_unit_price

[orders] table
==============
order_id
order_datetime

[products_x_orders] table (for line items in the order)
=======================================================
products_x_orders_id
quantity_ordered

Now there's no way to know who ordered what, in which quantity, or when.

So is this article typical academic hullabaloo, or is there real-world practicality to this denormalization approach, and if so, what does it look like (bonus points for using my example in the answer)?

This is subjective, but the following solution worked for me, my team, and our DB team.

  • At the application layer, microservices are decomposed by semantic function.
    • e.g. a Contact service might CRUD contacts (metadata about contacts: names, phone numbers, contact info, etc.)
    • e.g. a User service might CRUD users with login credentials, authorization roles, etc.
    • e.g. a Payment service might CRUD payments and work under the hood with a 3rd-party PCI-compliant service like Stripe, etc.
  • At the DB layer, the tables can be organized however the devs/DBs/devops people want them organized

The problem is with cascading and service boundaries: Payments might need a User to know who is making a payment. Instead of modeling your services like this:

interface PaymentService {
    PaymentInfo makePayment(User user, Payment payment);
}

Model it like so:

interface PaymentService {
    PaymentInfo makePayment(Long userId, Payment payment);
}

This way, entities that belong to other microservices are referenced inside a particular service only by ID, not by object reference. This allows DB tables to have foreign keys all over the place, but at the app layer "foreign" entities (that is, entities living in other services) are available via ID. This stops object cascading from growing out of control and cleanly delineates service boundaries.
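On the entity side this could look roughly like the sketch below. The `Payment` class and its fields are hypothetical names chosen for illustration; the only point is that the payment carries the user's ID and never a `User` object.

```java
import java.math.BigDecimal;

// Hypothetical entity owned by the payment service (names are illustrative).
// It refers to the paying user only by ID; the User object itself
// lives in, and is only ever loaded by, the user service.
final class Payment {
    private final Long id;
    private final Long userId;       // reference into another service, by ID only
    private final BigDecimal amount;

    Payment(Long id, Long userId, BigDecimal amount) {
        this.id = id;
        this.userId = userId;
        this.amount = amount;
    }

    Long getId() { return id; }
    Long getUserId() { return userId; }
    BigDecimal getAmount() { return amount; }
}
```

Because there is no `User` field, nothing in the payment service can accidentally cascade a load or save into user data.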

The problem it does incur is that it requires more network calls. For instance, if I gave each Payment entity a User reference, I could get the user for a particular payment with a single call:

User user = paymentService.getUserForPayment(payment);

But using what I'm suggesting here, you'll need two calls:

Long userId = paymentService.getPayment(payment).getUserId();
User user = userService.getUserById(userId);

This may be a deal breaker. But if you're smart and implement caching, and implement well-engineered microservices that respond in 50-100 ms per call, I have no doubt that these extra network calls can be crafted so that they do not add latency to the application.
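As a rough idea of what the caching could look like, here is a minimal read-through cache in front of the user lookup. Everything in it is an assumption made for illustration: `CachingUserLookup` is a made-up class, the `Function` stands in for the remote `userService.getUserById` call, and the user is reduced to a name. It never evicts or invalidates entries, which a real cache would have to do.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical read-through cache placed in front of a remote user lookup.
final class CachingUserLookup {
    private final Map<Long, String> cache = new ConcurrentHashMap<>();
    private final Function<Long, String> remoteLookup; // stands in for the network call
    private final AtomicInteger remoteCalls = new AtomicInteger();

    CachingUserLookup(Function<Long, String> remoteLookup) {
        this.remoteLookup = remoteLookup;
    }

    // Returns the cached name if present; otherwise performs one remote call and stores the result.
    String getUserName(Long userId) {
        return cache.computeIfAbsent(userId, id -> {
            remoteCalls.incrementAndGet();
            return remoteLookup.apply(id);
        });
    }

    // Number of lookups that actually went over the (simulated) network.
    int remoteCalls() {
        return remoteCalls.get();
    }
}
```

Repeated lookups of the same user ID then cost one network call in total, at the price of possibly serving stale data until the entry is refreshed.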

This is indeed one of the key problems in microservices, and it is quite conveniently omitted in most articles. Fortunately, there are solutions for it. As a basis for discussion, let's use the tables you provided in the question.

[Image: the tables as they look in a monolith, just a few tables with joins.]

To refactor this into microservices we can use a few strategies:

Api Join

In this strategy, foreign keys between microservices are broken and the microservice exposes an endpoint which mimics the key. For example, the Product microservice will expose a findProductById endpoint. The Order microservice can use this endpoint instead of a join.

It has an obvious downside: it is slower.
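A minimal sketch of the Api Join idea, under some assumptions: `ProductClient` is a hypothetical interface standing in for an HTTP client of the Product microservice, and the record types are simplified versions of the tables in the question.

```java
import java.util.ArrayList;
import java.util.List;

final class ApiJoinSketch {
    record Product(long productId, String name) {}
    record LineItem(long productId, int quantity) {}
    record LineItemView(String productName, int quantity) {}

    // Hypothetical client for the Product microservice; in practice an HTTP call.
    interface ProductClient {
        Product findProductById(long productId);
    }

    // The Order service resolves each product ID through the API
    // where a monolith would have used a SQL JOIN on product_id.
    static List<LineItemView> describe(List<LineItem> items, ProductClient products) {
        List<LineItemView> views = new ArrayList<>();
        for (LineItem item : items) {
            Product p = products.findProductById(item.productId()); // one remote call per line item
            views.add(new LineItemView(p.name(), item.quantity()));
        }
        return views;
    }
}
```

The loop makes the downside visible: every line item costs a network round trip that the database join would have handled internally.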

Read only views

In the second solution you can create a copy of the table in the second database. The copy is read only. Each microservice can use mutable operations on its own read/write tables. On the read-only tables copied from other databases it can (obviously) only perform reads.
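The answer does not say how the copy is kept in sync, so the following is only a sketch under assumptions: an in-memory map stands in for the copied table, and the `applyUpstream...` methods stand in for whatever replication or messaging mechanism feeds it. All names, and the price-in-cents field, are hypothetical.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical local copy of the products table, held by the Order service.
final class ProductReadOnlyCopy {
    record ProductRow(long productId, String name, long unitPriceCents) {}

    private final Map<Long, ProductRow> rows = new ConcurrentHashMap<>();

    // Called only by the synchronisation mechanism (replication or a message consumer),
    // never by the Order service's own business logic.
    void applyUpstreamChange(ProductRow row) {
        rows.put(row.productId(), row);
    }

    void applyUpstreamDelete(long productId) {
        rows.remove(productId);
    }

    // The only operation the Order service's code is meant to use: a local read.
    Optional<ProductRow> findById(long productId) {
        return Optional.ofNullable(rows.get(productId));
    }
}
```

Reads are now local, but the copy can lag behind the Product service's own table by however long synchronisation takes.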

High performance read

It is possible to achieve high-performance reads by introducing solutions such as redis/memcached on top of the read only view solution. Both sides of the join should be copied to a flat structure optimized for reading. You can introduce a completely new stateless microservice which is used for reading from this storage. While it seems like a lot of hassle, it is worth noting that it will have higher performance than a monolithic solution on top of a relational database.
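To make "flat structure optimized for reading" concrete, here is one possible shape, using simplified versions of the tables from the question. The `FlatOrder` type, the `order:<id>` key format, and the idea of storing one pre-joined value per order are illustrative assumptions, not something prescribed by redis or memcached.

```java
import java.util.List;
import java.util.Map;

final class FlatOrderViewSketch {
    record User(long userId, String firstName, String lastName) {}
    record Product(long productId, String name) {}
    record Order(long orderId, long userId) {}
    record LineItem(long orderId, long productId, int quantity) {}

    // Pre-joined, read-optimised shape: everything needed to display an order in one value.
    record FlatLine(String productName, int quantity) {}
    record FlatOrder(long orderId, String customerName, List<FlatLine> lines) {}

    // Hypothetical key under which the flat value would be stored in a key-value store.
    static String key(long orderId) {
        return "order:" + orderId;
    }

    // Performs the join once, at write time, so readers never have to.
    static FlatOrder flatten(Order order, Map<Long, User> users,
                             Map<Long, Product> products, List<LineItem> items) {
        User u = users.get(order.userId());
        List<FlatLine> lines = items.stream()
            .filter(i -> i.orderId() == order.orderId())
            .map(i -> new FlatLine(products.get(i.productId()).name(), i.quantity()))
            .toList();
        return new FlatOrder(order.orderId(), u.firstName() + " " + u.lastName(), lines);
    }
}
```

A reader then needs a single key lookup per order; the cost moves to the writer, which has to rebuild the flat value whenever a user, product, or order changes.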


There are a few possible solutions. The ones that are simplest to implement have the lowest performance. High-performance solutions will take a few weeks to implement.

I realise this is possibly not a good answer, but what the heck. Your question was:

Given a database that consists entirely of related tables, how does one denormalize this into smaller fragments (groups of tables)

WRT the database design I'd say "you can't without removing foreign keys".

That is, people pushing microservices with the strict no-shared-DB rule are asking database designers to give up foreign keys (and they are doing that implicitly or explicitly). When they don't explicitly state the loss of FKs, it makes you wonder if they actually know and recognise the value of foreign keys (because it is frequently not mentioned at all).

I have seen big systems broken into groups of tables. In these cases there can be either A) no FKs allowed between the groups or B) one special group that holds "core" tables that can be referenced by FKs from tables in other groups.

... but in these systems a "group of tables" is often 50+ tables, so not small enough for strict compliance with microservices.

To me the other related issue to consider with the microservice approach to splitting the DB is the impact this has on reporting: the question of how all the data is brought together for reporting and/or loading into a data warehouse.

Somewhat related is the tendency to ignore built-in DB replication features in favor of messaging, and how DB-based replication of the core tables / DDD shared kernel impacts the design.

EDIT: (the cost of JOIN via REST calls)

When we split up the DB as suggested by microservices and remove FKs, we not only lose the enforced declarative business rule (of the FK) but also the ability for the DB to perform the join(s) across those boundaries.

In OLTP, FK values are generally not "UX friendly" and we often want to join on them.

In the example, if we fetch the last 100 orders we probably don't want to show the customer ID values in the UX. Instead we need to make a second call to the customer service to get their names. If we also want the order lines, we need yet another call to the products service to show product name, SKU etc. rather than product ID.

In general, when we break up the DB design in this way we need to do a lot of "JOIN via REST" calls. So what is the relative cost of doing this?
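To get a feel for how the number of calls grows, here is a rough count for a page of orders. It is a back-of-the-envelope sketch, not a measurement: it assumes one REST call to fetch the orders and one call per customer or product lookup, and the `Order` record is a simplified, hypothetical shape.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class JoinViaRestCost {
    record Order(long orderId, long customerId, List<Long> productIds) {}

    // Naive approach: one customer call per order plus one product call per order line.
    static int naiveCalls(List<Order> orders) {
        int calls = 1; // the call that fetches the orders themselves
        for (Order o : orders) {
            calls += 1 + o.productIds().size();
        }
        return calls;
    }

    // Deduplicated approach: look up each distinct customer and product only once.
    static int dedupedCalls(List<Order> orders) {
        Set<Long> customers = orders.stream()
            .map(Order::customerId)
            .collect(Collectors.toSet());
        Set<Long> products = orders.stream()
            .flatMap(o -> o.productIds().stream())
            .collect(Collectors.toSet());
        return 1 + customers.size() + products.size();
    }
}
```

With 100 orders of a few lines each, the naive count runs into the hundreds of round trips, where a single SQL statement with two joins would have been one.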

Actual Story: Example costs for 'JOIN via REST' vs DB Joins

There are 4 microservices and they involve a lot of "JOIN via REST". A benchmark load for these 4 services takes ~15 minutes. Those 4 microservices converted into 1 service with 4 modules against a shared DB (that allows joins) execute the same load in ~20 seconds.

Unfortunately this is not a direct apples-to-apples comparison of DB joins vs "JOIN via REST", as in this case we also changed from a NoSQL DB to Postgres.

Is it a surprise that "JOIN via REST" performs relatively poorly when compared to a DB that has a cost-based optimiser etc.?

To some extent, when we break up the DB like this we are also walking away from the cost-based optimiser and all that it does with query execution planning for us, in favor of writing our own join logic (we are in effect writing our own relatively unsophisticated query execution plan).

I would see each microservice as an object. As with any ORM, where you use objects to pull the data and then create joins within your code and query collections, microservices should be handled in a similar manner. The only difference here is that each microservice represents one object at a time rather than a complete object tree. An API layer should consume these services and model the data in the way it has to be presented or stored.

Making several calls back to services for each transaction will not have an impact, as each service runs in a separate container and all these calls can be executed in parallel.
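A small sketch of issuing two independent service calls in parallel with `CompletableFuture` from the Java standard library. The two `fetch...` methods are hypothetical stand-ins for remote calls, and the sleeps only simulate network latency.

```java
import java.util.concurrent.CompletableFuture;

final class ParallelCallsSketch {
    record OrderView(String customerName, String productName) {}

    // Hypothetical remote calls; the sleeps stand in for network latency.
    static String fetchCustomerName(long customerId) {
        sleep(100);
        return "customer-" + customerId;
    }

    static String fetchProductName(long productId) {
        sleep(100);
        return "product-" + productId;
    }

    // Starts both calls at once and combines the results when both have finished.
    static OrderView load(long customerId, long productId) {
        CompletableFuture<String> customer =
            CompletableFuture.supplyAsync(() -> fetchCustomerName(customerId));
        CompletableFuture<String> product =
            CompletableFuture.supplyAsync(() -> fetchProductName(productId));
        // Total wait is roughly that of the slower call, not the sum of both.
        return customer.thenCombine(product, OrderView::new).join();
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

This only helps when the calls are independent of each other. A call that needs the result of a previous one, such as fetching a payment to learn the user ID and then fetching that user, still has to run sequentially.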

@ccit-spence, I liked the approach of intersection services, but how can it be designed and consumed by other services? I believe it will create a kind of dependency for other services.

Any comments please?
