
Using Kafka for syncing data between two microservices

I am trying to use Kafka to sync data between two microservices, A and B.

A stages data for a group of employees in a company in a database table. The end user then triggers an event from the UI to A's backend service, which sends Kafka message(s) to a topic that B is subscribed to.

B then takes the data either from the message or from the staging table, validates it, and persists it to its own database table.

The questions I have are:

  1. Employees can range from 10 to 1000s per company, and there could be multiple companies trying to sync data at certain times of the year, so performance is a concern. What would be a good way to divide the load? Meaning: should I design the message to be at the employee level? That could mean thousands of messages, although design-wise it would be the simplest. Or should it be at the company level, or for a group of employees within a company? The microservice is not doing much beyond processing and persisting to the table. Would it be able to handle the load? What would be the limiting factor?

  2. The data we are handling is JSON stored in the DB. Would it be better to have a staging table and have B look the data up using some sort of primary key carried in the message? Or is having all the data within the message fine? The JSON per employee is not that big, but aggregated for a group of employees, say 100s, it may be 10-100 kilobytes. Are we buying much by looking the data up from the table?

  3. We need to be able to track statuses/errors, so that the end user is aware of any issues and can take action to correct the data and/or retry the sync. One approach I thought of was creating tables, call them BATCH_JOB and BATCH_TASK, to keep track of the request at the job level (the UI event for a group of employees, as mentioned, which triggers the resync process) and the task level (per employee). Or would there be a cleaner approach?
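The BATCH_JOB / BATCH_TASK idea above can be sketched as a pair of simple status models with a roll-up rule for the UI. This is only an illustrative sketch; the class names, fields, and job-level statuses are hypothetical, not part of any framework:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional

class TaskStatus(Enum):
    PENDING = "PENDING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"

@dataclass
class BatchTask:
    """One row per employee (the BATCH_TASK level)."""
    employee_id: str
    status: TaskStatus = TaskStatus.PENDING
    error: Optional[str] = None  # validation message surfaced to the end user

@dataclass
class BatchJob:
    """One row per UI-triggered sync request (the BATCH_JOB level)."""
    job_id: str
    company_id: str
    tasks: List[BatchTask] = field(default_factory=list)

    def overall_status(self) -> str:
        """Roll per-employee task states up to a job-level status."""
        statuses = {t.status for t in self.tasks}
        if TaskStatus.PENDING in statuses:
            return "IN_PROGRESS"
        if TaskStatus.FAILED in statuses:
            return "COMPLETED_WITH_ERRORS"
        return "COMPLETED"
```

A "COMPLETED_WITH_ERRORS" job would let the UI list only the failed tasks, so the user can correct those employees and resubmit just that subset.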

Any design help/tips would be appreciated.

What would be a good way to divide the load?

The short answer is to use a custom partitioning scheme with a reasonably large number of partitions, say 100.
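For example, keying every message by company keeps all of one company's employee records on the same partition (preserving per-company ordering) while spreading companies across partitions. A minimal sketch of the idea, using CRC32 as a stand-in for Kafka's murmur2 key hash (the partition count and company IDs are illustrative):

```python
import zlib

NUM_PARTITIONS = 100  # illustrative; size for peak parallelism

def partition_for(company_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a company to a fixed partition so all of its employee
    messages stay on one partition and retain their order."""
    # CRC32 stands in for the murmur2 hash Kafka's default partitioner
    # applies to the record key; any stable hash gives the same effect.
    return zlib.crc32(company_id.encode("utf-8")) % num_partitions
```

In practice you get this behavior for free by producing with the company ID as the record key; a custom partitioner is only needed if the default key distribution turns out to be skewed.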

Or you can create a topic per company, depending on whether you are using different record schemas per topic.

Are we buying much by looking the data up from the table?

Well, you cannot query a topic as easily as a table, so that's the benefit of the lookup... You could also use a KTable and interactive queries.

The data we are handling is JSON stored in the DB

I assume you're not just putting one BLOB column into the database (and you haven't clarified which database you're using, either).

Personally, I'd suggest using Avro and Kafka Connect to sink topics into databases. That's the recommended solution for such a task within the Kafka APIs, without introducing other projects like Spark or writing your own database code.
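As a rough sketch, a Kafka Connect JDBC sink for B's table might be configured like this. This assumes Confluent's JDBC sink connector and a Schema Registry; the connector name, topic, connection URL, and key column are all illustrative placeholders:

```json
{
  "name": "employee-sync-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "topics": "employee-sync",
    "connection.url": "jdbc:postgresql://db-host:5432/service_b",
    "connection.user": "service_b",
    "connection.password": "change-me",
    "insert.mode": "upsert",
    "pk.mode": "record_key",
    "pk.fields": "employee_id",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://schema-registry:8081"
  }
}
```

With "insert.mode": "upsert" keyed on the employee ID, re-running a sync is idempotent, which fits the retry requirement in question 3. Note that the sink writes records as-is; any validation B needs would have to happen before the record reaches the topic (or in a stream processor in between).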

We need to be able to track the statuses/errors, so that the end user is aware of any issues and can take action to correct the data

Tables could work, but if you can write records to a table, you can also write events to another Kafka topic and get "notifications" from it.
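Concretely, B could emit one small status event per employee to a separate topic that A (or the UI backend) consumes to drive notifications. A sketch of what such an event might look like; the topic name, field names, and status values are assumptions, not an established schema:

```python
import json
import time
from typing import Optional

def status_event(job_id: str, employee_id: str, status: str,
                 error: Optional[str] = None) -> bytes:
    """Serialize a per-employee sync status event. B would publish this
    to a status topic (e.g. "employee-sync-status") after validating
    and persisting, or after a failure."""
    event = {
        "job_id": job_id,          # correlates back to the UI-triggered request
        "employee_id": employee_id,
        "status": status,          # e.g. "SUCCEEDED" / "FAILED"
        "error": error,            # validation message, if any
        "ts": int(time.time() * 1000),
    }
    return json.dumps(event).encode("utf-8")

# With a Kafka producer this might be sent roughly as:
# producer.send("employee-sync-status", key=job_id.encode(), value=status_event(...))
```

Keying the status events by job ID keeps each job's events ordered on one partition, so a consumer can tell when all of a job's tasks have reported in.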
