
Communication between microservices for large data

Original post: 2017-06-06 16:59:50. Tags: java, spring-integration, spring-cloud, microservices, spring-cloud-stream

I am building a Spring Cloud-based microservice ML pipeline. I have a data ingestion service that (currently) takes in data from SQL; this data needs to be used by the prediction service.

The general consensus is that writes should use async, message-based communication via Kafka or RabbitMQ.

What I am not sure about is how to orchestrate these services.

Should I use an API gateway that invokes the ingestion service, which in turn starts the pipeline?

1 Answer

Typically you would build a service with REST endpoints (Spring Boot) to ingest the data. This service can then be deployed multiple times behind an API gateway (Zuul, Spring Cloud) that takes care of routing. This is the default Spring Cloud microservices setup. The ingest service can then convert the data and publish it to RabbitMQ or Kafka. I recommend using Spring Cloud Stream for the interaction with the queue; it is an abstraction on top of RabbitMQ and Kafka that can be configured using starters/binders.
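To make the flow concrete, here is a minimal plain-Java sketch of the ingest-to-prediction hand-off. It is not Spring code: an in-memory BlockingQueue stands in for the RabbitMQ queue or Kafka topic, and every name in it (PipelineSketch, FeatureMessage, ingest, predict, the "id,f1,f2" row format, the averaging "model") is a hypothetical placeholder. In a real setup Spring Cloud Stream would supply the broker binding and you would keep only the conversion and prediction logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch only: the queue below is an in-memory stand-in for a broker destination.
class PipelineSketch {

    // Message published by the ingest service: a record id plus numeric features.
    static final class FeatureMessage {
        final String id;
        final double[] features;

        FeatureMessage(String id, double[] features) {
            this.id = id;
            this.features = features;
        }
    }

    // Marker telling the consumer that no more messages will arrive.
    private static final FeatureMessage END = new FeatureMessage("END", new double[0]);

    // Ingest step: convert one raw row ("id,f1,f2,...") into a message.
    static FeatureMessage ingest(String row) {
        String[] parts = row.split(",");
        double[] features = new double[parts.length - 1];
        for (int i = 1; i < parts.length; i++) {
            features[i - 1] = Double.parseDouble(parts[i].trim());
        }
        return new FeatureMessage(parts[0].trim(), features);
    }

    // Placeholder "model": the mean of the features (0.0 if there are none).
    static double predict(FeatureMessage msg) {
        if (msg.features.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (double f : msg.features) {
            sum += f;
        }
        return sum / msg.features.length;
    }

    // Runs producer and consumer concurrently, connected only by the queue.
    static List<String> runPipeline(List<String> rows) {
        BlockingQueue<FeatureMessage> queue = new LinkedBlockingQueue<>();
        List<String> results = new ArrayList<>();

        // Prediction service: consumes messages until it sees the END marker.
        Thread predictor = new Thread(() -> {
            try {
                while (true) {
                    FeatureMessage msg = queue.take();
                    if (msg == END) {
                        break;
                    }
                    results.add(msg.id + "=" + predict(msg));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        predictor.start();

        try {
            // Ingest service: publishes each converted row without waiting for a reply.
            for (String row : rows) {
                queue.put(ingest(row));
            }
            queue.put(END);
            predictor.join(); // only so the sketch can return the collected results
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("pipeline interrupted", e);
        }
        return results;
    }

    public static void main(String[] args) {
        // prints [a=2.0, b=5.0]
        System.out.println(runPipeline(List.of("a,1.0,3.0", "b,4,6")));
    }
}
```

The point of the sketch is the decoupling: the ingest side only knows the message format and the destination, so the prediction service can be scaled or replaced independently. That is the property a broker plus Spring Cloud Stream would give you across processes.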

Spring Cloud Data Flow is a declarative approach to orchestrating your queues and also takes care of deployment on several cloud services/platforms. It can also be used here, but it might add extra complexity to your use case.