Kafka Streams table transformation

Question

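I have a table in SQL Server that I would like to stream to a Kafka topic. Its structure is as follows:

(UserID, ReportID)

This table changes continuously (records are added or inserted; there are no updates).

I would like to transform it into this kind of structure and put it into Elasticsearch: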

{
  "UserID": 1,
  "Reports": [1, 2, 3, 4, 5, 6]
}

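The examples I have seen so far are for logs or clickstreams, which do not apply to my case.

Is this kind of use case possible at all? I could always just watch for UserID changes and query the database, but that seems naive and not the best approach.

Update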

import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.Serializer;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.*;

import java.util.ArrayList;
import java.util.Map;
import java.util.Properties;

public class MyDemo {
  public static void main(String... args) {
    System.out.println("Hello KTable!");

    final Serde<Long> longSerde = Serdes.Long();

    KStreamBuilder builder = new KStreamBuilder();

    KStream<Long, Long> reportPermission = builder.stream(TOPIC);

    KTable<Long, ArrayList<Long>> result = reportPermission
        .groupByKey()
        .aggregate(
            new Initializer<ArrayList<Long>>() {
              @Override
              public ArrayList<Long> apply() {
                // Start from an empty list; returning null would make aggregate.add() throw.
                return new ArrayList<>();
              }
            },
            new Aggregator<Long, Long, ArrayList<Long>>() {
              @Override
              public ArrayList<Long> apply(Long key, Long value, ArrayList<Long> aggregate) {
                aggregate.add(value);
                return aggregate;
              }
            },
            new Serde<ArrayList<Long>>() {
              @Override
              public void configure(Map<String, ?> configs, boolean isKey) {}

              @Override
              public void close() {}

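              @Override
              public Serializer<ArrayList<Long>> serializer() {
                return null; // placeholder: no serializer for ArrayList<Long> yet
              }

              @Override
              public Deserializer<ArrayList<Long>> deserializer() {
                return null; // placeholder: no deserializer for ArrayList<Long> yet
              }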
            });

    result.to("report-aggregated-topic");

    KafkaStreams streams = new KafkaStreams(builder, createStreamProperties());
    streams.cleanUp();
    streams.start();

    Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
  }

  private static final String TOPIC = "report-permission";

  private static final Properties createStreamProperties() {
    Properties props = new Properties();

    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "report-permission-app");
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");

    return props;
  }
}

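I am actually stuck at the aggregation stage: I cannot write a proper SerDe for ArrayList<Long> (I am not skilled enough yet), and lambdas do not seem to work for the aggregator, because the compiler does not know what the type of agg is: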

KTable<Long, ArrayList<Long>> sample = builder.stream(TOPIC)
    .groupByKey()
    .aggregate(
        () -> new ArrayList<Long>(),
        (key, val, agg) -> agg.add(val),
        longSerde
    );

Answer 1

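You can use Kafka's Connect API to get the data from SQL Server into Kafka. I am not aware of any connector specific to SQL Server, but you can use any generic JDBC-based connector: https://www.confluent.io/product/connectors/

To process the data, you can use Kafka's Streams API. You can simply aggregate() all reports per user. Something like this: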

KTable<UserId, List<Reports>> result =
    builder.stream("topic-name")
           .groupByKey()
           // init a new empty list and
           // `add` the items to the list in the actual aggregation
           .aggregate(...);

result.to("result-topic");
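
The elided aggregate(...) step can be sketched in plain Java, without the Kafka Streams dependency, to show what the initializer and the aggregator have to do. The class and method names below (ReportAggregation, init, add, aggregateAll) are hypothetical and only illustrate the logic; in a real topology the same two functions would be passed to aggregate() together with a Serde for the list type. Note that the aggregator must return the updated list: ArrayList.add returns a boolean, so a lambda such as (key, val, agg) -> agg.add(val) does not have the required return type.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ReportAggregation {

  // Initializer: every new UserID starts with an empty list.
  public static List<Long> init() {
    return new ArrayList<>();
  }

  // Aggregator: append the ReportID and return the aggregate itself.
  public static List<Long> add(Long userId, Long reportId, List<Long> aggregate) {
    aggregate.add(reportId);
    return aggregate;
  }

  // Simulates groupByKey().aggregate(...) over (UserID, ReportID) pairs.
  public static Map<Long, List<Long>> aggregateAll(long[][] records) {
    Map<Long, List<Long>> table = new LinkedHashMap<>();
    for (long[] record : records) {
      Long userId = record[0];
      Long reportId = record[1];
      List<Long> current = table.containsKey(userId) ? table.get(userId) : init();
      table.put(userId, add(userId, reportId, current));
    }
    return table;
  }

  public static void main(String[] args) {
    long[][] records = {{1, 1}, {1, 2}, {2, 7}, {1, 3}};
    System.out.println(aggregateAll(records)); // {1=[1, 2, 3], 2=[7]}
  }
}
```

Each incoming record only appends to the list for its own key, which is why the result for a user keeps growing as new rows arrive.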

Check out the docs for more details about the Streams API: https://docs.confluent.io/current/streams/index.html

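Note that you need to make sure the list of reports does not grow unbounded. Kafka has a (configurable) maximum message size, and the whole list will be contained in a single message. Thus, you should estimate the maximum message size and apply the corresponding configuration (-> max.message.bytes per topic, message.max.bytes on the broker) before going into production. Check out the configs on this page: http://kafka.apache.org/documentation/#brokerconfigs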
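
Because the whole list travels as one message, its serialized size depends directly on the encoding chosen for the list. The sketch below assumes a simple fixed-width format (a 4-byte element count followed by 8 bytes per ReportID); the class LongListCodec and its methods are hypothetical and not part of Kafka, and the 1,000,000-byte limit in main is only an illustrative number. The encode and decode methods could be wrapped in Kafka Serializer and Deserializer implementations and combined with Serdes.serdeFrom(...) to obtain a Serde for the aggregate.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LongListCodec {

  // Assumed wire format: 4-byte count, then 8 bytes per element.
  public static byte[] encode(List<Long> values) {
    ByteBuffer buffer = ByteBuffer.allocate(encodedSize(values.size()));
    buffer.putInt(values.size());
    for (Long value : values) {
      buffer.putLong(value);
    }
    return buffer.array();
  }

  // Reads the count first, then that many 8-byte values.
  public static List<Long> decode(byte[] bytes) {
    ByteBuffer buffer = ByteBuffer.wrap(bytes);
    int count = buffer.getInt();
    List<Long> values = new ArrayList<>(count);
    for (int i = 0; i < count; i++) {
      values.add(buffer.getLong());
    }
    return values;
  }

  // Payload size in bytes for a list with the given number of elements.
  public static int encodedSize(int elementCount) {
    return Integer.BYTES + elementCount * Long.BYTES;
  }

  // Largest list whose payload still fits into maxBytes (ignores record overhead).
  public static int maxElements(int maxBytes) {
    return (maxBytes - Integer.BYTES) / Long.BYTES;
  }

  public static void main(String[] args) {
    List<Long> reports = Arrays.asList(1L, 2L, 3L);
    byte[] bytes = encode(reports);
    System.out.println(bytes.length);            // 28
    System.out.println(decode(bytes));           // [1, 2, 3]
    System.out.println(maxElements(1_000_000));  // 124999
  }
}
```

The estimate ignores the record key, headers, and batch overhead, so the practical limit is somewhat lower. A Serde built on this codec would also need to handle null values, which the sketch does not.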

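Finally, you use the Connect API to push the data into Elasticsearch. There are multiple different connectors available (I would of course recommend the Confluent one). More details about the Connect API: https://docs.confluent.io/current/connect/userguide.html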

Answer 2

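This approach is not possible directly with SQL and Kafka Streams, but the use case is feasible and can be implemented in the following ways:

1) Write a custom application on top of SQL Server using the SolrJ API that hits the Solr instance whenever a DML (insert, update, delete, etc.) operation is executed in SQL. https://wiki.apache.org/solr/Solrj

2) Use the Solr SQL Data Import Handler; with it, SQL Server will automatically notify Solr whenever a DML (insert, update, delete, etc.) operation occurs in SQL. https://wiki.apache.org/solr/DataImportHandler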

2 solutions

Solution 1
3 accepted 2017-09-19 16:47:54

Solution 2
-4 2017-09-19 15:21:20
