
Unable to convert Kafka topic data into structured JSON with Confluent Elasticsearch sink connector

I'm building a data pipeline using Kafka. The data flow is as follows: capture data changes in MongoDB and send them to Elasticsearch.

[pipeline diagram]

MongoDB

  • version 3.6
  • shard cluster

Kafka

  • Confluent Platform 4.1.0
  • MongoDB source connector: Debezium 0.7.5
  • Elasticsearch sink connector

Elasticsearch

  • version 6.1.0

Since I'm still testing, the Kafka-related systems are running on a single server.

  • start zookeeper

     $ bin/zookeeper-server-start etc/kafka/zookeeper.properties
  • start bootstrap server

     $ bin/kafka-server-start etc/kafka/server.properties
  • start schema registry

     $ bin/schema-registry-start etc/schema-registry/schema-registry.properties
  • start mongodb source connector

     $ bin/connect-standalone \
         etc/schema-registry/connect-avro-standalone.properties \
         etc/kafka/connect-mongo-source.properties

     $ cat etc/kafka/connect-mongo-source.properties
     >>>
     name=mongodb-source-connector
     connector.class=io.debezium.connector.mongodb.MongoDbConnector
     mongodb.hosts=''
     initial.sync.max.threads=1
     tasks.max=1
     mongodb.name=higee

     $ cat etc/schema-registry/connect-avro-standalone.properties
     >>>
     bootstrap.servers=localhost:9092
     key.converter=io.confluent.connect.avro.AvroConverter
     key.converter.schema.registry.url=http://localhost:8081
     value.converter=io.confluent.connect.avro.AvroConverter
     value.converter.schema.registry.url=http://localhost:8081
     internal.key.converter=org.apache.kafka.connect.json.JsonConverter
     internal.value.converter=org.apache.kafka.connect.json.JsonConverter
     internal.key.converter.schemas.enable=false
     internal.value.converter.schemas.enable=false
     rest.port=8083
  • start elasticsearch sink connector

     $ bin/connect-standalone \
         etc/schema-registry/connect-avro-standalone2.properties \
         etc/kafka-connect-elasticsearch/elasticsearch.properties

     $ cat etc/kafka-connect-elasticsearch/elasticsearch.properties
     >>>
     name=elasticsearch-sink
     connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
     tasks.max=1
     topics=higee.higee.higee
     key.ignore=true
     connection.url=''
     type.name=kafka-connect

     $ cat etc/schema-registry/connect-avro-standalone2.properties
     >>>
     bootstrap.servers=localhost:9092
     key.converter=io.confluent.connect.avro.AvroConverter
     key.converter.schema.registry.url=http://localhost:8081
     value.converter=io.confluent.connect.avro.AvroConverter
     value.converter.schema.registry.url=http://localhost:8081
     internal.key.converter=org.apache.kafka.connect.json.JsonConverter
     internal.value.converter=org.apache.kafka.connect.json.JsonConverter
     internal.key.converter.schemas.enable=false
     internal.value.converter.schemas.enable=false
     rest.port=8084

Everything is fine with the setup above. The Kafka connector captures data changes (CDC) and successfully sends them to Elasticsearch via the sink connector. The problem is that I cannot convert the string-typed message data into a structured data type. For instance, let's consume the topic data after making some changes to MongoDB.

    $ bin/kafka-avro-console-consumer \
    --bootstrap-server localhost:9092 \
    --topic higee.higee.higee --from-beginning | jq

Then I get the following result.

    "after": null,
      "patch": {
        "string": "{\"_id\" : {\"$oid\" : \"5ad97f982a0f383bb638ecac\"},\"name\" : \"higee\",\"salary\" : 100,\"origin\" : \"South Korea\"}"
      },
      "source": {
        "version": {
          "string": "0.7.5"
        },
        "name": "higee",
        "rs": "172.31.50.13",
        "ns": "higee",
        "sec": 1524214412,
        "ord": 1,
        "h": {
          "long": -2379508538412995600
        },
        "initsync": {
          "boolean": false
        }
      },
      "op": {
        "string": "u"
      },
      "ts_ms": {
        "long": 1524214412159
      }
    }

Then, if I go to Elasticsearch, I get the following result.

    {
        "_index": "higee.higee.higee",
        "_type": "kafka-connect",
        "_id": "higee.higee.higee+0+3",
        "_score": 1,
        "_source": {
          "after": null,
          "patch": """{"_id" : {"$oid" : "5ad97f982a0f383bb638ecac"}, 
                       "name" : "higee", 
                       "salary" : 100,
                       "origin" : "South Korea"}""",
          "source": {
            "version": "0.7.5",
            "name": "higee",
            "rs": "172.31.50.13",
            "ns": "higee",
            "sec": 1524214412,
            "ord": 1,
            "h": -2379508538412995600,
            "initsync": false
          },
          "op": "u",
          "ts_ms": 1524214412159
        }
      }

What I want to achieve is something like the following:

    {
        "_index": "higee.higee.higee",
        "_type": "kafka-connect",
        "_id": "higee.higee.higee+0+3",
        "_score": 1,
        "_source": {
          "oid" : "5ad97f982a0f383bb638ecac",
          "name" : "higee", 
          "salary" : 100,
          "origin" : "South Korea"
         }"
     }

Some of the options I've been trying and am still considering are as follows.

  • logstash

    • case 1 : don't know how to parse those characters (\u0002, \u0001)

      • logstash.conf

         input {
           kafka {
             bootstrap_servers => ["localhost:9092"]
             topics => ["higee.higee.higee"]
             auto_offset_reset => "earliest"
             codec => json {
               charset => "UTF-8"
             }
           }
         }

         filter {
           json {
             source => "message"
           }
         }

         output {
           stdout {
             codec => rubydebug
           }
         }
      • result

         { "message" => "H\ \{\\"_id\\" : \\ {\\"$oid\\" : \\"5adafc0e2a0f383bb63910a6\\"}, \\ \\"name\\" : \\"higee\\", \\ \\"salary\\" : 101, \\ \\"origin\\" : \\"South Korea\\"} \\ \\\n0.7.5\\nhigee \\ \172.31.50.13\higee.higee2 \\   ح\\v\\  ̗     \\u\     X", "tags" => [[0] "_jsonparsefailure"] } 
    • case 2

      • logstash.conf

         input {
           kafka {
             bootstrap_servers => ["localhost:9092"]
             topics => ["higee.higee.higee"]
             auto_offset_reset => "earliest"
             codec => avro {
               schema_uri => "./test.avsc"
             }
           }
         }

         filter {
           json {
             source => "message"
           }
         }

         output {
           stdout {
             codec => rubydebug
           }
         }
      • test.avsc

         {
           "namespace": "example",
           "type": "record",
           "name": "Higee",
           "fields": [
             {"name": "_id", "type": "string"},
             {"name": "name", "type": "string"},
             {"name": "salary", "type": "int"},
             {"name": "origin", "type": "string"}
           ]
         }
      • result

         An unexpected error occurred! {:error=>#<NoMethodError: undefined method `type_sym' for nil:NilClass>,
          :backtrace=>
           ["/home/ec2-user/logstash-6.1.0/vendor/bundle/jruby/2.3.0/gems/avro-1.8.2/lib/avro/io.rb:224:in `match_schemas'",
            "/home/ec2-user/logstash-6.1.0/vendor/bundle/jruby/2.3.0/gems/avro-1.8.2/lib/avro/io.rb:280:in `read_data'",
            "/home/ec2-user/logstash-6.1.0/vendor/bundle/jruby/2.3.0/gems/avro-1.8.2/lib/avro/io.rb:376:in `read_union'",
            "/home/ec2-user/logstash-6.1.0/vendor/bundle/jruby/2.3.0/gems/avro-1.8.2/lib/avro/io.rb:309:in `read_data'",
            "/home/ec2-user/logstash-6.1.0/vendor/bundle/jruby/2.3.0/gems/avro-1.8.2/lib/avro/io.rb:384:in `block in read_record'",
            "org/jruby/RubyArray.java:1734:in `each'",
            "/home/ec2-user/logstash-6.1.0/vendor/bundle/jruby/2.3.0/gems/avro-1.8.2/lib/avro/io.rb:382:in `read_record'",
            "/home/ec2-user/logstash-6.1.0/vendor/bundle/jruby/2.3.0/gems/avro-1.8.2/lib/avro/io.rb:310:in `read_data'",
            "/home/ec2-user/logstash-6.1.0/vendor/bundle/jruby/2.3.0/gems/avro-1.8.2/lib/avro/io.rb:275:in `read'",
            "/home/ec2-user/logstash-6.1.0/vendor/bundle/jruby/2.3.0/gems/logstash-codec-avro-3.2.3-java/lib/logstash/codecs/avro.rb:77:in `decode'",
            "/home/ec2-user/logstash-6.1.0/vendor/bundle/jruby/2.3.0/gems/logstash-input-kafka-8.0.2/lib/logstash/inputs/kafka.rb:254:in `block in thread_runner'",
            "/home/ec2-user/logstash-6.1.0/vendor/bundle/jruby/2.3.0/gems/logstash-input-kafka-8.0.2/lib/logstash/inputs/kafka.rb:253:in `block in thread_runner'"]}
  • python client

    • consume the topic and, after some data manipulation, produce to a different topic so that the Elasticsearch sink connector can just consume a well-formatted message from the python-manipulated topic
    • kafka library : wasn't able to decode the message

       from kafka import KafkaConsumer

       consumer = KafkaConsumer(
           'higee.higee.higee',
           auto_offset_reset='earliest'
       )

       for message in consumer:
           message.value.decode('utf-8')

       >>> 'utf-8' codec can't decode byte 0xe4 in position 6: invalid continuation byte
    • confluent_kafka wasn't compatible with python 3


Any idea how I can jsonify the data in Elasticsearch? The following are sources I searched.

Thanks in advance.


Some attempts

1) I changed my connect-mongo-source.properties file as follows to test the transformation.

    $ cat etc/kafka/connect-mongo-source.properties
    >>> 
    name=mongodb-source-connector
    connector.class=io.debezium.connector.mongodb.MongoDbConnector
    mongodb.hosts=''
    initial.sync.max.threads=1
    tasks.max=1
    mongodb.name=higee
    transforms=unwrap
    transforms.unwrap.type=io.debezium.connector.mongodb.transforms.UnwrapFromMongoDbEnvelope

And the following is the error log I got. Not yet being comfortable with Kafka and, more importantly, the Debezium platform, I wasn't able to debug this error.

ERROR WorkerSourceTask{id=mongodb-source-connector-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:172)
org.bson.json.JsonParseException: JSON reader expected a string but found '0'.
    at org.bson.json.JsonReader.visitBinDataExtendedJson(JsonReader.java:904)
    at org.bson.json.JsonReader.visitExtendedJSON(JsonReader.java:570)
    at org.bson.json.JsonReader.readBsonType(JsonReader.java:145)
    at org.bson.codecs.BsonDocumentCodec.decode(BsonDocumentCodec.java:82)
    at org.bson.codecs.BsonDocumentCodec.decode(BsonDocumentCodec.java:41)
    at org.bson.codecs.BsonDocumentCodec.readValue(BsonDocumentCodec.java:101)
    at org.bson.codecs.BsonDocumentCodec.decode(BsonDocumentCodec.java:84)
    at org.bson.BsonDocument.parse(BsonDocument.java:62)
    at io.debezium.connector.mongodb.transforms.UnwrapFromMongoDbEnvelope.apply(UnwrapFromMongoDbEnvelope.java:45)
    at org.apache.kafka.connect.runtime.TransformationChain.apply(TransformationChain.java:38)
    at org.apache.kafka.connect.runtime.WorkerSourceTask.sendRecords(WorkerSourceTask.java:218)
    at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:194)
    at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:170)
    at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:214)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

2) This time, I changed elasticsearch.properties and didn't make a change to connect-mongo-source.properties.

$ cat connect-mongo-source.properties

    name=mongodb-source-connector
    connector.class=io.debezium.connector.mongodb.MongoDbConnector
    mongodb.hosts=''
    initial.sync.max.threads=1
    tasks.max=1
    mongodb.name=higee

$ cat elasticsearch.properties

    name=elasticsearch-sink
    connector.class = io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
    tasks.max=1
    topics=higee.higee.higee
    key.ignore=true
    connection.url=''
    type.name=kafka-connect
    transforms=unwrap
    transforms.unwrap.type = io.debezium.connector.mongodb.transforms.UnwrapFromMongoDbEnvelope

And I got the following error.

ERROR WorkerSinkTask{id=elasticsearch-sink-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:172)
org.bson.BsonInvalidOperationException: Document does not contain key $set
    at org.bson.BsonDocument.throwIfKeyAbsent(BsonDocument.java:844)
    at org.bson.BsonDocument.getDocument(BsonDocument.java:135)
    at io.debezium.connector.mongodb.transforms.UnwrapFromMongoDbEnvelope.apply(UnwrapFromMongoDbEnvelope.java:53)
    at org.apache.kafka.connect.runtime.TransformationChain.apply(TransformationChain.java:38)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:480)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:301)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:205)
    at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:173)
    at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:170)
    at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:214)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

3) I changed test.avsc and ran Logstash. I didn't get any error message, but the outcome wasn't what I was expecting, in that the origin, salary, and name fields were all empty even though they were given non-null values. I was even able to read the data properly through the console consumer.

$ cat test.avsc
>>>
    {
      "type" : "record",
      "name" : "MongoEvent",
      "namespace" : "higee.higee",
      "fields" : [ {
        "name" : "_id",
        "type" : {
          "type" : "record",
          "name" : "HigeeEvent",
          "fields" : [ {
            "name" : "$oid",
            "type" : "string"
          }, {
            "name" : "salary",
            "type" : "long"
          }, {
            "name" : "origin",
            "type" : "string"
          }, {
            "name" : "name",
            "type" : "string"
          } ]
        }
      } ]
    }

$ cat logstash3.conf
>>>
    input {
      kafka {
        bootstrap_servers => ["localhost:9092"]
        topics => ["higee.higee.higee"]
        auto_offset_reset => "earliest"
        codec => avro {
          schema_uri => "./test.avsc"
        }
      }
    }

    output {
      stdout {
       codec => rubydebug
      }
    }

$ bin/logstash -f logstash3.conf
>>>
    {
    "@version" => "1",
    "_id" => {
      "salary" => 0,
      "origin" => "",
      "$oid" => "",
      "name" => ""
    },
    "@timestamp" => 2018-04-25T09:39:07.962Z
    }

Python Client

You must use the Avro Consumer, otherwise you will get 'utf-8' codec can't decode byte.
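The reason the plain decode fails is that Confluent's Avro serialization puts a one-byte magic marker and a four-byte schema ID in front of the Avro-encoded payload, so the bytes are not UTF-8 text. A minimal sketch that only inspects that framing, assuming kafka-python and the topic from your setup:

import struct
from kafka import KafkaConsumer

consumer = KafkaConsumer('higee.higee.higee',
                         bootstrap_servers='localhost:9092',
                         auto_offset_reset='earliest')

for message in consumer:
    # Confluent framing: 1 magic byte (0x00) + 4-byte schema ID, then Avro binary
    magic, schema_id = struct.unpack('>bI', message.value[:5])
    print(magic, schema_id)
    break

Decoding the payload itself still requires looking up that schema ID in the Schema Registry, which is what the Avro consumer does for you.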

Even this example will not work, because you still need the Schema Registry to look up the schema.

The prerequisites of Confluent's Python client say that it works with Python 3.x.

Nothing is stopping you from using a different client, so I'm not sure why you stopped at only trying Python.

Logstash Avro Codec

  1. The JSON codec cannot decode Avro data, and I don't think the json filter following the avro input codec will work either.
  2. Your Avro schema is wrong - you're missing the $oid in place of _id.
  3. There is a difference between "raw Avro" (which includes the schema within the message itself) and Confluent's encoded version of it (which only contains the schema ID in the registry). Meaning, Logstash doesn't integrate with the Schema Registry... at least not without a plugin.

Your AVSC should actually look like this:

{
  "type" : "record",
  "name" : "MongoEvent",
  "namespace" : "higee.higee",
  "fields" : [ {
    "name" : "_id",
    "type" : {
      "type" : "record",
      "name" : "HigeeEvent",
      "fields" : [ {
        "name" : "$oid",
        "type" : "string"
      }, {
        "name" : "salary",
        "type" : "long"
      }, {
        "name" : "origin",
        "type" : "string"
      }, {
        "name" : "name",
        "type" : "string"
      } ]
    }
  } ]
}

However, Avro doesn't allow names that start with anything outside the regex [A-Za-z_], so that $oid would be a problem.

While I would not recommend it (nor have I actually tried it), one possible way to get your JSON-encoded Avro data into Logstash from the Avro console consumer could be to use the Pipe input plugin:

input {
  pipe {
    codec => json
    command => "/path/to/confluent/bin/kafka-avro-console-consumer --bootstrap-server localhost:9092 --topic higee.higee.higee --from-beginning" 
  }
}

Debezium

Note that the after value is always a string, and that by convention it will contain a JSON representation of the document.

http://debezium.io/docs/connectors/mongodb/

I think this also applies to patch values, but I don't know Debezium well, really.

Kafka won't parse the JSON in-flight without the use of a Simple Message Transform (SMT). Reading the documentation you linked to, you should probably add these to your Connect source properties:

transforms=unwrap
transforms.unwrap.type=io.debezium.connector.mongodb.transforms.UnwrapFromMongoDbEnvelope
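For what it's worth, the same source config plus the SMT could also be submitted through the Connect REST interface rather than the properties file; a rough sketch, assuming the standalone worker above with rest.port=8083 and the requests library:

import requests

connector = {
    "name": "mongodb-source-connector",
    "config": {
        "connector.class": "io.debezium.connector.mongodb.MongoDbConnector",
        "mongodb.hosts": "",   # placeholder, as in the question
        "initial.sync.max.threads": "1",
        "tasks.max": "1",
        "mongodb.name": "higee",
        "transforms": "unwrap",
        "transforms.unwrap.type": "io.debezium.connector.mongodb.transforms.UnwrapFromMongoDbEnvelope"
    }
}

# register the connector with the Connect worker's REST API
resp = requests.post("http://localhost:8083/connectors", json=connector)
print(resp.status_code, resp.json())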

Also worth pointing out: field flattening is on the roadmap - DBZ-561.

Kafka Connect Elasticsearch

Elasticsearch doesn't parse and process encoded JSON string objects without the use of something like Logstash or its JSON Processor; rather, it only indexes them as a whole string body.
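To illustrate the JSON Processor route, an ingest pipeline with a json processor can expand such a string field at index time. A sketch, assuming Elasticsearch on localhost:9200 and a hypothetical pipeline and index name; note the sink connector here has no obvious way to route documents through a pipeline, so this is mostly useful for manual testing:

import requests

ES = "http://localhost:9200"

# A pipeline that parses the JSON string in "patch" into a structured field
pipeline = {
    "description": "parse the Debezium patch string",
    "processors": [
        {"json": {"field": "patch", "target_field": "patch_parsed"}}
    ]
}
requests.put(ES + "/_ingest/pipeline/parse-patch", json=pipeline)

# Index a sample document through the pipeline to inspect the parsed result
doc = {"patch": "{\"name\": \"higee\", \"salary\": 100, \"origin\": \"South Korea\"}"}
print(requests.post(ES + "/pipeline-test/kafka-connect/1?pipeline=parse-patch", json=doc).json())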

If I recall correctly, Connect will only apply an Elasticsearch mapping to top-level Avro fields, not nested ones.

In other words, the mapping that is generated follows this pattern:

"patch": {
    "string": "...some JSON object string here..."
  },

Whereas you actually need it to be like this - perhaps by manually defining your ES index mapping:

"patch": {
   "properties": {
      "_id": {
        "properties" {
          "$oid" :  { "type": "text" }, 
          "name" :  { "type": "text" },
          "salary":  { "type": "int"  }, 
          "origin": { "type": "text" }
      },

Again, though, I'm not sure if the dollar sign is allowed there.
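If you do go the manual-mapping route, a sketch of creating the index up front might look like this (assuming Elasticsearch on localhost:9200, the kafka-connect type name from your sink config, and the nested layout above; the index must not already exist):

import requests

mapping = {
    "mappings": {
        "kafka-connect": {
            "properties": {
                "patch": {
                    "properties": {
                        "_id": {
                            "properties": {
                                "$oid":   {"type": "text"},
                                "name":   {"type": "text"},
                                "salary": {"type": "integer"},
                                "origin": {"type": "text"}
                            }
                        }
                    }
                }
            }
        }
    }
}

# create the index with the explicit mapping before the sink connector writes to it
print(requests.put("http://localhost:9200/higee.higee.higee", json=mapping).json())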

Kafka Connect MongoDB Source

If none of the above work, you could attempt a different connector.

I was able to solve this issue using a Python Kafka client. The following is the new architecture of my pipeline.

[pipeline diagram]

I used Python 2 even though the Confluent documentation says that Python 3 is supported. The main reason was that there was some Python 2-syntax code. For instance (not exactly the following line, but similar syntax):

    except NameError, err:

In order to use it with Python 3, I would need to convert lines like the above into:

    except NameError as err:

That being said, the following is my Python code. Note that this code is only for prototyping and not for production yet.

Consume a message via Confluent Consumer

  • code

     from confluent_kafka.avro import AvroConsumer

     c = AvroConsumer({
         'bootstrap.servers': '',
         'group.id': 'groupid',
         'schema.registry.url': ''
     })
     c.subscribe(['higee.higee.higee'])

     x = True
     while x:
         msg = c.poll(100)
         if msg:
             message = msg.value()
             print(message)
             x = False
     c.close()
  • (after updating a document in mongodb) let's check the message variable

     {u'after': None,
      u'op': u'u',
      u'patch': u'{ "_id" : {"$oid" : "5adafc0e2a0f383bb63910a6"}, "name" : "higee", "salary" : 100, "origin" : "S Korea"}',
      u'source': {u'h': 5734791721791032689L,
                  u'initsync': False,
                  u'name': u'higee',
                  u'ns': u'higee.higee',
                  u'ord': 1,
                  u'rs': u'',
                  u'sec': 1524362971,
                  u'version': u'0.7.5'},
      u'ts_ms': 1524362971148}

Manipulate the consumed message

  • code

     patch = message['patch']   # JSON string produced by Debezium
     patch_dict = eval(patch)   # works here because the patch string is also a valid Python literal
     patch_dict.pop('_id')      # drop the {"$oid": ...} wrapper (a json-based variant is sketched below)
  • check patch_dict

     {'name': 'higee', 'origin': 'S Korea', 'salary': 100} 
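
  • alternatively, the same step with json.loads instead of eval (a minimal sketch; the patch value is a plain JSON string, so this should parse the same way)

     import json

     patch = message['patch']            # the JSON string produced by Debezium
     patch_dict = json.loads(patch)      # parse instead of eval
     patch_dict.pop('_id', None)         # drop the {"$oid": ...} wrapper
     # patch_dict -> {'name': 'higee', 'origin': 'S Korea', 'salary': 100}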

Produce a message via Confluent Producer

    from confluent_kafka import avro
    from confluent_kafka.avro import AvroProducer

    value_schema_str = """
    {
       "namespace": "higee.higee",
       "name": "MongoEvent",
       "type": "record",
       "fields" : [
           {
               "name" : "name",
               "type" : "string"
           },
           {
              "name" : "origin",
              "type" : "string"
           },
           {
               "name" : "salary",
               "type" : "int32"
           }
       ]
    }
    """
    AvroProducerConf = {
        'bootstrap.servers': '',
        'schema.registry.url': ''
    }

    value_schema = avro.load('./user.avsc')  # user.avsc is assumed to hold the schema in value_schema_str above
    avroProducer = AvroProducer(
                       AvroProducerConf, 
                       default_value_schema=value_schema
                   )

    avroProducer.produce(topic='python', value=patch_dict)
    avroProducer.flush()
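
Putting the consumer, the manipulation, and the producer together, a rough end-to-end sketch of the same prototype flow could look like this (bootstrap.servers and schema.registry.url are placeholders as above, and user.avsc is assumed to hold the value schema shown earlier):

    import json
    from confluent_kafka import avro
    from confluent_kafka.avro import AvroConsumer, AvroProducer

    conf = {
        'bootstrap.servers': '',        # placeholder
        'group.id': 'groupid',
        'schema.registry.url': ''       # placeholder
    }

    consumer = AvroConsumer(conf)
    consumer.subscribe(['higee.higee.higee'])

    producer = AvroProducer(
        {'bootstrap.servers': conf['bootstrap.servers'],
         'schema.registry.url': conf['schema.registry.url']},
        default_value_schema=avro.load('./user.avsc')
    )

    try:
        while True:
            msg = consumer.poll(1)
            if msg is None or msg.error():
                continue
            patch = msg.value().get('patch')
            if not patch:
                continue
            record = json.loads(patch)      # parse the Debezium patch string
            record.pop('_id', None)         # drop the {"$oid": ...} wrapper
            producer.produce(topic='python', value=record)
            producer.flush()
    finally:
        consumer.close()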

The only thing left is to make the Elasticsearch sink connector respond to the new topic 'python' by setting the configuration in the following format. Everything remains the same except topics.

    name=elasticsearch-sink
    connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
    tasks.max=1
    topics=python
    key.ignore=true
    connection.url=''
    type.name=kafka-connect

Then run the Elasticsearch sink connector and check the result in Elasticsearch.

    {
        "_index": "zzzz",
        "_type": "kafka-connect",
        "_id": "zzzz+0+3",
        "_score": 1,
        "_source": {
          "name": "higee",
          "origin": "S Korea",
          "salary": 100
        }
      }
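
For a quick sanity check of what got indexed, a small query sketch (assuming the sink's default topic-to-index mapping, so the index name equals the topic python, and Elasticsearch on localhost:9200):

    import requests

    # list what the sink connector wrote to the index for the 'python' topic
    print(requests.get("http://localhost:9200/python/_search?pretty").text)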

+1 to @cricket_007's suggestion - use the io.debezium.connector.mongodb.transforms.UnwrapFromMongoDbEnvelope single message transformation. You can read more about SMTs and their benefits here.
