
Unable to send data from kafka topic to elasticsearch

I'm trying to build a data pipeline that moves data from MongoDB (as the source) through Kafka to Elasticsearch (as the sink). I have successfully received data from MongoDB in my Kafka topic. This is an example of the data captured from MongoDB:

{"_id": {"_data": "825E88FED8000000012B022C0100296E5A10044D2CA180FAF94580B30CFA4B3CC80E1546645F696400645E88FED793AFA61A58411B2A0004"}, "operationType": "insert", "clusterTime": {"$timestamp": {"t": 1586036440, "i": 1}}, "fullDocument": {"_id": {"$oid": "5e88fed793afa61a58411b2a"}, "name": "Lefèvre Mathis", "phoneNumber": 87640262, "phoneNumber2": 98462768, "phoneNumber3": 50591075, "email": "LefèvreMathis@gmail.com", "websiteUrl": "www.LefèvreMathis.fr", "legalInformation": {"companyName": "Duval EI", "siren": 7.3887975858196E13, "nic": 28866, "siret": 7.3887975858196E13, "ape": "49.53", "tva": "FR-1173030343", "description": "Blanditiis et placeat voluptas hic et. Quae et autem inventore ut enim fugit. Nihil velit in ut magnam."}, "professionType": {"type": "Hotel", "category": "professionnel"}, "operator": {"name": "Orange"}, "address": [{"city": "Paris", "street": "Quartier Les Halles, Paris 1er Arrondissement, Paris, Île-de-France, France métropolitaine, 75001, France", "zipCode": 75001, "latitude": "48.86330665", "longitude": "2.348370623761905"}], "openingTimeSet": [{"day": "Lundi", "opening": "08:00", "closing": "18:00"}, {"day": "Mardi", "opening": "08:00", "closing": "18:00"}, {"day": "Mercredi", "opening": "08:00", "closing": "18:00"}, {"day": "Jeudi", "opening": "08:00", "closing": "18:00"}, {"day": "Vendredi", "opening": "08:00", "closing": "18:00"}, {"day": "Samedi", "opening": "08:00", "closing": "18:00"}, {"day": "Dimanche", "opening": "08:00", "closing": "18:00"}], "_class": "com.sofrecom.elasticsearch.model.Subscriber"}, "ns": {"db": "elasticsearchApp", "coll": "subscriber"}, "documentKey": {"_id": {"$oid": "5e88fed793afa61a58411b2a"}}}

The problem is that when I run my ES sink connector, I get this exception:

Caused by: org.apache.kafka.connect.errors.DataException: Converting byte[] to Kafka Connect data failed due to serialization error: 
at org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:355)
at org.apache.kafka.connect.storage.Converter.toConnectData(Converter.java:86)
at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$1(WorkerSinkTask.java:485)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
... 13 more

Caused by: org.apache.kafka.common.errors.SerializationException: java.io.CharConversionException: Invalid UTF-32 character 0x658b027b (above 0x0010ffff) at char #1, byte #7)
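
The trace shows the record failing inside RetryWithToleranceOperator, so while you debug the converter mismatch you can, as a stopgap rather than a fix, relax the sink's error tolerance and route undeserializable records to a dead-letter topic. These are standard Kafka Connect error-handling properties added to the sink config; the dead-letter topic name below is a placeholder:

    "errors.tolerance": "all",
    "errors.log.enable": "true",
    "errors.log.include.messages": "true",
    "errors.deadletterqueue.topic.name": "dlq-subscriber",
    "errors.deadletterqueue.topic.replication.factor": "1"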

This is my Kafka Connect worker configuration:

  CONNECT_BOOTSTRAP_SERVERS: kafka:9092
  CONNECT_REST_ADVERTISED_HOST_NAME: connect
  CONNECT_REST_PORT: 8083
  CONNECT_GROUP_ID: compose-connect-group
  CONNECT_CONFIG_STORAGE_TOPIC: docker-connect-configs
  CONNECT_OFFSET_STORAGE_TOPIC: docker-connect-offsets
  CONNECT_STATUS_STORAGE_TOPIC: docker-connect-status
  CONNECT_KEY_CONVERTER: org.apache.kafka.connect.json.JsonConverter
  CONNECT_VALUE_CONVERTER:  org.apache.kafka.connect.json.JsonConverter
  CONNECT_INTERNAL_KEY_CONVERTER: org.apache.kafka.connect.json.JsonConverter
  CONNECT_INTERNAL_VALUE_CONVERTER: org.apache.kafka.connect.json.JsonConverter
  CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR:  1
  CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR:  1
  CONNECT_STATUS_STORAGE_REPLICATION_FACTOR:  1
  CONNECT_PLUGIN_PATH: '/usr/share/java,/etc/kafka-connect/jars'
  CONNECT_CONFLUENT_TOPIC_REPLICATION_FACTOR: 1

My es-sink-connector:

{ "name": "sink", "config": { "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector", "connection.url": "http://172.21.0.4:9200", "type.name": "subscriber", "topics": "test5.elasticsearchApp.subscriber", "key.ignore": "false","value.converter.schemas.enable": "false","schema.ignore": "true","value.converter":"org.apache.kafka.connect.json.JsonConverter" } }

And my mongodb-source-connector:

{ "name": "mongo-source", "config": { "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector","tasks.max":1,"connection.uri":"mongodb://mongo1:27017,mongo2:27017","database":"elasticsearchApp","collection":"subscriber", "topic.prefix":"test15","value.converter":"org.apache.kafka.connect.storage.StringConverter"} }

When I tried to use the JSON converter in my MongoDB connector, I got my payload in string format when consuming from the Kafka topic:

{"schema":{"type":"string","optional":false},"payload":"{\"_id\": {\"_data\": \"825E89EA94000000012B022C0100296E5A10044D2CA180FAF94580B30CFA4B3CC80E1546645F696400645E89EA94FC56002500157F490004\"}, \"operationType\": \"insert\", \"clusterTime\": {\"$timestamp\": {\"t\": 1586096788, \"i\": 1}}, \"fullDocument\": {\"_id\": {\"$oid\": \"5e89ea94fc56002500157f49\"}, \"name\": \"Lefèvre Mathis\", \"phoneNumber\": 87640262, \"phoneNumber2\": 98462768, \"phoneNumber3\": 50591075, \"email\": \"LefèvreMathis@gmail.com\", \"websiteUrl\": \"www.LefèvreMathis.fr\", \"legalInformation\": {\"companyName\": \"Duval EI\", \"siren\": 7.3887975858196E13, \"nic\": 28866, \"siret\": 7.3887975858196E13, \"ape\": \"49.53\", \"tva\": \"FR-1173030343\", \"description\": \"Blanditiis et placeat voluptas hic et. Quae et autem inventore ut enim fugit. Nihil velit in ut magnam.\"}, \"professionType\": {\"type\": \"Hotel\", \"category\": \"professionnel\"}, \"operator\": {\"name\": \"Orange\"}, \"address\": [{\"city\": \"Paris\", \"street\": \"Quartier Les Halles, Paris 1er Arrondissement, Paris, Île-de-France, France métropolitaine, 75001, France\", \"zipCode\": 75001, \"latitude\": \"48.86330665\", \"longitude\": \"2.348370623761905\"}], \"openingTimeSet\": [{\"day\": \"Lundi\", \"opening\": \"08:00\", \"closing\": \"18:00\"}, {\"day\": \"Mardi\", \"opening\": \"08:00\", \"closing\": \"18:00\"}, {\"day\": \"Mercredi\", \"opening\": \"08:00\", \"closing\": \"18:00\"}, {\"day\": \"Jeudi\", \"opening\": \"08:00\", \"closing\": \"18:00\"}, {\"day\": \"Vendredi\", \"opening\": \"08:00\", \"closing\": \"18:00\"}, {\"day\": \"Samedi\", \"opening\": \"08:00\", \"closing\": \"18:00\"}, {\"day\": \"Dimanche\", \"opening\": \"08:00\", \"closing\": \"18:00\"}], \"_class\": \"com.sofrecom.elasticsearch.model.Subscriber\"}, \"ns\": {\"db\": \"elasticsearchApp\", \"coll\": \"subscriber\"}, \"documentKey\": {\"_id\": {\"$oid\": \"5e89ea94fc56002500157f49\"}}}"}

  1. Don't use this if you don't want the Mongo connector to generate a string payload:

    "value.converter":"org.apache.kafka.connect.storage.StringConverter"
  2. You will need this in the sink, because you have both schema and payload in your JSON on the topic (a corrected pair of configs is sketched after this list):

     "value.converter.schemas.enable": "true"
  3. You'll need to use an Elasticsearch index mapping to parse out the string, since Connect won't do that for you (one possible approach, an ingest pipeline, is sketched at the end of this answer).
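
Putting points 1 and 2 together, a minimal sketch of the two corrected configs could look like the following. It assumes the worker-level JsonConverter (already set in the worker config above) handles the source side, and that the source's topic.prefix of test15 yields the topic test15.elasticsearchApp.subscriber; adjust names to your environment:

    {
      "name": "mongo-source",
      "config": {
        "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
        "tasks.max": 1,
        "connection.uri": "mongodb://mongo1:27017,mongo2:27017",
        "database": "elasticsearchApp",
        "collection": "subscriber",
        "topic.prefix": "test15"
      }
    }

    {
      "name": "sink",
      "config": {
        "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
        "connection.url": "http://172.21.0.4:9200",
        "type.name": "subscriber",
        "topics": "test15.elasticsearchApp.subscriber",
        "key.ignore": "false",
        "schema.ignore": "true",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter.schemas.enable": "true"
      }
    }

With schemas enabled, the sink unwraps the schema/payload envelope shown above, but the payload it recovers is still a string, which is why point 3 still applies.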

I'm not sure if there is a bug in the Mongo connector. I've never used it, but I would like to think that the JsonConverter should work, or at least Avro.
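
For point 3, one hedged illustration of doing the parsing on the Elasticsearch side is an ingest pipeline using the standard json processor, attached to the target index via the index.default_pipeline setting. The pipeline name and the source field name (message) are hypothetical placeholders, since the field the sink writes the raw string into depends on your setup:

    PUT _ingest/pipeline/parse-subscriber
    {
      "description": "Expand the stringified Mongo change event into a structured field (hypothetical names)",
      "processors": [
        {
          "json": {
            "field": "message",
            "target_field": "event"
          }
        }
      ]
    }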
