
How to add a column with the Kafka message timestamp in a Kafka sink connector

I am configuring my connectors using properties/json files. I am trying, without any success, to add a timestamp column containing the Kafka timestamp of each message as it was read from the source connector.

I have tried to add transforms, but the field is always null, and my BigQuery sink connector returns this error:

Failed to update table schema

I put these configurations in the BigQuery connector properties:

transforms=InsertField
transforms.InsertField.timestamp.field=fieldtime
transforms.InsertField.type=org.apache.kafka.connect.transforms.InsertField$Value

My source config (SAP connector):

{
    "name": "sap",
    "config": {
        "connector.class": "com.sap.kafka.connect.source.hana.HANASourceConnector",
        "tasks.max": "10",
        "topics": "mytopic",
        "connection.url": "jdbc:sap://IP:30015/",
        "connection.user": "user",
        "connection.password": "pass",
        "group.id":"589f5ff5-1c43-46f4-bdd3-66884d61m185",
        "mytopic.table.name":                          "\"schema\".\"mytable\""  
       }
}

My sink connector (BigQuery):

name=bigconnect
connector.class=com.wepay.kafka.connect.bigquery.BigQuerySinkConnector
tasks.max=1

sanitizeTopics=true

autoCreateTables=true
autoUpdateSchemas=true

schemaRetriever=com.wepay.kafka.connect.bigquery.schemaregistry.schemaretriever.SchemaRegistrySchemaRetriever
schemaRegistryLocation=http://localhost:8081

bufferSize=100000
maxWriteSize=10000
tableWriteWait=1000

project=kafka-test-217517
topics=mytopic
datasets=.*=sap_dataset
keyfile=/opt/bgaccess.json
transforms=InsertField
transforms.InsertField.timestamp.field=fieldtime    
transforms.InsertField.type=org.apache.kafka.connect.transforms.InsertField$Value

I would guess your error is coming from BigQuery, not Kafka Connect.

For example, if you start a Connect console sink in standalone mode, you will see messages like:

Struct{...,fieldtime=Fri Nov 16 07:38:19 UTC 2018}
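To confirm that the broker-assigned timestamp is actually present on the topic itself, a plain console consumer can print it next to each record (a quick check, assuming a broker reachable on localhost:9092; adjust the bootstrap server and topic name to your setup):

kafka-console-consumer --bootstrap-server localhost:9092 \
    --topic mytopic --from-beginning \
    --property print.timestamp=true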


Tested with connect-standalone ./connect-standalone.properties ./connect-console-sink.properties

I have an input topic with Avro data... Update your own settings accordingly.

connect-standalone.properties

bootstrap.servers=kafka:9092

key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://schema-registry:8081
key.converter.schemas.enable=true

value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://schema-registry:8081
value.converter.schemas.enable=true

offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000

plugin.path=/usr/share/java

connect-console-sink.properties

name=local-console-sink
connector.class=org.apache.kafka.connect.file.FileStreamSinkConnector
tasks.max=1
topics=input-topic

transforms=InsertField
transforms.InsertField.timestamp.field=fieldtime
transforms.InsertField.type=org.apache.kafka.connect.transforms.InsertField$Value

OLD ANSWER: I think I have come to understand the underlying problem.

First of all, you can't use the InsertField transform in any source connector, because the timestamp value of a message is assigned when it is written to the topic, so it is not something the source connector can know in advance. For the JDBC connector there is this ticket: https://github.com/confluentinc/kafka-connect-jdbc/issues/311

and it does not work in the SAP source connector either.

Second, the BigQuery connector has a bug that prevents using InsertField to add the timestamp to every table, as mentioned here:

https://github.com/wepay/kafka-connect-bigquery/issues/125#issuecomment-439102994

So if you want to use BigQuery as your output, the only solution right now is to manually edit the schema of each table to add the column before starting the sink connector.
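For reference, one way to do that manual edit is with the bq command-line tool (a sketch; the dataset and table names come from the configs above, and the TIMESTAMP/NULLABLE column definition is an assumption to adjust to your schema):

# dump the table's current schema
bq show --schema --format=prettyjson sap_dataset.mytable > schema.json

# hand-edit schema.json to append the new column, e.g.
#   {"name": "fieldtime", "type": "TIMESTAMP", "mode": "NULLABLE"}

# apply the updated schema
bq update sap_dataset.mytable schema.json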

UPDATE 2018-12-03: here is the final solution for always adding the message timestamp in the SINK connector. Let's assume you want to add the timestamp to EVERY table handled by the sink connector.

In your SOURCE CONNECTOR, put this configuration:

"transforms":"InsertField"
"transforms.InsertField.timestamp.field":"fieldtime", 
"transforms.InsertField.type":"org.apache.kafka.connect.transforms.InsertField$Value"

This will add a column named "fieldtime" to every source table.
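Note that if you consume the topic at this point, the field will typically still be null, e.g. (illustrative):

Struct{...,fieldtime=null}

because, as explained above, the message timestamp does not yet exist when a source-side transform runs; the sink-side configuration below is what fills it in.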

In your SINK CONNECTOR, put this configuration:

"transforms":"InsertField,DropField",
"transforms.DropField.type":"org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.DropField.blacklist":"fieldtime",
"transforms.InsertSource.timestamp.field":"kafka_timestamp",
"transforms.InsertField.timestamp.field":"fieldtime",
"transforms.InsertField.type":"org.apache.kafka.connect.transforms.InsertField$Value"

This will effectively remove the fieldtime column and add it back populated with the timestamp of the message. Note that Connect applies transforms in the order they are listed, so DropField must run before InsertField; otherwise InsertField would collide with the field that already exists.

This solution automatically adds the column with the right value, without any additional manual steps.
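Putting it all together, a sketch of the full BigQuery sink properties with the drop-and-reinsert chain (all names and paths are taken from the configs above; adjust them to your environment):

name=bigconnect
connector.class=com.wepay.kafka.connect.bigquery.BigQuerySinkConnector
tasks.max=1
topics=mytopic
project=kafka-test-217517
datasets=.*=sap_dataset
keyfile=/opt/bgaccess.json
sanitizeTopics=true
autoCreateTables=true
autoUpdateSchemas=true

transforms=DropField,InsertField
transforms.DropField.type=org.apache.kafka.connect.transforms.ReplaceField$Value
transforms.DropField.blacklist=fieldtime
transforms.InsertField.timestamp.field=fieldtime
transforms.InsertField.type=org.apache.kafka.connect.transforms.InsertField$Value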
