How to add a column with the Kafka message timestamp in a Kafka sink connector
I am configuring my connectors using properties/JSON files, and I am trying to add a timestamp column containing the Kafka timestamp from when the message was read from the source connector, without any success.
I have tried to add transforms, but the field is always null, and my BigQuery sink connector returns an error:
Failed to update table schema
I put these configurations in the BigQuery connector properties:
transforms=InsertField
transforms.InsertField.timestamp.field=fieldtime
transforms.InsertField.type=org.apache.kafka.connect.transforms.InsertField$Value
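To make the failure mode concrete: InsertField$Value with timestamp.field copies the record's timestamp into a field of the value, but a record only gets its timestamp when it is appended to the topic. A minimal sketch of that behavior in Python (the function and variable names here are hypothetical stand-ins, not the real SMT code):

```python
# Minimal model, for illustration only, of what InsertField$Value does
# with timestamp.field: it copies the record's timestamp into the value.

def insert_field_timestamp(value, field, record_timestamp):
    """Copy the record timestamp into the value, as InsertField does."""
    out = dict(value)
    out[field] = record_timestamp  # stays null when the timestamp is unset
    return out

# In a source connector the record has not reached the broker yet, so
# its timestamp is still unset -> the inserted field is null.
source_side = insert_field_timestamp({"id": 1}, "fieldtime", None)

# In a sink connector the record carries the broker-assigned timestamp
# (epoch millis), so the field gets a real value.
sink_side = insert_field_timestamp({"id": 1}, "fieldtime", 1542354000000)
```

This is why the same transform yields null on the source side but a real value on the sink side.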
My source config (SAP connector):
{
  "name": "sap",
  "config": {
    "connector.class": "com.sap.kafka.connect.source.hana.HANASourceConnector",
    "tasks.max": "10",
    "topics": "mytopic",
    "connection.url": "jdbc:sap://IP:30015/",
    "connection.user": "user",
    "connection.password": "pass",
    "group.id": "589f5ff5-1c43-46f4-bdd3-66884d61m185",
    "mytopic.table.name": "\"schema\".\"mytable\""
  }
}
My sink connector (BigQuery):
name=bigconnect
connector.class=com.wepay.kafka.connect.bigquery.BigQuerySinkConnector
tasks.max=1
sanitizeTopics=true
autoCreateTables=true
autoUpdateSchemas=true
schemaRetriever=com.wepay.kafka.connect.bigquery.schemaregistry.schemaretriever.SchemaRegistrySchemaRetriever
schemaRegistryLocation=http://localhost:8081
bufferSize=100000
maxWriteSize=10000
tableWriteWait=1000
project=kafka-test-217517
topics=mytopic
datasets=.*=sap_dataset
keyfile=/opt/bgaccess.json
transforms=InsertField
transforms.InsertField.timestamp.field=fieldtime
transforms.InsertField.type=org.apache.kafka.connect.transforms.InsertField$Value
I would guess your error is coming from BigQuery, not Kafka Connect.
For example, if you start a Connect console sink in standalone mode, you will see messages like:
Struct{...,fieldtime=Fri Nov 16 07:38:19 UTC 2018}
Tested with connect-standalone ./connect-standalone.properties ./connect-console-sink.properties
I have an input topic with Avro data... Update your own settings accordingly.
connect-standalone.properties
bootstrap.servers=kafka:9092
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://schema-registry:8081
key.converter.schemas.enable=true
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://schema-registry:8081
value.converter.schemas.enable=true
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000
plugin.path=/usr/share/java
connect-console-sink.properties
name=local-console-sink
connector.class=org.apache.kafka.connect.file.FileStreamSinkConnector
tasks.max=1
topics=input-topic
transforms=InsertField
transforms.InsertField.timestamp.field=fieldtime
transforms.InsertField.type=org.apache.kafka.connect.transforms.InsertField$Value
OLD ANSWER: I think I have come to understand the underlying problem.
First of all, you can't use the InsertField transform in any source connector, because the timestamp for a message is assigned when it is written to the topic, so it's not something the connector can know in advance.
For the JDBC connector there is this ticket: https://github.com/confluentinc/kafka-connect-jdbc/issues/311
and it does not work in the SAP source connector either.
Second, the BigQuery connector has a bug that prevents using InsertField to add the timestamp to every table, as mentioned here:
https://github.com/wepay/kafka-connect-bigquery/issues/125#issuecomment-439102994
So if you want to use BigQuery as your output, the only solution right now is to manually edit the schema of each table to add the column before loading the sink connector.
UPDATE 2018-12-03: the final solution for always adding the message timestamp in the SINK connector. Let's assume you want to add the timestamp to EVERY table of the sink connector.
In your SOURCE connector, put this configuration:
"transforms":"InsertField",
"transforms.InsertField.timestamp.field":"fieldtime",
"transforms.InsertField.type":"org.apache.kafka.connect.transforms.InsertField$Value"
This will add a column named "fieldtime" to every source table.
In your SINK connector, put this configuration:
"transforms":"DropField,InsertField",
"transforms.DropField.type":"org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.DropField.blacklist":"fieldtime",
"transforms.InsertField.timestamp.field":"fieldtime",
"transforms.InsertField.type":"org.apache.kafka.connect.transforms.InsertField$Value"
This effectively removes the fieldtime column and adds it back with the timestamp of the message.
This solution automatically adds the column with the right value, without any additional operations.
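The sink-side chain can be sketched as two successive steps: a drop that models ReplaceField$Value with a blacklist, then an insert that models InsertField$Value with timestamp.field. A hypothetical Python illustration (these helpers stand in for the real SMTs, which run in the order they are listed in "transforms"):

```python
# Sketch of the sink-side transform chain: drop the placeholder
# fieldtime written by the source, then re-add it with the
# broker-assigned record timestamp.

def drop_field(value, blacklist):
    """Model of ReplaceField$Value with a blacklist."""
    return {k: v for k, v in value.items() if k not in blacklist}

def insert_field_timestamp(value, field, record_timestamp):
    """Model of InsertField$Value with timestamp.field."""
    out = dict(value)
    out[field] = record_timestamp
    return out

# The source connector produced fieldtime as null (no timestamp existed
# at source time); the broker assigned a timestamp on append.
record_value = {"id": 1, "fieldtime": None}
broker_timestamp = 1542354000000  # epoch millis, assigned by the broker

value = drop_field(record_value, {"fieldtime"})
value = insert_field_timestamp(value, "fieldtime", broker_timestamp)
```

The net effect is that every sink table ends up with a fieldtime column holding the actual Kafka message timestamp.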