Convert timestamp fields to ISO 8601 when pushing data to a topic
I have a stream that pulls data from a table in Postgres with the following definition:
CREATE TABLE "user" (
  "_uid" UUID NOT NULL DEFAULT gen_random_uuid() PRIMARY KEY,
  "_created" TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3),
  "_updated" TIMESTAMP(3) NULL,
  "_disabled" TIMESTAMP(3) NULL,
  "display_name" VARCHAR(100) NOT NULL,
  "email" VARCHAR(100) NOT NULL UNIQUE,
  "password" TEXT NOT NULL
);
In ksqlDB I created a SOURCE CONNECTOR like this:
CREATE SOURCE CONNECTOR "source-postgres-api_auth" WITH (
  "connector.class"='io.confluent.connect.jdbc.JdbcSourceConnector',
  "connection.url"='jdbc:postgresql://postgres:5432/api_auth',
  "connection.user"='postgres',
  "connection.password"='postgres',
  "mode"='bulk',
  "topic.prefix"='source-postgres-api_auth-',
  "table.blacklist"='_changelog, _changelog_lock'
);
So that I can detect changes and build up a history, I have a STREAM like this:
CREATE STREAM "stream-api_auth-user" (
  "_uid" STRING,
  "_created" TIMESTAMP,
  "_updated" TIMESTAMP,
  "_disabled" TIMESTAMP,
  "display_name" STRING,
  "email" STRING,
  "password" STRING
) WITH (
  KAFKA_TOPIC = 'source-postgres-api_auth-user',
  VALUE_FORMAT = 'AVRO'
);
From this stream I created a table:
CREATE TABLE "table-api_auth-user" WITH (
  KAFKA_TOPIC = 'table-api_auth-user',
  VALUE_FORMAT = 'AVRO'
) AS SELECT
  "_uid",
  LATEST_BY_OFFSET("_created") AS "_created",
  LATEST_BY_OFFSET("_updated") AS "_updated",
  LATEST_BY_OFFSET("_disabled") AS "_disabled",
  LATEST_BY_OFFSET("display_name") AS "display_name",
  LATEST_BY_OFFSET("email") AS "email",
  LATEST_BY_OFFSET("password") AS "password"
FROM "stream-api_auth-user"
GROUP BY "_uid"
EMIT CHANGES;
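(For reference, ksqlDB can also render TIMESTAMP columns as strings at this stage with FORMAT_TIMESTAMP. A minimal sketch of an alternative CTAS, assuming a ksqlDB version that provides FORMAT_TIMESTAMP; the table/topic name "table-api_auth-user-iso" is hypothetical:)

```sql
-- Hedged sketch: emit ISO-8601 strings instead of raw TIMESTAMPs.
CREATE TABLE "table-api_auth-user-iso" WITH (
  KAFKA_TOPIC = 'table-api_auth-user-iso',
  VALUE_FORMAT = 'AVRO'
) AS SELECT
  "_uid",
  -- FORMAT_TIMESTAMP(ts, pattern) formats a TIMESTAMP with a DateTimeFormatter pattern
  FORMAT_TIMESTAMP(LATEST_BY_OFFSET("_created"), 'yyyy-MM-dd''T''HH:mm:ss.SSSX') AS "_created",
  FORMAT_TIMESTAMP(LATEST_BY_OFFSET("_updated"), 'yyyy-MM-dd''T''HH:mm:ss.SSSX') AS "_updated",
  FORMAT_TIMESTAMP(LATEST_BY_OFFSET("_disabled"), 'yyyy-MM-dd''T''HH:mm:ss.SSSX') AS "_disabled",
  LATEST_BY_OFFSET("display_name") AS "display_name",
  LATEST_BY_OFFSET("email") AS "email",
  LATEST_BY_OFFSET("password") AS "password"
FROM "stream-api_auth-user"
GROUP BY "_uid"
EMIT CHANGES;
```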
Finally, I have a SINK connector to Elasticsearch like this:
CREATE SINK CONNECTOR "sync-elasticsearch-user" WITH (
  'connector.class' = 'io.confluent.connect.elasticsearch.ElasticsearchSinkConnector',
  'connection.url' = 'http://elasticsearch:9200',
  'type.name' = 'kafka-connect',
  'topics' = 'table-api_auth-user'
);
My problem is that when I look in Elasticsearch, the TIMESTAMP fields show up as numbers. I realized that in the topic the TABLE writes to, these fields are being serialized as epoch milliseconds rather than ISO 8601:
ksql> print "table-api_auth-user";
Key format: HOPPING(KAFKA_STRING) or TUMBLING(KAFKA_STRING) or KAFKA_STRING
Value format: AVRO or KAFKA_STRING
rowtime: 2022/12/01 21:13:36.844 Z, key: [a2d9ff97-2c95-4da0-98e0-5492@7293921773168638261/-], value: {"_created":1669926069726,"_updated":null,"_disabled":null,"display_name":"Super User","email":"superuser@email.com","password":"4072d7365233d8ede7ca8548543222dfb96b17780aa8d6ff93ab69c0985ef21fc8105d03590a61b9"}, partition: 0
rowtime: 2022/12/01 21:13:36.847 Z, key: [b60448d2-e518-4479-9aff-2734@3631370472181359666/-], value: {"_created":1669916433173,"_updated":1669916803008,"_disabled":1669916803008,"display_name":"Cremin 7a8c281c4bed","email":"Byrne.8dd1dcf3bfa4@yahoo.com","password":"e89af05eae87f0667eba762fdd382ce942bb76b796b8fe20d9e71f142bac9f7a6fbbfc6b51d4527e"}, partition: 0
Is there anything I can do so that, when the table sends data to its topic, these timestamp fields are converted to ISO 8601?
Can anyone help me?
You can transform the fields on the Elasticsearch side with an ingest pipeline:
https://www.elastic.co/guide/en/elasticsearch/reference/current/ingest.html
I have read that there is no option to specify an ingest pipeline on the sink connector:
https://github.com/confluentinc/kafka-connect-elasticsearch/issues/72
So you have to create an index template that matches the index name and applies the pipeline.
Step 1: Create an ingest pipeline
I will use the date processor to convert your format (UNIX_MS) to ISO 8601:
https://www.elastic.co/guide/en/elasticsearch/reference/current/date-processor.html
PUT _ingest/pipeline/parsedate
{
  "processors": [
    {
      "date": {
        "field": "date",
        "formats": [
          "UNIX_MS"
        ],
        "target_field": "date_converted",
        "ignore_failure": true
      }
    }
  ]
}
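(The output below appears to come from the pipeline _simulate API; a request along these lines, with an illustrative sample document, reproduces it:)

```json
POST _ingest/pipeline/parsedate/_simulate
{
  "docs": [
    { "_source": { "date": 1669916803008 } }
  ]
}
```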
Test output (the date field vs. date_converted):
{
  "docs": [
    {
      "doc": {
        "_index": "_index",
        "_id": "_id",
        "_version": "-3",
        "_source": {
          "date": 1669916803008,
          "date_converted": "2022-12-01T17:46:43.008Z"
        },
        "_ingest": {
          "timestamp": "2022-12-02T07:54:02.731666786Z"
        }
      }
    }
  ]
}
Step 2: Create an index template
Assuming your index name matches table-api_auth-user*:
PUT _index_template/template_1
{
  "index_patterns": ["table-api_auth-user*"],
  "template": {
    "settings": {
      "index.default_pipeline": "parsedate"
    }
  }
}
From now on, every document sent to an index matching table-api_auth-user* will have the ingest pipeline from step 1 applied.
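(Note that the example pipeline converts a field literally named date, while the documents in table-api_auth-user carry _created, _updated, and _disabled. A sketch of the pipeline adapted to those field names, one date processor per field; the field names come from the question, the rest is an assumption. Writing target_field back onto the same field replaces the epoch value, and ignore_failure lets null _updated/_disabled values pass through:)

```json
PUT _ingest/pipeline/parsedate
{
  "processors": [
    {
      "date": {
        "field": "_created",
        "formats": ["UNIX_MS"],
        "target_field": "_created",
        "ignore_failure": true
      }
    },
    {
      "date": {
        "field": "_updated",
        "formats": ["UNIX_MS"],
        "target_field": "_updated",
        "ignore_failure": true
      }
    },
    {
      "date": {
        "field": "_disabled",
        "formats": ["UNIX_MS"],
        "target_field": "_disabled",
        "ignore_failure": true
      }
    }
  ]
}
```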