[英]Python-processed Avro formatted data sent through a Apache Kafka does not yield same output when dezerialized in Apache Camel/Java processor
我正在運行一個 Kafka 代理,我通過 Python 程序將消息推送到該代理。 為了高效的數據交換,我使用 Apache Avro 格式。 在 Kafka 代理處,消息由帶有處理器的 Camel 路由接收。 在這個處理器中,我想反序列化消息,最后想將數據推送到 InfluxDB。
過程機制有效,但在駱駝路線中,我沒有得到我輸入的數據。在 Python 方面,我創建了一個字典:
testDict = dict()
testDict['name'] = 'avroTest'
testDict['double_one'] = 1.2345
testDict['double_two'] = 1.23
testDict['double_three'] = 2.345
testDict['time_stamp'] = long(time.time() * 1000000000)
Python 端對應的 Avro 模式如下所示:
{
"namespace": "my.namespace",
"name": "myRecord",
"type": "record",
"fields": [
{"name": "name", "type": "string"},
{"name": "double_one", "type": "double"},
{"name": "double_two", "type": "double"},
{"name": "double_three", "type": "double"},
{"name": "time_stamp", "type": "long"}
]
}
用於將 avro 格式的消息發送到 Kafka 的 Python 代碼如下所示:
def sendAvroFormattedMessage(self, dataDict: dict, topic_id: str, schemaDefinition: str) \
-> FutureRecordMetadata:
"""
Method for sending message to kafka broker in the avro binary format
:param dataDict: data dictionary containing message data
:param topic_id: the Kafka topic to send message to
:param schemaDefinition: JSON schema definition
:return: FurtureRecordMetadata
"""
schema = avro.schema.parse(schemaDefinition)
writer = avro.io.DatumWriter(schema)
bytes_stream = io.BytesIO()
encoder = avro.io.BinaryEncoder(bytes_stream)
writer.write(dataDict, encoder)
raw_bytes = bytes_stream.getvalue()
messageBrokerWriterConnection = KafkaProducer(bootstrap_servers=<connectionUrl>, client_id='testLogger')
result = messageBrokerWriterConnection.send(topic=topic_id, value=raw_bytes, key='AVRO_FORMAT'.encode('UTF-8'))
return result
消息按預期到達代理,由駱駝拾取並由以下 JAVA 代碼處理:
from(kafkaEndpoint) //
.process(exchange -> {
Long kafkaInboundTime = Long
.parseLong(exchange.getIn().getHeader("kafka.TIMESTAMP").toString());
if (exchange.getIn().getHeader("kafka.KEY") != null) {
BinaryDecoder decoder = DecoderFactory.get()
.binaryDecoder(exchange.getIn().getBody(InputStream.class), null);
SpecificDatumReader<Record> datumReader = new SpecificDatumReader<>(avroSchema);
System.out.println(datumReader.read(null, decoder).toString());
}
}) //
.to(influxdbEndpoint);
avroSchema
目前在我的 class 的構造函數中進行了硬編碼,如下所示:
avroSchema = SchemaBuilder.record("myRecord") //
.namespace("my.namespace") //
.fields() //
.requiredString("name") //
.requiredDouble("double_one") //
.requiredDouble("double_two") //
.requiredDouble("double_three") //
.requiredLong("time_stamp") //
.endRecord();
System.out.println
的 output 是
{"name": "avroTest", "double_one": 6.803527358993313E-220, "double_two": -0.9919128115125185, "double_three": -0.9775074719163893, "time_stamp": 20}
顯然,出了點問題,但我不知道是什么。 任何幫助表示贊賞。
更新 1由於 Python 代碼在 Intel/Window 機器、Kafka(在 VM 中)和 Linux 機器上的 Java 代碼上運行
可以排除更新 1.1字節序。 兩邊檢查,都“小”
更新 2作為檢查,我將所有字段的架構定義更改為字符串類型。 使用此定義,值和鍵可以正確傳輸 - Python 輸入和 Java/Camel output 是相同的。
更新 3 Kafka 的駱駝潰敗生產者端點沒有任何特殊功能,例如反序列化器等:
"kafka:myTopicName?brokers=host:9092&clientId=myClientID&autoOffsetReset=earliest"
我找到了解決我的問題的方法。 以下 Python 代碼將所需的 output 生成到 Kafka 中:
def sendAvroFormattedMessage(self, dataDict: dict, topic_id: MessageBrokerQueue, schemaDefinition: str) \
-> FutureRecordMetadata:
"""
Method for sending message to kafka broker in the avro binary format
:param dataDict: data dictionary containing message data
:param topic_id: the Kafka topic to send message to
:param schemaDefinition: JSON schema definition
:return: None
"""
schema = avro.schema.parse(schemaDefinition)
bytes_writer = io.BytesIO()
encoder = BinaryEncoder(bytes_writer)
writer = DatumWriter(schema)
writer.write(dataDict, encoder)
raw_bytes = bytes_writer.getvalue()
self._messageBrokerWriterConnection = KafkaProducer(bootstrap_servers=self._connectionUrl)
try:
# NOTE: I use the 'AVRO' key to separate avro formatted messages from others
result = self._messageBrokerWriterConnection.send(topic=topic_id, value=raw_bytes, key='AVRO'.encode('UTF-8'))
except Exception as err:
print(err)
self._messageBrokerWriterConnection.flush()
解決方案的關鍵是將valueDeserializer=...
添加到 Apache Camel 端的端點定義中:
import org.apache.kafka.common.serialization.ByteArrayDeserializer;
...
TEST_QUEUE("kafka:topic_id?brokers=host:port&clientId=whatever&valueDeserializer=" + ByteArrayDeserializer.class.getName());
Apache 駱駝路線代碼包括轉換到 InfluxDB 點然后可以這樣寫:
@Component
public class Route_TEST_QUEUE extends RouteBuilder {
Schema avroSchema = null;
private Route_TEST_QUEUE() {
avroSchema = SchemaBuilder //
.record("ElectronCoolerCryoMessage") //
.namespace("de.gsi.fcc.applications.data.loggers.avro.messages") //
.fields() //
.requiredString("name") //
.requiredDouble("double_one") //
.requiredDouble("double_two") //
.requiredDouble("double_three") //
.requiredLong("time_stamp") //
.endRecord();
}
private String fromEndpoint = TEST_QUEUE.definitionString();
@Override
public void configure() throws Exception {
from(fromEndpoint) //
.process(messagePayload -> {
byte[] data = messagePayload.getIn().getBody(byte[].class);
BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(data, null);
SpecificDatumReader<GenericRecord> datumReader = new SpecificDatumReader<>(avroSchema);
GenericRecord record = datumReader.read(null, decoder);
try {
Point.Builder influxPoint = Point
.measurement(record.get("name").toString());
long acqStamp = 0L;
if (record.hasField("time_stamp") && (long) record.get("time_stamp") > 0L) {
acqStamp = (long) record.get("time_stamp");
} else {
acqStamp = Long.parseLong(messagePayload.getIn().getHeader("kafka.TIMESTAMP").toString());
}
influxPoint.time(acqStamp, TimeUnit.NANOSECONDS);
Map<String, Object> fieldMap = new HashMap<>();
avroSchema.getFields().stream() //
.filter(field -> !field.name().equals("keyFieldname")) //
.forEach(field -> {
Object value = record.get(field.name());
fieldMap.put(field.name().toString(), value);
});
influxPoint.fields(fieldMap);
} catch (Exception e) {
MessageLogger.logError(e);
}
}) //
.to(...InfluxEndpoint...) //
.onException(Exception.class) //
.useOriginalMessage() //
.handled(true) //
.to("stream:out");
}
}
}
這適用於我的目的 - 沒有匯合,只有卡夫卡。
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.