
Python-processed Avro-formatted data sent through Apache Kafka does not yield the same output when deserialized in an Apache Camel/Java processor

I am running a Kafka broker to which I push messages from a Python program. For efficient data exchange I use the Apache Avro format. On the broker side the messages are picked up by a Camel route with a processor; in this processor I want to deserialize the message and finally push the data into an InfluxDB.

The mechanics of the process work, but in the Camel route I do not get out the data I put in. On the Python side I create a dictionary:

testDict = dict()
testDict['name'] = 'avroTest'
testDict['double_one'] = 1.2345
testDict['double_two'] = 1.23
testDict['double_three'] = 2.345
testDict['time_stamp'] = int(time.time() * 1000000000)  # nanoseconds; long() does not exist in Python 3

The corresponding Avro schema on the Python side looks like this:

{
  "namespace": "my.namespace",
  "name": "myRecord",
  "type": "record",
  "fields": [
    {"name": "name",         "type": "string"},
    {"name": "double_one",   "type": "double"},
    {"name": "double_two",   "type": "double"},
    {"name": "double_three", "type": "double"},
    {"name": "time_stamp",   "type": "long"}
  ]
}

The Python code used to send the Avro-formatted message to Kafka looks like this:

def sendAvroFormattedMessage(self, dataDict: dict, topic_id: str, schemaDefinition: str) \
        -> FutureRecordMetadata:
    """
    Method for sending message to kafka broker in the avro binary format
    :param dataDict: data dictionary containing message data
    :param topic_id: the Kafka topic to send message to
    :param schemaDefinition: JSON schema definition
    :return: FutureRecordMetadata
    """
    schema = avro.schema.parse(schemaDefinition)
    writer = avro.io.DatumWriter(schema)
    bytes_stream = io.BytesIO()
    encoder = avro.io.BinaryEncoder(bytes_stream)
    writer.write(dataDict, encoder)
    raw_bytes = bytes_stream.getvalue()

    messageBrokerWriterConnection = KafkaProducer(bootstrap_servers=<connectionUrl>, client_id='testLogger')
    
    result = messageBrokerWriterConnection.send(topic=topic_id, value=raw_bytes, key='AVRO_FORMAT'.encode('UTF-8'))
    return result

The message arrives at the broker as expected, is picked up by Camel and processed by the following Java code:

from(kafkaEndpoint) //
                .process(exchange -> {
                    Long kafkaInboundTime = Long
                            .parseLong(exchange.getIn().getHeader("kafka.TIMESTAMP").toString());
                    if (exchange.getIn().getHeader("kafka.KEY") != null) {

                        BinaryDecoder decoder = DecoderFactory.get()
                                .binaryDecoder(exchange.getIn().getBody(InputStream.class), null);

                        SpecificDatumReader<Record> datumReader = new SpecificDatumReader<>(avroSchema);

                        System.out.println(datumReader.read(null, decoder).toString());
                    }
                }) //
                .to(influxdbEndpoint);

avroSchema is currently hard-coded in the constructor of my class as follows:

avroSchema = SchemaBuilder.record("myRecord") //
                .namespace("my.namespace") //
                .fields() //
                .requiredString("name") //
                .requiredDouble("double_one") //
                .requiredDouble("double_two") //
                .requiredDouble("double_three") //
                .requiredLong("time_stamp") //  
                .endRecord();

The output of System.out.println is:

{"name": "avroTest", "double_one": 6.803527358993313E-220, "double_two": -0.9919128115125185, "double_three": -0.9775074719163893, "time_stamp": 20}

Obviously something is going wrong, but I cannot tell what. Any help is appreciated.

Update 1: A difference in endianness could be a cause, since the Python code runs on an Intel/Windows machine while Kafka (in a VM) and the Java code run on Linux machines.

Update 1.1: Endianness can be ruled out. Checked on both sides, both are 'little'.

Update 2: As a check, I changed the schema definition to type string for all fields. With this definition the values and the key are transferred correctly, i.e. the Python input and the Java/Camel output are identical.
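
For reference, that all-string check schema would presumably look something like this on the reader side (a sketch in the same SchemaBuilder style used above; the original post only states that all fields were changed to string):

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

// Check schema: every field declared as string instead of its real type
Schema stringCheckSchema = SchemaBuilder.record("myRecord") //
        .namespace("my.namespace") //
        .fields() //
        .requiredString("name") //
        .requiredString("double_one") //
        .requiredString("double_two") //
        .requiredString("double_three") //
        .requiredString("time_stamp") //
        .endRecord();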

Update 3: The Kafka endpoint used by the Camel route has no special options such as deserializers:

"kafka:myTopicName?brokers=host:9092&clientId=myClientID&autoOffsetReset=earliest"

I found a solution to my problem. The following Python code produces the desired output into Kafka:

def sendAvroFormattedMessage(self, dataDict: dict, topic_id: MessageBrokerQueue, schemaDefinition: str) \
        -> None:
    """
    Method for sending message to kafka broker in the avro binary format
    :param dataDict: data dictionary containing message data
    :param topic_id: the Kafka topic to send message to
    :param schemaDefinition: JSON schema definition
    :return: None
    """
    schema = avro.schema.parse(schemaDefinition)

    bytes_writer = io.BytesIO()
    encoder = BinaryEncoder(bytes_writer)
    writer = DatumWriter(schema)
    writer.write(dataDict, encoder)
    raw_bytes = bytes_writer.getvalue()

    self._messageBrokerWriterConnection = KafkaProducer(bootstrap_servers=self._connectionUrl)

    try:
        # NOTE: I use the 'AVRO' key to separate avro formatted messages from others 
        result = self._messageBrokerWriterConnection.send(topic=topic_id, value=raw_bytes, key='AVRO'.encode('UTF-8'))
    except Exception as err:
        print(err)
    self._messageBrokerWriterConnection.flush()

The key to the solution was adding valueDeserializer=... to the endpoint definition on the Apache Camel side:

import org.apache.kafka.common.serialization.ByteArrayDeserializer;

 ...

TEST_QUEUE("kafka:topic_id?brokers=host:port&clientId=whatever&valueDeserializer=" + ByteArrayDeserializer.class.getName());
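
Since the route below obtains its URI via TEST_QUEUE.definitionString(), the endpoint constant is presumably wrapped in an enum along these lines (a sketch under that assumption; the original only shows the constant declaration):

import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public enum MessageBrokerQueue {

    // Kafka endpoint URI with an explicit ByteArrayDeserializer for the value
    TEST_QUEUE("kafka:topic_id?brokers=host:port&clientId=whatever&valueDeserializer="
            + ByteArrayDeserializer.class.getName());

    private final String definition;

    MessageBrokerQueue(String definition) {
        this.definition = definition;
    }

    public String definitionString() {
        return definition;
    }
}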

The Apache Camel route code, including the conversion to InfluxDB points, can then be written like this:

@Component
public class Route_TEST_QUEUE extends RouteBuilder {

    Schema avroSchema = null;

    private Route_TEST_QUEUE() {
        avroSchema = SchemaBuilder //
                .record("ElectronCoolerCryoMessage") //
                .namespace("de.gsi.fcc.applications.data.loggers.avro.messages") //
                .fields() //
                .requiredString("name") //
                .requiredDouble("double_one") //
                .requiredDouble("double_two") //
                .requiredDouble("double_three") //
                .requiredLong("time_stamp") //
                .endRecord();
    }

    private String fromEndpoint = TEST_QUEUE.definitionString();

    @Override
    public void configure() throws Exception {

        from(fromEndpoint) //
                .process(messagePayload -> {        
                    byte[] data = messagePayload.getIn().getBody(byte[].class);
                    BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(data, null);
                    SpecificDatumReader<GenericRecord> datumReader = new SpecificDatumReader<>(avroSchema);
                    GenericRecord record = datumReader.read(null, decoder);

                    try {
                        Point.Builder influxPoint = Point
                            .measurement(record.get("name").toString());
                        long acqStamp = 0L;
                        if (record.hasField("time_stamp") && (long) record.get("time_stamp") > 0L) {
                            acqStamp = (long) record.get("time_stamp");
                        } else {
                            acqStamp = Long.parseLong(messagePayload.getIn().getHeader("kafka.TIMESTAMP").toString());
                        }

                        influxPoint.time(acqStamp, TimeUnit.NANOSECONDS);

                        Map<String, Object> fieldMap = new HashMap<>();

                        avroSchema.getFields().stream() //
                                .filter(field -> !field.name().equals("keyFieldname")) //
                                .forEach(field -> {
                                    Object value = record.get(field.name());
                                    fieldMap.put(field.name(), value);
                                });

                        influxPoint.fields(fieldMap);
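
                        // Note: as posted, the built Point is never attached to the exchange;
                        // presumably something like messagePayload.getIn().setBody(influxPoint.build())
                        // is needed before the InfluxDB endpoint (assumption, not shown in the original).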

                    } catch (Exception e) {
                         MessageLogger.logError(e);
                    }
                }) //
                .to(...InfluxEndpoint...) //
                .onException(Exception.class) //
                .useOriginalMessage() //
                .handled(true) //
                .to("stream:out");
    }
}

This works for my purposes: no Confluent, just plain Kafka.
