
Hortonworks Schema Registry + Nifi + Java: Deserialize Nifi Record

I am trying to deserialize some Kafka messages, serialized by Nifi, using the Hortonworks Schema Registry:

  • Processor used as the RecordWriter on the Nifi side: AvroRecordSetWriter
  • Schema Write Strategy: HWX Content-Encoded Schema Reference

I am able to deserialize these messages in other Nifi Kafka consumers. However, I am trying to deserialize them from my Flink application using Kafka code.

I have the following in the Kafka deserializer handler of my Flink application:

final String SCHEMA_REGISTRY_CACHE_SIZE_KEY = SchemaRegistryClient.Configuration.CLASSLOADER_CACHE_SIZE.name();
final String SCHEMA_REGISTRY_CACHE_EXPIRY_INTERVAL_SECS_KEY = SchemaRegistryClient.Configuration.CLASSLOADER_CACHE_EXPIRY_INTERVAL_SECS.name();
final String SCHEMA_REGISTRY_SCHEMA_VERSION_CACHE_SIZE_KEY = SchemaRegistryClient.Configuration.SCHEMA_VERSION_CACHE_SIZE.name();
final String SCHEMA_REGISTRY_SCHEMA_VERSION_CACHE_EXPIRY_INTERVAL_SECS_KEY = SchemaRegistryClient.Configuration.SCHEMA_VERSION_CACHE_EXPIRY_INTERVAL_SECS.name();
final String SCHEMA_REGISTRY_URL_KEY = SchemaRegistryClient.Configuration.SCHEMA_REGISTRY_URL.name();

Properties schemaRegistryProperties = new Properties();
schemaRegistryProperties.put(SCHEMA_REGISTRY_CACHE_SIZE_KEY, 10L);
schemaRegistryProperties.put(SCHEMA_REGISTRY_CACHE_EXPIRY_INTERVAL_SECS_KEY, 5000L);
schemaRegistryProperties.put(SCHEMA_REGISTRY_SCHEMA_VERSION_CACHE_SIZE_KEY, 1000L);
schemaRegistryProperties.put(SCHEMA_REGISTRY_SCHEMA_VERSION_CACHE_EXPIRY_INTERVAL_SECS_KEY, 60 * 60 * 1000L);
schemaRegistryProperties.put(SCHEMA_REGISTRY_URL_KEY, "http://schema_registry_server:7788/api/v1");
return (Map<String, Object>) HWXSchemaRegistry.getInstance(schemaRegistryProperties).deserialize(message);
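
For context, a minimal sketch of how this handler can sit in a Flink DeserializationSchema; the class name SchemaService is taken from the stack trace below, but the rest of the wiring is an assumption:

import java.io.IOException;
import java.util.Map;
import java.util.Properties;

import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;

public class SchemaService implements DeserializationSchema<Map<String, Object>> {

    private final Properties schemaRegistryProperties;

    public SchemaService(Properties schemaRegistryProperties) {
        this.schemaRegistryProperties = schemaRegistryProperties;
    }

    @Override
    @SuppressWarnings("unchecked")
    public Map<String, Object> deserialize(byte[] message) throws IOException {
        // The snippet above runs here, once per Kafka record
        return (Map<String, Object>) HWXSchemaRegistry
                .getInstance(schemaRegistryProperties).deserialize(message);
    }

    @Override
    public boolean isEndOfStream(Map<String, Object> nextElement) {
        return false; // the Kafka stream is unbounded
    }

    @Override
    public TypeInformation<Map<String, Object>> getProducedType() {
        return TypeInformation.of(new TypeHint<Map<String, Object>>() {});
    }
}

A FlinkKafkaConsumer built with this schema is wrapped by Flink in the KafkaDeserializationSchemaWrapper that shows up in the stack trace below.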

This is the HWXSchemaRegistry code used to deserialize the messages:

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import com.hortonworks.registries.schemaregistry.avro.AvroSchemaProvider;
import com.hortonworks.registries.schemaregistry.client.SchemaRegistryClient;
import com.hortonworks.registries.schemaregistry.serdes.avro.AvroSnapshotDeserializer;

public class HWXSchemaRegistry {

    // _log was used but never declared in the original snippet; assuming slf4j here
    private static final Logger _log = LoggerFactory.getLogger(HWXSchemaRegistry.class);

    private SchemaRegistryClient client;
    private Map<String, Object> config;
    private AvroSnapshotDeserializer deserializer;
    private static HWXSchemaRegistry hwxSRInstance = null;

    // Lazily built singleton so the registry client is created only once
    public static HWXSchemaRegistry getInstance(Properties schemaRegistryConfig) {
        if (hwxSRInstance == null)
            hwxSRInstance = new HWXSchemaRegistry(schemaRegistryConfig);
        return hwxSRInstance;
    }

    public Object deserialize(byte[] message) throws IOException {
        // Passing null as the reader schema version makes the deserializer
        // fall back to the writer's schema
        return this.deserializer.deserialize(new ByteArrayInputStream(message), null);
    }

    private static Map<String, Object> properties2Map(Properties config) {
        Enumeration<Object> keys = config.keys();
        Map<String, Object> configMap = new HashMap<String, Object>();
        while (keys.hasMoreElements()) {
            Object key = keys.nextElement();
            configMap.put(key.toString(), config.get(key));
        }
        return configMap;
    }

    private HWXSchemaRegistry(Properties schemaRegistryConfig) {
        _log.debug("Init SchemaRegistry Client");
        this.config = HWXSchemaRegistry.properties2Map(schemaRegistryConfig);
        this.client = new SchemaRegistryClient(this.config);

        this.deserializer = this.client.getDefaultDeserializer(AvroSchemaProvider.TYPE);
        this.deserializer.init(this.config);
    }
}

But I am getting a 404 HTTP error code (schema not found). I think this is caused by an incompatible "protocol" between the Nifi configuration and the HWX Schema Registry client implementation, so the schema identifier bytes that the client is looking for do not exist on the server, or something like that.
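
One way to test this hypothesis is to dump the header bytes that precede the Avro payload and check whether the referenced schema id actually exists on the server. A small sketch, using the 1 + 8 + 4 byte header layout worked out in the answer below (the HeaderDump class is hypothetical):

import java.nio.ByteBuffer;
import java.util.Arrays;

public class HeaderDump {

    // Prints the HWX content-encoded header of a serialized Kafka message
    public static void dumpHeader(byte[] message) {
        byte protocolVersion = message[0]; // Nifi-specific protocol byte
        long schemaId = ByteBuffer.wrap(Arrays.copyOfRange(message, 1, 9)).getLong();
        int schemaVersion = ByteBuffer.wrap(Arrays.copyOfRange(message, 9, 13)).getInt();
        System.out.printf("protocol=%d schemaId=%d schemaVersion=%d payload=%d bytes%n",
                protocolVersion, schemaId, schemaVersion, message.length - 13);
    }
}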

Can someone help with this?

Thank you.

Caused by: javax.ws.rs.NotFoundException: HTTP 404 Not Found
    at org.glassfish.jersey.client.JerseyInvocation.convertToException(JerseyInvocation.java:1069)
    at org.glassfish.jersey.client.JerseyInvocation.translate(JerseyInvocation.java:866)
    at org.glassfish.jersey.client.JerseyInvocation.lambda$invoke$1(JerseyInvocation.java:750)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:205)
    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:390)
    at org.glassfish.jersey.client.JerseyInvocation.invoke(JerseyInvocation.java:748)
    at org.glassfish.jersey.client.JerseyInvocation$Builder.method(JerseyInvocation.java:404)
    at org.glassfish.jersey.client.JerseyInvocation$Builder.get(JerseyInvocation.java:300)
    at com.hortonworks.registries.schemaregistry.client.SchemaRegistryClient$14.run(SchemaRegistryClient.java:1054)
    at com.hortonworks.registries.schemaregistry.client.SchemaRegistryClient$14.run(SchemaRegistryClient.java:1051)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:360)
    at com.hortonworks.registries.schemaregistry.client.SchemaRegistryClient.getEntities(SchemaRegistryClient.java:1051)
    at com.hortonworks.registries.schemaregistry.client.SchemaRegistryClient.getAllVersions(SchemaRegistryClient.java:872)
    at com.hortonworks.registries.schemaregistry.client.SchemaRegistryClient.getAllVersions(SchemaRegistryClient.java:676)
    at HWXSchemaRegistry.<init>(HWXSchemaRegistry.java:56)
    at HWXSchemaRegistry.getInstance(HWXSchemaRegistry.java:26)
    at SchemaService.deserialize(SchemaService.java:70)
    at SchemaService.deserialize(SchemaService.java:26)
    at org.apache.flink.streaming.connectors.kafka.internals.KafkaDeserializationSchemaWrapper.deserialize(KafkaDeserializationSchemaWrapper.java:45)
    at org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.runFetchLoop(KafkaFetcher.java:140)
    at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:712)
    at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:93)
    at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:57)
    at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:97)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:302)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
    at java.lang.Thread.run(Thread.java:745)

I found a workaround. Since I couldn't get this to work, I use the first bytes of the byte array to make several calls to the schema registry and let the Avro schema deserialize the rest of the byte array.

  • The first byte (0) is the protocol version (I figured out this is a Nifi-specific byte, since I didn't need it).
  • The next 8 bytes are the schema ID.
  • The next 4 bytes are the schema version.
  • The rest of the bytes are the message itself:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Arrays;

import com.hortonworks.registries.schemaregistry.SchemaMetadataInfo;
import com.hortonworks.registries.schemaregistry.SchemaVersionInfo;
import com.hortonworks.registries.schemaregistry.SchemaVersionKey;
import com.hortonworks.registries.schemaregistry.client.SchemaRegistryClient;

try (SchemaRegistryClient client = new SchemaRegistryClient(this.schemaRegistryConfig)) {
    try {
        // Byte 0 is the Nifi protocol byte; bytes 1-8 are the schema id,
        // bytes 9-12 the schema version
        Long schemaId = ByteBuffer.wrap(Arrays.copyOfRange(message, 1, 9)).getLong();
        Integer schemaVersion = ByteBuffer.wrap(Arrays.copyOfRange(message, 9, 13)).getInt();
        SchemaMetadataInfo schemaInfo = client.getSchemaMetadataInfo(schemaId);
        String schemaName = schemaInfo.getSchemaMetadata().getName();
        SchemaVersionInfo schemaVersionInfo = client.getSchemaVersionInfo(
                new SchemaVersionKey(schemaName, schemaVersion));
        String avroSchema = schemaVersionInfo.getSchemaText();
        // Renamed from "message" to avoid redeclaring the parameter
        byte[] payload = Arrays.copyOfRange(message, 13, message.length);
        // Deserialize [...]
    } catch (Exception e) {
        throw new IOException(e.getMessage());
    }
}

I also thought that maybe I had to remove the first byte before calling hwxSRInstance.deserializer.deserialize in the code from my question, since this byte seems to be a Nifi-specific byte used for communication between Nifi processors, but it didn't work.

The next step would be to build a cache with the schema texts to avoid calling the schema registry API multiple times; a sketch of that idea follows.
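
A minimal sketch of such a cache, assuming a plain in-memory ConcurrentHashMap keyed by schema id and version; the SchemaTextCache class and its method are my own names, not part of the registry client API:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import com.hortonworks.registries.schemaregistry.SchemaVersionKey;
import com.hortonworks.registries.schemaregistry.client.SchemaRegistryClient;

public class SchemaTextCache {

    // One entry per (schemaId, schemaVersion) pair, e.g. "42:3"
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    public String getSchemaText(SchemaRegistryClient client, long schemaId, int schemaVersion) {
        return cache.computeIfAbsent(schemaId + ":" + schemaVersion, key -> {
            try {
                String schemaName = client.getSchemaMetadataInfo(schemaId)
                        .getSchemaMetadata().getName();
                return client.getSchemaVersionInfo(
                        new SchemaVersionKey(schemaName, schemaVersion)).getSchemaText();
            } catch (Exception e) {
                // computeIfAbsent does not allow checked exceptions
                throw new RuntimeException(e);
            }
        });
    }
}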

New info: I am extending my answer to include the Avro deserialization part, since it took some troubleshooting for me and I had to inspect the Nifi Avro Reader source code to figure this part out (I was getting an invalid Avro data exception when trying to use basic Avro deserialization code):

import java.io.IOException;
import java.io.InputStream;

import org.apache.avro.Schema;
import org.apache.avro.file.SeekableByteArrayInput;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DecoderFactory;

private static GenericRecord deserializeMessage(byte[] message, String schemaText) throws IOException {

    // The payload is a bare binary-encoded record (no Avro file container),
    // which is why container-based reader code raises an invalid Avro data exception
    InputStream in = new SeekableByteArrayInput(message);
    Schema schema = new Schema.Parser().parse(schemaText);
    DatumReader<GenericRecord> datumReader = new GenericDatumReader<>(schema);
    BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(in, null);
    GenericRecord genericRecord = datumReader.read(null, decoder);
    in.close();

    return genericRecord;
}

If you want to convert the GenericRecord to a Map, note that the string values are not String objects: the keys and the values typed as string need to be converted:

private static Map<String, Object> avroGenericRecordToMap(GenericRecord record)
{
    Map<String, Object> map = new HashMap<>();
    record.getSchema().getFields().forEach(field ->
        map.put(String.valueOf(field.name()), record.get(field.name())));

    // Strings are mapped to the Utf8 class, so they need to be converted
    // (all the record keys and any values typed as string; "value" is a
    // string-typed field in this particular schema)
    if (map.get("value").getClass() == org.apache.avro.util.Utf8.class)
        map.put("value", String.valueOf(map.get("value")));

    return map;
}
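
Putting the pieces together, a hedged sketch of the full flow for a single Kafka message; the enclosing deserialize method and the schemaRegistryConfig field are assumptions, while the header offsets and helper methods come from the snippets above:

public Map<String, Object> deserialize(byte[] message) throws IOException {
    try (SchemaRegistryClient client = new SchemaRegistryClient(this.schemaRegistryConfig)) {
        // 1. Parse the Nifi/HWX header: 1 protocol byte + 8-byte schema id + 4-byte version
        Long schemaId = ByteBuffer.wrap(Arrays.copyOfRange(message, 1, 9)).getLong();
        Integer schemaVersion = ByteBuffer.wrap(Arrays.copyOfRange(message, 9, 13)).getInt();
        // 2. Look up the writer schema text (or take it from a cache, as described above)
        String schemaName = client.getSchemaMetadataInfo(schemaId).getSchemaMetadata().getName();
        String avroSchema = client.getSchemaVersionInfo(
                new SchemaVersionKey(schemaName, schemaVersion)).getSchemaText();
        // 3. Decode the remaining bytes and convert the record to a Map
        byte[] payload = Arrays.copyOfRange(message, 13, message.length);
        return avroGenericRecordToMap(deserializeMessage(payload, avroSchema));
    } catch (Exception e) {
        throw new IOException(e.getMessage());
    }
}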

