Kafka String serialization efficiency

Question

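I'm new to Kafka and am trying to store messages with as little memory overhead as possible, so I'd like to avoid putting field names in the encoding (i.e. JSON). Consider a message with three variable-length String fields: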

interface IMessage {
    String getA();
    String getB();
    String getC();
}

Since Kafka ships with a default StringSerializer, the simplest encoding would be to concatenate the fields with a delimiter. Something like:

String encoded = "FieldA|FieldB|FieldC";

Behind the scenes, Kafka converts this to a byte array.

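My question is: will Kafka use UTF-8 encoding, so that each ASCII character in the string takes up only one byte? In other words, will a 15-character string occupy 15 bytes in Kafka's memory? Or is it for some reason more efficient to call getBytes() in Java and pass the byte array directly to the ByteArraySerializer?

byte[] encoded = "FieldA|FieldB|FieldC".getBytes(StandardCharsets.UTF_8);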

Answer 1

The class documentation for StringSerializer states:

String encoding defaults to UTF8 and can be customized by setting the property key.serializer.encoding, value.serializer.encoding or serializer.encoding. The first two take precedence over the last.

So the default encoding is UTF-8, as you want.
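To make the size question concrete, here is a minimal sketch that uses only the JDK (no Kafka dependency). The class name Utf8SizeDemo and the helper utf8Length are made up for illustration. It assumes the serializer simply encodes the string with the configured charset, which is what the documentation above suggests; under that assumption, encoding the string yourself and using ByteArraySerializer should yield the same bytes rather than smaller ones.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: measures how many bytes a string needs in UTF-8.
final class Utf8SizeDemo {

    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String encoded = "FieldA|FieldB|FieldC";

        // Pure ASCII: one byte per character, so both numbers match.
        System.out.println(encoded.length() + " chars, " + utf8Length(encoded) + " bytes");

        // Non-ASCII characters need more than one byte each in UTF-8.
        String accented = "caf\u00e9"; // "cafe" with an acute accent on the e
        System.out.println(accented.length() + " chars, " + utf8Length(accented) + " bytes");
    }
}
```

Note that this only measures the encoded value. The size of a record as Kafka stores it also includes per-record overhead that is not shown here.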

You can also download the source code and find:

private String encoding = "UTF8";

@Override
public void configure(Map<String, ?> configs, boolean isKey) {
    String propertyName = isKey ? "key.serializer.encoding" : "value.serializer.encoding";
    Object encodingValue = configs.get(propertyName);
    if (encodingValue == null)
        encodingValue = configs.get("serializer.encoding");
    if (encodingValue != null && encodingValue instanceof String)
        encoding = (String) encodingValue;
}

So the source matches the documentation, which is good.
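The precedence described in the documentation can be checked in isolation with a small sketch. The class EncodingLookupDemo and the method resolveEncoding are hypothetical names, not part of Kafka; the method only re-implements the lookup order of the configure() method quoted above so that it can be run without the Kafka client library.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class: mirrors the lookup order of the quoted configure() method.
final class EncodingLookupDemo {

    // Returns the encoding that would be chosen for a key or value serializer.
    static String resolveEncoding(Map<String, ?> configs, boolean isKey) {
        String propertyName = isKey ? "key.serializer.encoding" : "value.serializer.encoding";
        Object value = configs.get(propertyName);
        if (value == null) {
            // The specific property is absent, so fall back to the shared one.
            value = configs.get("serializer.encoding");
        }
        // Anything that is not a String is ignored and the default is kept.
        return (value instanceof String) ? (String) value : "UTF8";
    }

    public static void main(String[] args) {
        Map<String, Object> configs = new HashMap<>();
        System.out.println(resolveEncoding(configs, true));  // nothing set: default

        configs.put("serializer.encoding", "UTF-16");
        configs.put("value.serializer.encoding", "ISO-8859-1");
        System.out.println(resolveEncoding(configs, true));  // key uses the shared property
        System.out.println(resolveEncoding(configs, false)); // value uses its specific property
    }
}
```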

If you want to be sure, you can explicitly set key.serializer.encoding and value.serializer.encoding to UTF8.
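A producer configuration that pins the encoding explicitly might look like the sketch below. The class name ProducerEncodingConfig and the broker address localhost:9092 are assumptions for illustration; the property keys are the ones named in the documentation quoted above. Only java.util.Properties is used so that the example runs without the Kafka client on the classpath.

```java
import java.util.Properties;

// Hypothetical demo class: builds producer properties with an explicit encoding.
final class ProducerEncodingConfig {

    static Properties build() {
        Properties props = new Properties();
        // Assumed broker address; replace with your own cluster.
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // Set the encoding explicitly instead of relying on the default.
        props.setProperty("key.serializer.encoding", "UTF8");
        props.setProperty("value.serializer.encoding", "UTF8");
        return props;
    }

    public static void main(String[] args) {
        build().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

In a real application these properties would be passed to the KafkaProducer constructor; that step is left out here to keep the sketch free of external dependencies.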
