
AvroSerializer: schema for orderbook snapshots

I have a Kafka cluster running and I want to store L2 order book snapshots in a topic. Each snapshot contains dictionaries of {key: value} pairs whose keys are floats, as in the following example:

{
    'exchange': 'ex1',
    'symbol': 'sym1',
    'book': {
        'bid': {
            100.0: 20.0,
            101.0: 21.3,
            102.0: 34.6,
            ...,
        },
        'ask': {
            100.0: 20.0,
            101.0: 21.3,
            102.0: 34.6,
            ...,
        }
    },
    'timestamp': 1642524222.1160505
}

My schema proposal below is not working and I'm pretty sure it is because the keys in the 'bid' and 'ask' dictionaries are not of type string.

{
    "namespace": "confluent.io.examples.serialization.avro",
    "name": "L2_Book",
    "type": "record",
    "fields": [
        {"name": "exchange", "type": "string"},
        {"name": "symbol", "type": "string"},
        {"name": "book", "type": "record", "fields": {
            "name": "bid", "type": "record", "fields": {
                {"name": "price", "type": "float"},
                {"name": "volume", "type": "float"}
            },
            "name": "ask", "type": "record", "fields": {
                {"name": "price", "type": "float"},
                {"name": "volume", "type": "float"}
            }
        },
        {"name": "timestamp", "type": "float"}
    ]
}

KafkaError{code=_VALUE_SERIALIZATION,val=-161,str="no value and no default for bids"}

What would be a proper Avro schema here?

First, your schema has a syntax error: "fields" must be an array in the schema definition.

However, your bid (and ask) objects are not records. Each one is a map<float, float>; in other words, it has no literal price and volume keys.

Avro has a map type, but its keys are "assumed to be strings".

You are welcome to try

{"name": "bid", "type": {"type": "map", "values": "float"}}
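Because the map keys have to be strings, the float price keys need converting before the record reaches the serializer. A minimal sketch, assuming the payload layout from the question; stringify_book_keys is a hypothetical helper name, not part of any library:

```python
def stringify_book_keys(snapshot):
    """Return a copy of the snapshot whose bid/ask price keys are strings."""
    book = {
        side: {str(price): volume for price, volume in levels.items()}
        for side, levels in snapshot["book"].items()
    }
    # Build a new top-level dict so the caller's snapshot is left untouched.
    return {**snapshot, "book": book}


snapshot = {
    "exchange": "ex1",
    "symbol": "sym1",
    "book": {
        "bid": {100.0: 20.0, 101.0: 21.3},
        "ask": {102.0: 34.6},
    },
    "timestamp": 1642524222.1160505,
}
converted = stringify_book_keys(snapshot)
```

In Python 3, str() of a float produces a representation that float() parses back to the same value, so the prices can be recovered on the consumer side.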

Otherwise, you need to reformat your data payloads, for example as a list of objects

'bid': [
     {'price': 100.0, 'volume': 20.0},
     ...,
],

Along with

{"name": "bid", "type": {"type": "array", "items": {
  "type": "record",
  "name": "BidItem",
  "fields": [
    {"name": "price", "type": "float"},
    {"name": "volume", "type": "float"}
  ]
}}}
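Reshaping the original {price: volume} dictionaries into that list-of-objects form could look roughly like this. It is only a sketch under the assumption that the payload matches the question's example; levels_to_records and book_to_records are hypothetical names:

```python
def levels_to_records(levels):
    """Turn {price: volume} into a price-sorted list of price/volume records."""
    return [
        {"price": float(price), "volume": float(volume)}
        for price, volume in sorted(levels.items())
    ]


def book_to_records(snapshot):
    """Return a copy of the snapshot with bid/ask converted to record lists."""
    book = {side: levels_to_records(levels)
            for side, levels in snapshot["book"].items()}
    return {**snapshot, "book": book}


snapshot = {
    "exchange": "ex1",
    "symbol": "sym1",
    "book": {
        "bid": {101.0: 21.3, 100.0: 20.0},
        "ask": {102.0: 34.6},
    },
    "timestamp": 1642524222.1160505,
}
as_records = book_to_records(snapshot)
```

Sorting by price is a choice made here for readability; the schema itself does not require any particular order.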

I finally figured out two working solutions. In both cases I need to convert the original data.

The main lessons for me have been:

  1. Avro maps need keys of type string.
  2. Avro complex types (e.g. maps and records) must be defined as a nested type object:
{"name": "bid", "type":
      {"type": "array", "items": {
          ...
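Both lessons can be checked mechanically before a schema is handed to the serializer. The following is a rough sketch of such a check, not a full Avro validator; lint_schema and COMPLEX_TYPES are hypothetical names introduced here. It flags a "fields" value that is not an array, and any field whose "type" is a bare complex-type name instead of a nested object:

```python
COMPLEX_TYPES = {"record", "map", "array", "enum", "fixed"}


def lint_schema(schema, path="<root>"):
    """Return a list of human-readable problems found in an Avro schema dict."""
    problems = []
    if not isinstance(schema, dict):
        return problems  # primitive type name or union: nothing to check here
    fields = schema.get("fields", [])
    if not isinstance(fields, list):
        problems.append(f"{path}: 'fields' must be a JSON array")
        return problems
    for field in fields:
        if not isinstance(field, dict):
            problems.append(f"{path}: every field must be a JSON object")
            continue
        field_path = f"{path}.{field.get('name')}"
        field_type = field.get("type")
        if isinstance(field_type, str) and field_type in COMPLEX_TYPES:
            problems.append(
                f"{field_path}: complex type '{field_type}' must be a nested object"
            )
        else:
            problems.extend(lint_schema(field_type, field_path))
    # Array items and map values can themselves contain records.
    for key in ("items", "values"):
        if key in schema:
            problems.extend(lint_schema(schema[key], f"{path}.{key}"))
    return problems


bad = {"type": "record", "name": "L2_Book", "fields": [
    {"name": "bid", "type": "map", "values": "float"},
]}
good = {"type": "record", "name": "L2_Book", "fields": [
    {"name": "bid", "type": {"type": "map", "values": "float"}},
]}
print(lint_schema(bad))
print(lint_schema(good))
```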

Special thanks to OneCricketeer for pointing me in the right direction :-)

1) bids and asks as a map with the key being of type string

data example

{
    'exchange': 'ex1',
    'symbol': 'sym1',
    'book': {
        'bid': {
            "100.0": 20.0,
            "101.0": 21.3,
            "102.0": 34.6,
            ...,
        },
        'ask': {
            "100.0": 20.0,
            "101.0": 21.3,
            "102.0": 34.6,
            ...,
        }
    },
    'timestamp': 1642524222.1160505
}

schema

{
    "namespace": "confluent.io.examples.serialization.avro",
    "name": "L2_Book",
    "type": "record",
    "fields": [
        {"name": "exchange", "type": "string"},
        {"name": "symbol", "type": "string"},
        {"name": "book", "type": {
            "name": "book",
            "type": "record",
            "fields": [
                {"name": "bid", "type": {
                    "type": "map", "values": "float"
                    }
                }, 
                {"name": "ask", "type": {
                    "type": "map", "values": "float"
                    }
                }
            ]}
        },
        {"name": "timestamp", "type": "double"}
    ]
}
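With this variant the consumer receives the prices as string keys again, and strings compare lexicographically rather than numerically, so "99.5" sorts above "100.0". A small sketch of converting the keys back before looking up the best bid and ask; restore_price_keys is a hypothetical helper and decoded_book stands in for whatever the deserializer returns:

```python
def restore_price_keys(book):
    """Convert the string price keys of each side back to floats."""
    return {
        side: {float(price): volume for price, volume in levels.items()}
        for side, levels in book.items()
    }


# Stand-in for the "book" field of a deserialized message.
decoded_book = {
    "bid": {"99.5": 10.0, "100.0": 20.0},
    "ask": {"100.5": 5.0, "101.0": 21.3},
}

book = restore_price_keys(decoded_book)
best_bid = max(book["bid"])  # highest bid price, compared numerically
best_ask = min(book["ask"])  # lowest ask price, compared numerically
```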

2) bids and asks as an array of records

data example

{
    'exchange': 'ex1',
    'symbol': 'sym1',
    'book': {
        'bid': [
            {"price": 100.0, "volume": 20.0},
            {"price": 101.0, "volume": 21.3},
            {"price": 102.0, "volume": 34.6},
            ...,
        ],
        'ask': [
            {"price": 100.0, "volume": 20.0},
            {"price": 101.0, "volume": 21.3},
            {"price": 102.0, "volume": 34.6},
            ...,
        ]
    },
    'timestamp': 1642524222.1160505
}

schema

{
    "namespace": "confluent.io.examples.serialization.avro",
    "name": "L2_Book",
    "type": "record",
    "fields": [
        {"name": "exchange", "type": "string"},
        {"name": "symbol", "type": "string"},
        {"name": "book", "type": {
            "name": "book",
            "type": "record", 
            "fields": [
                {"name": "bid", "type": {
                    "type": "array", "items": {
                        "name": "bid",
                        "type": "record",
                        "fields": [
                            {"name": "price", "type": "float"},
                            {"name": "volume", "type": "float"}
                        ]
                    }
                }},
                {"name": "ask", "type": {
                    "type": "array", "items": {
                        "name": "ask",
                        "type": "record",
                        "fields": [
                            {"name": "price", "type": "float"},
                            {"name": "volume", "type": "float"}
                        ]
                    }
                }}
            ]}},
        {"name": "timestamp", "type": "double"}
    ]
}
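One thing worth keeping in mind for either schema: Avro's "float" is a 32-bit IEEE 754 value, while "double" is 64-bit. A Unix timestamp with sub-second precision does not fit into 32 bits, and values such as 21.3 are only approximated. The sketch below uses Python's struct module to emulate the 4-byte and 8-byte little-endian encodings; it is an illustration of the precision difference, not the code path the serializer actually runs:

```python
import struct


def roundtrip_float32(x):
    """Encode x as a 4-byte little-endian single-precision float and decode it."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def roundtrip_float64(x):
    """Encode x as an 8-byte little-endian double-precision float and decode it."""
    return struct.unpack("<d", struct.pack("<d", x))[0]


timestamp = 1642524222.1160505
as_float = roundtrip_float32(timestamp)
as_double = roundtrip_float64(timestamp)

print(f"float : {as_float!r} (off by {abs(as_float - timestamp):.3f} s)")
print(f"double: {as_double!r}")
print(f"volume 21.3 as float: {roundtrip_float32(21.3)!r}")
```

If exact prices and volumes matter, "double" (or a string or decimal representation) may be the safer choice for those fields as well.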
