I have a Kafka cluster running and I want to store L2-orderbook snapshots in a topic. Each snapshot contains dictionaries of {key: value} pairs whose keys are of type float, as in the following example:
{
    'exchange': 'ex1',
    'symbol': 'sym1',
    'book': {
        'bid': {
            100.0: 20.0,
            101.0: 21.3,
            102.0: 34.6,
            ...,
        },
        'ask': {
            100.0: 20.0,
            101.0: 21.3,
            102.0: 34.6,
            ...,
        }
    },
    'timestamp': 1642524222.1160505
}
My schema proposal below is not working, and I'm pretty sure that is because the keys in the 'bid' and 'ask' dictionaries are not of type string.
{
    "namespace": "confluent.io.examples.serialization.avro",
    "name": "L2_Book",
    "type": "record",
    "fields": [
        {"name": "exchange", "type": "string"},
        {"name": "symbol", "type": "string"},
        {"name": "book", "type": "record", "fields": {
            "name": "bid", "type": "record", "fields": {
                {"name": "price", "type": "float"},
                {"name": "volume", "type": "float"}
            },
            "name": "ask", "type": "record", "fields": {
                {"name": "price", "type": "float"},
                {"name": "volume", "type": "float"}
            }
        },
        {"name": "timestamp", "type": "float"}
    ]
}
Producing fails with:

KafkaError{code=_VALUE_SERIALIZATION,val=-161,str="no value and no default for bids"}
What would be a proper avro-schema here?
First, you have a typo: fields needs to be an array in the schema definition.
However, your bid (and ask) objects are not records; they are a map<float, float>. In other words, they do not have literal price and volume keys.
Avro has map types, but the keys are "assumed to be strings". You are welcome to try

{"name": "bid", "type": {"type": "map", "values": "float"}}

(note that a complex type such as a map must be nested inside the field's "type" rather than written inline).
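Since Avro map keys must be strings, the float price keys would need converting before serialization. A minimal sketch (the helper name stringify_keys is illustrative, not part of any library):

```python
def stringify_keys(side):
    """Convert float price keys to strings so the side fits an Avro map."""
    return {str(price): volume for price, volume in side.items()}

bid = {100.0: 20.0, 101.0: 21.3, 102.0: 34.6}
print(stringify_keys(bid))
# → {'100.0': 20.0, '101.0': 21.3, '102.0': 34.6}
```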
Otherwise, you need to reformat your data payloads, for example as a list of objects
'bid': [
    {'price': 100.0, 'volume': 20.0},
    ...,
],
Along with
{"name": "bid", "type": {
    "type": "array", "items": {
        "type": "record",
        "name": "BidItem",
        "fields": [
            {"name": "price", "type": "float"},
            {"name": "volume", "type": "float"}
        ]
    }
}}
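The corresponding reshaping of one side into a list of objects can be sketched like this (the helper name to_price_levels is illustrative; sorting by price is an assumption about how you want the levels ordered):

```python
def to_price_levels(side):
    """Reshape a {price: volume} dict into a list of price/volume records."""
    return [{"price": p, "volume": v} for p, v in sorted(side.items())]

bid = {101.0: 21.3, 100.0: 20.0}
print(to_price_levels(bid))
# → [{'price': 100.0, 'volume': 20.0}, {'price': 101.0, 'volume': 21.3}]
```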
I have finally figured out two working resolutions. In both cases I need to convert the original data.
The main lesson for me has been that a complex type (map, array, nested record) must be nested inside the field's "type", e.g.

{"name": "bid", "type": {
    "type": "array", "items": {
        ...

Special thanks to OneCricketeer for pointing me in the right direction :-)
1) bids and asks as a map with the key being of type string
data example
{
    'exchange': 'ex1',
    'symbol': 'sym1',
    'book': {
        'bid': {
            "100.0": 20.0,
            "101.0": 21.3,
            "102.0": 34.6,
            ...,
        },
        'ask': {
            "100.0": 20.0,
            "101.0": 21.3,
            "102.0": 34.6,
            ...,
        }
    },
    'timestamp': 1642524222.1160505
}
schema
{
    "namespace": "confluent.io.examples.serialization.avro",
    "name": "L2_Book",
    "type": "record",
    "fields": [
        {"name": "exchange", "type": "string"},
        {"name": "symbol", "type": "string"},
        {"name": "book", "type": {
            "name": "book",
            "type": "record",
            "fields": [
                {"name": "bid", "type": {"type": "map", "values": "float"}},
                {"name": "ask", "type": {"type": "map", "values": "float"}}
            ]
        }},
        {"name": "timestamp", "type": "float"}
    ]
}
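The conversion of a whole original snapshot into this map-based shape can be sketched as follows (the function name snapshot_to_map_payload is mine, not from any library):

```python
def snapshot_to_map_payload(snapshot):
    """Return a copy of the snapshot with string price keys, matching the map-based schema."""
    out = dict(snapshot)
    out["book"] = {
        side: {str(price): volume for price, volume in levels.items()}
        for side, levels in snapshot["book"].items()
    }
    return out

snap = {
    "exchange": "ex1",
    "symbol": "sym1",
    "book": {"bid": {100.0: 20.0}, "ask": {102.0: 34.6}},
    "timestamp": 1642524222.1160505,
}
print(snapshot_to_map_payload(snap)["book"])
# → {'bid': {'100.0': 20.0}, 'ask': {'102.0': 34.6}}
```

The original snapshot is left untouched, so the float-keyed book can still be used elsewhere in the application.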
2) bids and asks as an array of records
data example
{
    'exchange': 'ex1',
    'symbol': 'sym1',
    'book': {
        'bid': [
            {"price": 100.0, "volume": 20.0},
            {"price": 101.0, "volume": 21.3},
            {"price": 102.0, "volume": 34.6},
            ...,
        ],
        'ask': [
            {"price": 100.0, "volume": 20.0},
            {"price": 101.0, "volume": 21.3},
            {"price": 102.0, "volume": 34.6},
            ...,
        ]
    },
    'timestamp': 1642524222.1160505
}
schema
{
    "namespace": "confluent.io.examples.serialization.avro",
    "name": "L2_Book",
    "type": "record",
    "fields": [
        {"name": "exchange", "type": "string"},
        {"name": "symbol", "type": "string"},
        {"name": "book", "type": {
            "name": "book",
            "type": "record",
            "fields": [
                {"name": "bid", "type": {
                    "type": "array", "items": {
                        "name": "bid",
                        "type": "record",
                        "fields": [
                            {"name": "price", "type": "float"},
                            {"name": "volume", "type": "float"}
                        ]
                    }
                }},
                {"name": "ask", "type": {
                    "type": "array", "items": {
                        "name": "ask",
                        "type": "record",
                        "fields": [
                            {"name": "price", "type": "float"},
                            {"name": "volume", "type": "float"}
                        ]
                    }
                }}
            ]
        }},
        {"name": "timestamp", "type": "float"}
    ]
}
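For this second resolution, a full snapshot can be converted in the same spirit (the function name snapshot_to_array_payload is mine; sorting each side by price is an assumption, since a dict carries no guaranteed price ordering):

```python
def snapshot_to_array_payload(snapshot):
    """Return a copy of the snapshot with each side as a list of price/volume records."""
    out = dict(snapshot)
    out["book"] = {
        side: [{"price": p, "volume": v} for p, v in sorted(levels.items())]
        for side, levels in snapshot["book"].items()
    }
    return out

snap = {
    "exchange": "ex1",
    "symbol": "sym1",
    "book": {"bid": {101.0: 21.3, 100.0: 20.0}, "ask": {102.0: 34.6}},
    "timestamp": 1642524222.1160505,
}
print(snapshot_to_array_payload(snap)["book"]["bid"])
# → [{'price': 100.0, 'volume': 20.0}, {'price': 101.0, 'volume': 21.3}]
```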