简体   繁体   中英

Can I deserialize vectors with variable length prefix with Bincode?

I am having a problem with the Rust bincode library. When it serializes a vector, it always assumes the prefixed length is 8 bytes. This is a fine assumption when you always encode data using bincode because bincode can read it's own serialized data.

I am in the situation where I cannot influence the serializer as I did not write it and it has to stay the same for legacy reasons. It encodes its vectors as a length-prefixed array where the prefix is always 2 bytes (or in some cases it is 4 bytes but but I know these cases well. Once I know how to do it with 2 bytes 4 bytes should not be a problem).

How can I use bincode (and serde for that matter) to deserialize these fields? Can I work around the default 8 bytes of length hardcoded in bincode?

Bincode is not supposed to be compatible with any existing serializer or standard. Nor is, according to the comment, the format you are trying to read.

I suggest you get the bincode sources—they are MIT-licensed, so you are free to do basically whatever you please with them—and modify them to suit your format (and give it your name and include it in your project).

serde::Deserializer is quite well documented, as is the underlying data model , and the implementation in bincode is trivial to find (in de/mod.rs ), so take it as your starting point and adjust as needed.

I have figured out a (possibly very ugly) way to do it without implementing my own deserializer — Bincode could do it after all. It looks something like this:

impl<'de> Deserialize<'de> for VarLen16 {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        struct VarLen16Visitor;
        impl<'de> Visitor<'de> for VarLen16Visitor {
            type Value = VarLen16;
            fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
                formatter.write_str("VarLen16")
            }

            fn visit_seq<A>(self, mut seq: A) -> Result<Self::Value, A::Error>
            where
                A: SeqAccess<'de>,
            {
                let mut res: Vec<u8> = vec![];

                let length: u16 = seq
                    .next_element()?
                    .ok_or_else(|| serde::de::Error::invalid_length(1, &self))?;

                for i in 0..length {
                    res.push(
                        seq.next_element()?
                            .ok_or_else(|| serde::de::Error::invalid_length(1, &self))?,
                    );
                }

                return Ok(VarLen16(res));
            }
        }

        return Ok(deserializer.deserialize_tuple(1 << 16, VarLen16Visitor)?);
    }
}

In short, I make the system think I deserialize a tuple where I set the length to the maximum I need. I have tested this, it does not actually allocate that much memory. Then I act like the length is part of this tuple, read it first and then continue reading as far as this length tells me to. It's not pretty but it certainly works.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM