I have billions of objects that I'm trying to structure in a B+Tree serialized to disk (HDD). I'm using the BPlusTree library for the data structure and protobuf-net for serialization/deserialization. To that end, I define my classes as:
[ProtoContract]
public class B<C, M>
    where C : IComparable<C>
    where M : IData<C>
{
    internal B()
    {
        lambda = new List<Lambda<C, M>>();
        omega = 0;
    }
    internal B(C coordinate)
    {
        lambda = new List<Lambda<C, M>>();
        e = coordinate;
        omega = 0;
    }
    [ProtoMember(1)]
    internal C e { set; get; }
    [ProtoMember(2)]
    internal List<Lambda<C, M>> lambda { private set; get; }
    [ProtoMember(3)]
    internal int omega { set; get; }
}
[ProtoContract]
public class Lambda<C, M>
    where C : IComparable<C>
    where M : IData<C>
{
    internal Lambda() { }
    internal Lambda(char tau, M atI)
    {
        this.tau = tau;
        this.atI = atI;
    }
    [ProtoMember(1)]
    internal char tau { private set; get; }
    [ProtoMember(2)]
    internal M atI { private set; get; }
}
and I define my serializer/deserializer as follows:
public class BSerializer<C, M> : ISerializer<B<C, M>>
    where C : IComparable<C>
    where M : IData<C>
{
    public B<C, M> ReadFrom(System.IO.Stream stream)
    {
        return Serializer.Deserialize<B<C, M>>(stream);
    }
    public void WriteTo(B<C, M> value, System.IO.Stream stream)
    {
        Serializer.Serialize<B<C, M>>(stream, value);
    }
}
Then I use them all in a B+Tree (from the BPlusTree library) data structure, which is defined as:
var options = new BPlusTree<C, B<C, M>>.OptionsV2(CSerializer, BSerializer);
var myTree = new BPlusTree<C, B<C, M>>(options);
The B+Tree is defined as a dictionary of key-value pairs. My key (i.e., C) is an integer, and its serializer is the default serializer of the BPlusTree library. My value is a custom object B<C,M> that is serialized using protobuf-net.
My problem always occurs eventually, but at seemingly random times: while searching for keys, the tree suddenly starts deserializing a value, and at the first call of B<C, M> ReadFrom(System.IO.Stream stream) it asks for the TypeModel.CS and ProtoReader.CS files. I get both packages from NuGet.
Checking the code, it looks like the calling code assumes serializations are aware of their own length; from the source:
foreach (T i in items)
    _serializer.WriteTo(i, io);
Protobuf messages are not self-terminating: the Google protobuf specification defines append === merge, so messages written back to back into one stream cannot be told apart when reading. As such, you'll need to length-prefix the messages. Fortunately, you should be able to just switch to SerializeWithLengthPrefix and DeserializeWithLengthPrefix. If that doesn't work, it would be worth putting together a fully reproducible example so that it can be investigated.
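As a rough illustration of the length-prefix suggestion, the sketch below wraps protobuf-net's SerializeWithLengthPrefix and DeserializeWithLengthPrefix in an ISerializer<T>. The names LengthPrefixedSerializer and DemoValue are hypothetical and not part of either library, and the choice of PrefixStyle.Base128 is an assumption; any style should work as long as the read and write sides agree.

```csharp
using System.IO;
using CSharpTest.Net.Serialization; // ISerializer<T> from the BPlusTree package
using ProtoBuf;                     // protobuf-net

// Hypothetical value type, used only to exercise the serializer below.
[ProtoContract]
public class DemoValue
{
    [ProtoMember(1)]
    public int Id { get; set; }
}

// Hypothetical generic serializer: frames each protobuf message with its
// length so that several values can be written back to back in one stream.
public class LengthPrefixedSerializer<T> : ISerializer<T>
{
    public T ReadFrom(Stream stream)
    {
        // Reads the length header first, then only that many payload bytes,
        // leaving the stream positioned at the start of the next value.
        return Serializer.DeserializeWithLengthPrefix<T>(stream, PrefixStyle.Base128);
    }

    public void WriteTo(T value, Stream stream)
    {
        // Writes a varint length header followed by the protobuf payload.
        Serializer.SerializeWithLengthPrefix<T>(stream, value, PrefixStyle.Base128);
    }
}
```

With this framing, two values written consecutively to the same stream should come back as two separate objects rather than being merged into one. For the types in the question, the same two calls could replace Serializer.Serialize and Serializer.Deserialize inside BSerializer<C, M>.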
As an alternative approach to solving this problem, you can also aggregate the behavior of the built-in serializers:
class BSerializer<C, M> : ISerializer<B<C, M>>
    where C : IComparable<C>
    where M : IData<C>
{
    public B<C, M> ReadFrom(System.IO.Stream stream)
    {
        byte[] value = CSharpTest.Net.Serialization.PrimitiveSerializer.Bytes.ReadFrom(stream);
        return Serializer.Deserialize<B<C, M>>(new MemoryStream(value));
    }
    public void WriteTo(B<C, M> value, System.IO.Stream stream)
    {
        using (var memory = new MemoryStream())
        {
            Serializer.Serialize<B<C, M>>(memory, value);
            CSharpTest.Net.Serialization.PrimitiveSerializer.Bytes.WriteTo(memory.ToArray(), stream);
        }
    }
}
Note: This approach can be a performance problem due to the unnecessary copies of data; however, it can help resolve the issue.
Another possibility is simply to define the tree as BPlusTree<TKey, byte[]> and provide PrimitiveSerializer.Bytes as the value serializer. This places the burden of object serialization on the caller, which can be a very good thing. The reason this can be beneficial is twofold:
For other common serialization issues and some examples please read the following article:
http://csharptest.net/1230/bplustree-and-custom-iserializer-implementations/
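To make the byte[]-valued tree suggested above more concrete, here is a minimal sketch in which the caller converts objects to and from byte arrays with protobuf-net. The Payload type and the ByteTreeExample helper are hypothetical names introduced only for illustration, and the sketch assumes that an OptionsV2 instance with no file name configured uses in-memory storage.

```csharp
using System.IO;
using CSharpTest.Net.Collections;   // BPlusTree<TKey, TValue>
using CSharpTest.Net.Serialization; // PrimitiveSerializer
using ProtoBuf;                     // protobuf-net

// Hypothetical value type standing in for the real stored object.
[ProtoContract]
public class Payload
{
    [ProtoMember(1)]
    public int Id { get; set; }
    [ProtoMember(2)]
    public string Name { get; set; }
}

public static class ByteTreeExample
{
    // Caller-side serialization: object -> byte[].
    public static byte[] ToBytes<T>(T value)
    {
        using (var memory = new MemoryStream())
        {
            Serializer.Serialize(memory, value);
            return memory.ToArray();
        }
    }

    // Caller-side deserialization: byte[] -> object.
    public static T FromBytes<T>(byte[] bytes)
    {
        using (var memory = new MemoryStream(bytes))
        {
            return Serializer.Deserialize<T>(memory);
        }
    }

    // Stores one payload in a byte[]-valued tree and reads it back.
    public static Payload RoundTrip(Payload input)
    {
        // Assumption: with no file name set, these options keep the tree in memory.
        var options = new BPlusTree<int, byte[]>.OptionsV2(
            PrimitiveSerializer.Int32, PrimitiveSerializer.Bytes);

        using (var tree = new BPlusTree<int, byte[]>(options))
        {
            tree[input.Id] = ToBytes(input);

            byte[] stored;
            return tree.TryGetValue(input.Id, out stored)
                ? FromBytes<Payload>(stored)
                : null;
        }
    }
}
```

Because PrimitiveSerializer.Bytes records the array length itself, the tree never has to guess where one protobuf message ends, and the caller decides exactly when an object is materialized from its bytes.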