
Why is the protobuf-net deserializer so much slower in my code than stream-reading CSV?

I store simple time series in the following format and am looking for the fastest way to read and parse them into "quote" objects:

DateTime, price1, price2. DateTime is a string in the format yyyyMMdd HH:mm:ss:fff; price1 and price2 are strings of numbers with 5 decimal places (e.g., 1.40505).

I played with different ways to store and read the data and also toyed around with the protobuf-net library. One serialized file contained roughly 6 million rows; the raw CSV was serialized in the following way:

A TimeSeries object holding a List<DataBlob>; each DataBlob holding a Header object and a List<Quote> (one blob contains the quotes for one single day); each Quote holding a DateTime, a double px1, and a double px2.

It took about 47 seconds to read the serialized binary from disk and deserialize it, which seemed awfully long. In contrast, I kept the same time series in CSV string format, read each row into a List, and then parsed each row into a DateTime dt, double px1, and double px2, which I stuck into a newly created Quote object and added to a List. This took about 10 seconds to read (12 seconds with GZip compression, which makes the file about 1/9th of the size).
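For reference, here is a minimal sketch of the CSV path described above (the CsvQuote type, the method name, and the exact row layout "yyyyMMdd HH:mm:ss:fff,px1,px2" are my assumptions, not the original code):

using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

class CsvQuote
{
    public DateTime DateTime;
    public double Px1;
    public double Px2;
}

static class CsvReaderSketch
{
    //Reads one day's file line by line and parses each row into a CsvQuote.
    public static List<CsvQuote> ReadDay(string path)
    {
        var quotes = new List<CsvQuote>();
        using (var reader = new StreamReader(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                string[] parts = line.Split(',');
                quotes.Add(new CsvQuote
                {
                    DateTime = DateTime.ParseExact(parts[0],
                        "yyyyMMdd HH:mm:ss:fff", CultureInfo.InvariantCulture),
                    Px1 = double.Parse(parts[1], CultureInfo.InvariantCulture),
                    Px2 = double.Parse(parts[2], CultureInfo.InvariantCulture)
                });
            }
        }
        return quotes;
    }
}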

At first sight it looks like I am either using protobuf-net incorrectly, or this particular kind of time series does not lend itself well to serialization/deserialization.

Any comments or help would be appreciated. Marc, if you read this, could you possibly chime in and add some of your thoughts? I find it hard to imagine ending up with such different performance numbers.

Some information: I do not need random access to the data. I only ever read full days, so storing one day's worth of data in an individual CSV file made sense for my purpose, I thought.

Any ideas what the fastest way to read this kind of data might be? I apologize for the simplistic language; I am not a programmer at heart.

Here is a sample object I use for protobuf-net:

[ProtoContract]
class TimeSeries
{
    [ProtoMember(1)]
    public Header Header { get; set; }
    [ProtoMember(2)]
    public List<DataBlob> DataBlobs { get; set; }
}

[ProtoContract]
class DataBlob
{
    [ProtoMember(1)]
    public Header Header { get; set; }
    [ProtoMember(2)]
    public List<Quote> Quotes { get; set; }
}

[ProtoContract]
class Header
{
    [ProtoMember(1)]
    public string SymbolID { get; set; }
    [ProtoMember(2)]
    public DateTime StartDateTime { get; set; }
    [ProtoMember(3)]
    public DateTime EndDateTime { get; set; }
}

[ProtoContract]
class Quote
{
    [ProtoMember(1)]
    public DateTime DateTime { get; set; }
    [ProtoMember(2)]
    public double BidPrice { get; set; }
    [ProtoMember(3)]
    public long AskPrice { get; set; } //Expressed as Spread to BidPrice
}
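One detail that may matter here: protobuf-net does not encode a DateTime as a single varint; it writes it through a nested BCL sub-message, which adds per-item overhead when there are millions of quotes. A minimal sketch of an alternative contract that stores raw ticks instead (the QuoteCompact name and the Ticks member are my assumptions, not part of the original code):

[ProtoContract]
class QuoteCompact
{
    [ProtoMember(1)]
    public long Ticks { get; set; } //DateTime.Ticks; rebuild via new DateTime(Ticks)
    [ProtoMember(2)]
    public double BidPrice { get; set; }
    [ProtoMember(3)]
    public long AskPrice { get; set; } //Expressed as Spread to BidPrice, as before
}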

Here is the code used to serialize/deserialize:

//Note: SymbolID, StartDateTime and EndDateTime are assumed to be
//fields of the enclosing class; they are not defined in this snippet.
public static void SerializeAll(string fileNameWrite, List<Quote> QuoteList)
{
    //Header
    Header Header = new Header();
    Header.SymbolID = SymbolID;
    Header.StartDateTime = StartDateTime;
    Header.EndDateTime = EndDateTime;

    //Blob
    List<DataBlob> DataBlobs = new List<DataBlob>();
    DataBlob DataBlob = new DataBlob();
    DataBlob.Header = Header;
    DataBlob.Quotes = QuoteList;
    DataBlobs.Add(DataBlob);

    //Create TimeSeries
    TimeSeries TimeSeries = new TimeSeries();
    TimeSeries.Header = Header;
    TimeSeries.DataBlobs = DataBlobs;

    using (var file = File.Create(fileNameWrite))
    {
        Serializer.Serialize(file, TimeSeries);
    }
}

public static TimeSeries DeserializeAll(string fileNameBinRead)
{
    TimeSeries TimeSeries;

    using (var file = File.OpenRead(fileNameBinRead))
    {
        TimeSeries = Serializer.Deserialize<TimeSeries>(file);
    }

    return TimeSeries;
}
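For what it is worth, two cheap changes sometimes help on the protobuf-net side: pre-compiling the serializer once before timing (so the first call does not pay for runtime model compilation) and reading through a buffered stream. A sketch, assuming the TimeSeries type above:

//Pre-compile the model once, outside any timed region.
Serializer.PrepareSerializer<TimeSeries>();

TimeSeries TimeSeries;
using (var file = File.OpenRead(fileNameBinRead))
using (var buffered = new BufferedStream(file, 1 << 16)) //64 KB buffer
{
    TimeSeries = Serializer.Deserialize<TimeSeries>(buffered);
}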

The fastest way is a hand-coded binary serializer, especially if you transform price ticks. That is what I do, although my volume is slightly different (600 million items per day, around 200,000 symbols, with some being top-heavy). I store nothing in a way that needs parsing from text. The parser is handcrafted and I use a profiler to optimize it; it also handles size very well (a trade is down to 1 byte sometimes).
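To make that concrete, here is a rough sketch of what such a hand-coded binary format could look like (this is my illustration of the idea, not the answerer's actual code; it reuses the CsvQuote type from the sketch above). Each record is 16 bytes: an 8-byte tick count plus two prices scaled to 4-byte integers by 1e5.

using System;
using System.Collections.Generic;
using System.IO;

static class BinaryQuoteIO
{
    //Writes fixed-width records: 8-byte ticks + two 4-byte scaled prices.
    public static void Write(string path, List<CsvQuote> quotes)
    {
        using (var writer = new BinaryWriter(File.Create(path)))
        {
            foreach (var q in quotes)
            {
                writer.Write(q.DateTime.Ticks);                  //8 bytes
                writer.Write((int)Math.Round(q.Px1 * 100000.0)); //4 bytes
                writer.Write((int)Math.Round(q.Px2 * 100000.0)); //4 bytes
            }
        }
    }

    //Reads the records back; no text parsing involved.
    public static List<CsvQuote> Read(string path)
    {
        var quotes = new List<CsvQuote>();
        using (var reader = new BinaryReader(File.OpenRead(path)))
        {
            long count = reader.BaseStream.Length / 16; //16 bytes per record
            for (long i = 0; i < count; i++)
            {
                quotes.Add(new CsvQuote
                {
                    DateTime = new DateTime(reader.ReadInt64()),
                    Px1 = reader.ReadInt32() / 100000.0,
                    Px2 = reader.ReadInt32() / 100000.0
                });
            }
        }
        return quotes;
    }
}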
