
Using StreamReader.Read to read blocks of short, integer and decimal data types

The values are comma separated, so I am using a StringBuilder to build up each value and then write it to the appropriate buffer. I noticed that a considerable amount of time is spent in builder.ToString() and the Parse functions. Do I have to write unsafe code to overcome this problem? And what is the best way to achieve what I want?

    private static void ReadSecondBySecondFileToEndBytes(FileInfo file, SafeDictionary<short, SafeDictionary<string, SafeDictionary<int, decimal>>> dayData)
    {
        string name = file.Name.Split('.')[0];
        int result = 0;         // number of chars returned by the last Read call
        int length = 1*1024;    // buffer size: 1024 chars
        char[] buffer = new char[length];

        StringBuilder builder = new StringBuilder();    // accumulates the characters of the current field
        bool pendingTick = true;
        bool pendingSymbol = true;
        bool pendingValue = false;
        short symbol = 0;
        int tick = 0;
        decimal value;
        using (StreamReader streamReader = (new StreamReader(file.FullName)))
        {

            while ((result = streamReader.Read(buffer, 0, length)) > 0)
            {
                int i = 0;
                while (i < result)
                {
                    if (buffer[i] == '\r' || buffer[i] == '\n')
                    {                           
                        pendingTick = true;
                        if (pendingValue)
                        {
                            value = decimal.Parse(builder.ToString());
                            pendingSymbol = true;
                            pendingValue = false;
                            dayData[symbol][name][tick] = value;
                            builder.Clear();
                        }
                    }
                    else if (buffer[i] == ',') // new value to capture
                    {                          
                        if (pendingTick)
                        {
                            tick = int.Parse(builder.ToString());
                            pendingTick = false;
                        }
                        else if (pendingSymbol)
                        {
                            symbol = short.Parse(builder.ToString());
                            pendingValue = true;
                            pendingSymbol = false;
                        }
                        else if (pendingValue)
                        {
                            value = decimal.Parse(builder.ToString());
                            pendingSymbol = true;
                            pendingValue = false;
                            dayData[symbol][name][tick] = value;
                        }
                        builder.Clear();
                    }
                    else
                        builder.Append(buffer[i]);
                    i++;
                }

            }
        }

    }

My suggestion would be not to parse the file character by character as you are doing now, but to go for something like this:

using (var reader = File.OpenText("<< filename >>"))
{
    string line;

    while ((line = reader.ReadLine()) != null)
    {
        string[] parts = line.Split(',');

        // Process the different parts of the line here.
    }
}

The main difference here is that you are no longer scanning for line endings and commas yourself. The advantage is that when you use high-level methods like ReadLine(), the StreamReader (which File.OpenText() returns) can optimize for reading the file line by line. The same goes for String.Split().

Using these high-level methods will usually be at least as fast as parsing the buffer yourself, and the resulting code is much easier to read and maintain.

With the approach above, you don't have to use the StringBuilder anymore and can just get your values like this:

tick = int.Parse(parts[0]);
symbol = short.Parse(parts[1]);
value = decimal.Parse(parts[2]);
dayData[symbol][name][tick] = value;

I have not verified the above snippet; please verify that these lines are correct, or correct them for your business logic.
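The loop in the question returns to reading a symbol after every value, which suggests that a line may hold a tick followed by several symbol/value pairs rather than exactly three fields. Below is a minimal sketch under that assumption. The `TickLineParser` class and its `TryParseLine` method are hypothetical names, not part of the original code. It uses the `TryParse` overloads with `CultureInfo.InvariantCulture`, so a malformed line or a machine with a different decimal separator does not cause an exception or a misread value.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class TickLineParser
{
    // Parses one line of the assumed form "tick,symbol,value[,symbol,value...]".
    // Returns false for blank or malformed lines; in that case 'pairs' may be partially filled.
    public static bool TryParseLine(string line, out int tick, List<KeyValuePair<short, decimal>> pairs)
    {
        tick = 0;
        pairs.Clear();
        if (string.IsNullOrWhiteSpace(line))
            return false;

        // RemoveEmptyEntries tolerates a trailing comma at the end of the line.
        string[] parts = line.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);

        // A tick plus at least one complete (symbol, value) pair gives an odd count >= 3.
        if (parts.Length < 3 || parts.Length % 2 == 0)
            return false;

        if (!int.TryParse(parts[0], NumberStyles.Integer, CultureInfo.InvariantCulture, out tick))
            return false;

        for (int i = 1; i < parts.Length; i += 2)
        {
            short symbol;
            decimal value;
            if (!short.TryParse(parts[i], NumberStyles.Integer, CultureInfo.InvariantCulture, out symbol) ||
                !decimal.TryParse(parts[i + 1], NumberStyles.Number, CultureInfo.InvariantCulture, out value))
            {
                return false;
            }
            pairs.Add(new KeyValuePair<short, decimal>(symbol, value));
        }
        return true;
    }
}
```

Inside the ReadLine() loop you would call TryParseLine(line, out tick, pairs) and, for each pair, store dayData[pair.Key][name][tick] = pair.Value. Reusing one list across lines avoids allocating a new one per line.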

You got the wrong impression. While you are testing your program, you will indeed see most of the time being spent inside Parse() and the StringBuilder, because that is the only code that does any real work.

But it won't be that way in production. There, most of the time will be spent in the StreamReader, because the file won't be sitting in the file system cache as it is when you run your program over and over again on your dev machine. In production, the file has to be read off a disk drive, and that is glacially slow: disk I/O is the true bottleneck of your program. Making the parsing twice as fast will only make your program a few percent faster, if at all.

Don't compromise the reliability or maintainability of your code for such a small gain.
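One way to check this on your own data is to time the read and the parse separately. The following is a rough sketch only: `ReadVersusParseTimer` and `Measure` are hypothetical names, and the read figure includes text decoding and line splitting as well as disk access. Run it once on a file that has not been opened recently and once right after, and compare the two read times; the first run is the closer approximation to production conditions.

```csharp
using System;
using System.Diagnostics;
using System.Globalization;
using System.IO;

public static class ReadVersusParseTimer
{
    // Reads the whole file first, then parses it, so the two costs do not overlap.
    public static void Measure(string path, out TimeSpan readTime, out TimeSpan parseTime)
    {
        Stopwatch watch = Stopwatch.StartNew();
        string[] lines = File.ReadAllLines(path);
        readTime = watch.Elapsed;

        watch.Restart();
        int parsedFields = 0;
        foreach (string line in lines)
        {
            foreach (string part in line.Split(','))
            {
                decimal parsed;
                // Every field is parsed as a decimal here; that is enough to estimate parsing cost.
                if (decimal.TryParse(part, NumberStyles.Number, CultureInfo.InvariantCulture, out parsed))
                    parsedFields++;
            }
        }
        parseTime = watch.Elapsed;

        Console.WriteLine("read: {0} ms, parse: {1} ms, fields: {2}",
            readTime.TotalMilliseconds, parseTime.TotalMilliseconds, parsedFields);
    }
}
```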
