
How to split a string into an array of arrays of strings as fast as possible in C#?

Short description: Splitting a string takes way too long.

Longer description: I need to extract information from a string looking like this:

...
5   1   12  1   1   1   466 1277    458 80  92  Assistance
2   1   13  0   0   0   1055    1277    1717    100 -1  
3   1   13  1   0   0   1055    1186    1717    191 -1  
4   1   13  1   1   0   1055    1277    1717    100 -1  
5   1   13  1   1   1   1055    1279    288 78  90  Vehicle
5   1   13  1   1   2   1489    1279    228 98  67  Lights
5   1   13  1   1   3   1856    1281    286 95  74  System
5   1   13  1   1   4   2284    1281    196 95  70  Apps
5   1   13  1   1   5   2618    1277    154 80  77  Info
...

(Side note: the string is returned by the page.GetTsvText(0) method, and page is returned by TesseractEngine.Process(image), so the string contains information about the detected OCR strings, confidences, bounding box coordinates, etc.)

To make the information easier to use, I wrote a method that turns the string into an array of arrays of strings:

private string[][] getDataArray(string source)
{
    Stopwatch sw = new Stopwatch();
    sw.Start();

    Console.WriteLine(source);

    // Split into rows, dropping empty lines.
    string[] rows = source.Split(new char[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);
    int nrOfRows = rows.Length;
    string[][] result = new string[nrOfRows][];

    for (int i = 0; i < nrOfRows; i++)
    {
        // Split each row into columns. The second separator is assumed to be a tab ('\t').
        result[i] = rows[i].Split(new char[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
    }
    sw.Stop();
    Console.WriteLine(" $$$ getDataArray() took: " + sw.ElapsedMilliseconds + " ms");
    return result;
}

Note: For some reason the string contains spaces that look wider than the usual space. I copied it from the console log. It is a single character, not a tab, but it is wider than the usual space character.
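
If the wide separator does turn out to be a tab (an assumption: TSV stands for tab-separated values, although the note above says otherwise), splitting on '\t' alone and keeping empty entries would give every row the same number of columns, including rows whose last field is empty. A minimal sketch, where TsvSplitter and SplitTsv are hypothetical names that are not part of the original code:

```csharp
using System;

public static class TsvSplitter
{
    // Hypothetical helper, not from the original post.
    // Assumes rows end with '\n' (optionally preceded by '\r')
    // and columns are separated by a single tab character.
    public static string[][] SplitTsv(string source)
    {
        string[] rows = source.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);
        var result = new string[rows.Length][];
        for (int i = 0; i < rows.Length; i++)
        {
            // Empty entries are kept, so an empty trailing field still yields a column.
            result[i] = rows[i].TrimEnd('\r').Split('\t');
        }
        return result;
    }
}
```

With this variant a column index always refers to the same field, whereas RemoveEmptyEntries makes rows with an empty last field one element shorter.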

Problem:

  • When I measure the time from inside the method, it takes less than 1 ms.
  • When I measure the time from outside, like this:
stopwatch.Restart();


// Get data
string[][] data = getDataArray(page.GetTsvText(0));

stopwatch.Stop();
Console.WriteLine(" $$$ $$$ Got data array in: " + stopwatch.ElapsedMilliseconds + " ms");

it takes about 2000 ms.

Does the string initialisation really take that long? How can I make it faster, for example under 50 ms?
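
Because the argument page.GetTsvText(0) is evaluated before getDataArray is entered, the outer stopwatch also includes the time needed to produce the string. One way to see where the 2000 ms go is to time the two steps separately. A minimal sketch: TimingSketch, Measure, SlowSource and Run are hypothetical names, and SlowSource only stands in for page.GetTsvText(0):

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class TimingSketch
{
    // Hypothetical helper: runs work() once, prints the elapsed time, returns the result.
    public static T Measure<T>(string label, Func<T> work)
    {
        Stopwatch sw = Stopwatch.StartNew();
        T result = work();
        sw.Stop();
        Console.WriteLine(label + " took: " + sw.ElapsedMilliseconds + " ms");
        return result;
    }

    // Stand-in for page.GetTsvText(0); the sleep only imitates a slow call.
    public static string SlowSource()
    {
        Thread.Sleep(100);
        return "5\t1\t13\t1\t1\t1\t1055\t1279\t288\t78\t90\tVehicle\n";
    }

    public static void Run()
    {
        // Step 1: obtain the string (in the question: page.GetTsvText(0)).
        string tsv = Measure("getting the string", () => SlowSource());
        // Step 2: split it (in the question: getDataArray(tsv)).
        string[] rows = Measure("splitting", () => tsv.Split('\n'));
        Console.WriteLine(rows.Length + " rows");
    }
}
```

If the first step accounts for nearly all of the time, a faster split will not help, and the call that produces the string is the part to look at.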

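By using LINQ (requires using System.Linq):

string[][] result = source.Split('\n')
                          .Select(line => line.Split(new char[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries))
                          .ToArray(); // Select returns IEnumerable<string[]>, so ToArray() is needed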

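LINQ makes the code shorter, but it is not necessarily faster than the loop: in the question the splitting itself already takes less than 1 ms, so any gain should be measured rather than assumed.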
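
Whether the LINQ form is actually faster than the loop can be checked by running both on the same input. A rough sketch, assuming space and tab as column separators. SplitComparison, SplitLoop, SplitLinq and Compare are hypothetical names:

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class SplitComparison
{
    // Assumption: columns are separated by spaces or tabs.
    private static readonly char[] ColumnSeparators = { ' ', '\t' };

    // Loop variant, mirroring getDataArray from the question.
    public static string[][] SplitLoop(string source)
    {
        string[] rows = source.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);
        var result = new string[rows.Length][];
        for (int i = 0; i < rows.Length; i++)
        {
            result[i] = rows[i].Split(ColumnSeparators, StringSplitOptions.RemoveEmptyEntries);
        }
        return result;
    }

    // LINQ variant; ToArray() materializes the lazy Select sequence.
    public static string[][] SplitLinq(string source)
    {
        return source.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries)
                     .Select(line => line.Split(ColumnSeparators, StringSplitOptions.RemoveEmptyEntries))
                     .ToArray();
    }

    // Times both variants on the same input; the numbers depend on machine and runtime.
    public static void Compare(string source, int iterations)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) SplitLoop(source);
        Console.WriteLine("loop: " + sw.ElapsedMilliseconds + " ms");

        sw.Restart();
        for (int i = 0; i < iterations; i++) SplitLinq(source);
        Console.WriteLine("LINQ: " + sw.ElapsedMilliseconds + " ms");
    }
}
```

Calling SplitComparison.Compare(tsv, 1000) on a real TSV string gives a first impression. For reliable numbers a dedicated benchmarking tool would be the better choice.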

