Short description: Splitting a string takes way too long.
Longer description: I need to extract information from a string that looks like this:
...
5 1 12 1 1 1 466 1277 458 80 92 Assistance
2 1 13 0 0 0 1055 1277 1717 100 -1
3 1 13 1 0 0 1055 1186 1717 191 -1
4 1 13 1 1 0 1055 1277 1717 100 -1
5 1 13 1 1 1 1055 1279 288 78 90 Vehicle
5 1 13 1 1 2 1489 1279 228 98 67 Lights
5 1 13 1 1 3 1856 1281 286 95 74 System
5 1 13 1 1 4 2284 1281 196 95 70 Apps
5 1 13 1 1 5 2618 1277 154 80 77 Info
...
(Side note: the string is the return value of page.GetTsvText(0), where page is the return value of TesseractEngine.Process(image). So the string contains information about the detected OCR words: confidences, bounding-box coordinates, etc.)
To make the information easier to use, I wrote a method that turns the string into an array of string arrays (one array per row):
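Because every row has a fixed column layout, typed values may be easier to work with than raw string arrays. Below is a minimal sketch that assumes the rows follow Tesseract's standard TSV column order (level, page_num, block_num, par_num, line_num, word_num, left, top, width, height, conf, text). The names TsvRow and TsvRowParser are hypothetical, not part of the Tesseract wrapper. The sample rows are consistent with that order: for "Vehicle", left = 1055, top = 1279, width = 288, height = 78, conf = 90. conf is parsed as a float because some Tesseract versions may emit fractional confidences.

```csharp
using System;
using System.Globalization;

// One row of Tesseract TSV output. Column order is ASSUMED to be:
// level page_num block_num par_num line_num word_num left top width height conf text
public sealed record TsvRow(
    int Level, int PageNum, int BlockNum, int ParNum, int LineNum, int WordNum,
    int Left, int Top, int Width, int Height, float Conf, string Text);

public static class TsvRowParser
{
    public static TsvRow Parse(string line)
    {
        // A null separator splits on any Unicode whitespace (space, tab, ...).
        string[] f = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        if (f.Length < 11)
            throw new FormatException($"Expected at least 11 fields, got {f.Length}: '{line}'");

        int Field(int k) => int.Parse(f[k], CultureInfo.InvariantCulture);

        return new TsvRow(
            Field(0), Field(1), Field(2), Field(3), Field(4), Field(5),
            Field(6), Field(7), Field(8), Field(9),
            float.Parse(f[10], CultureInfo.InvariantCulture),
            // Structural rows (block, paragraph, line) carry no text column.
            f.Length > 11 ? string.Join(" ", f, 11, f.Length - 11) : "");
    }
}
```

For example, TsvRowParser.Parse("2 1 13 0 0 0 1055 1277 1717 100 -1") yields a row with Conf == -1 and an empty Text.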
private string[][] getDataArray(string source)
{
    // Needs: using System; using System.Diagnostics;
    Stopwatch sw = new Stopwatch();
    sw.Start();
    Console.WriteLine(source); // printing the whole string is included in the measured time
    string[] rows = source.Split(new char[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);
    int nrOfRows = rows.Length;
    string[][] result = new string[nrOfRows][];
    for (int i = 0; i < nrOfRows; i++)
    {
        // A null separator splits on every whitespace character: the normal
        // space, tab, and the "wider space" described in the note below.
        result[i] = rows[i].Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }
    sw.Stop();
    Console.WriteLine(" $$$ getDataArray() took: " + sw.ElapsedMilliseconds + " ms");
    return result;
}
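To check whether the split alone could plausibly take two seconds, one could time it on synthetic input shaped like the sample above, with no console output inside the timed region. This is a self-contained sketch; SplitBenchmark, MakeSample, SplitTimed and the row count are illustrative choices of mine, and the synthetic rows use tabs as separators (an assumption about the real data).

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class SplitBenchmark
{
    // Builds a synthetic string with the same shape as the question's sample:
    // 12 whitespace-separated columns per row, rows separated by '\n'.
    public static string MakeSample(int rows)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < rows; i++)
            sb.Append("5\t1\t13\t1\t1\t").Append(i).Append("\t1055\t1279\t288\t78\t90\tVehicle\n");
        return sb.ToString();
    }

    // Same two-level split as getDataArray(), timed without any Console output.
    public static string[][] SplitTimed(string source, out long elapsedMs)
    {
        var sw = Stopwatch.StartNew();
        string[] rows = source.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);
        var result = new string[rows.Length][];
        for (int i = 0; i < rows.Length; i++)
            result[i] = rows[i].Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        sw.Stop();
        elapsedMs = sw.ElapsedMilliseconds;
        return result;
    }
}
```

Usage: call SplitBenchmark.SplitTimed(SplitBenchmark.MakeSample(5000), out long ms) and print ms. If that figure is far below 2000, the time is being spent somewhere other than the split.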
Note: for some reason the string contains spaces that look wider than normal spaces. I copied it from the console log. It is a single character, not a tab, but it is wider than the usual space character.
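Since GetTsvText returns tab-separated values, and a tab copied out of a console can look like one wide space, it may be worth checking the actual code point instead of relying on copy-paste. A minimal sketch (the WhitespaceInspector name is hypothetical) that lists every distinct whitespace character other than the plain space and line breaks:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class WhitespaceInspector
{
    // Returns "U+XXXX (UnicodeCategory)" for each distinct whitespace character
    // in the text, ignoring the ordinary space and the '\n' / '\r' line breaks.
    public static List<string> DescribeUnusualWhitespace(string text)
    {
        var seen = new HashSet<char>();
        var found = new List<string>();
        foreach (char c in text)
        {
            if (c != ' ' && c != '\n' && c != '\r' && char.IsWhiteSpace(c) && seen.Add(c))
                found.Add($"U+{(int)c:X4} ({CharUnicodeInfo.GetUnicodeCategory(c)})");
        }
        return found;
    }
}
```

Passing page.GetTsvText(0) to it and printing the results would show, for example, "U+0009 (Control)" for a tab or "U+00A0 (SpaceSeparator)" for a non-breaking space. Either way, splitting with a null separator handles both.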
Problem:
stopwatch.Restart();
// Get data
string[][] data = getDataArray(page.GetTsvText(0));
stopwatch.Stop();
Console.WriteLine(" $$$ $$$ Got data array in: " + stopwatch.ElapsedMilliseconds + " ms");
It takes about 2000 ms.
Is it the string initialization that takes so long? How can I make this faster, say under 50 ms?
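The outer measurement covers two steps at once: page.GetTsvText(0) and getDataArray(). As far as I can tell from the Tesseract .NET wrapper, TesseractEngine.Process(image) returns a Page and the actual recognition runs when a result such as GetTsvText is first requested; if that holds, most of the 2000 ms would be OCR rather than string splitting. The " $$$ getDataArray() took" line printed inside the method already hints at the split's share. A small helper (StepTimer is a hypothetical name) makes it easy to time each step on its own:

```csharp
using System;
using System.Diagnostics;

public static class StepTimer
{
    // Runs one step, prints how long it took, and returns the step's result,
    // so each stage of a pipeline can be measured separately.
    public static T Time<T>(string label, Func<T> step)
    {
        var sw = Stopwatch.StartNew();
        T result = step();
        sw.Stop();
        Console.WriteLine($"{label}: {sw.Elapsed.TotalMilliseconds:F1} ms");
        return result;
    }
}
```

Usage: string tsv = StepTimer.Time("GetTsvText", () => page.GetTsvText(0)); followed by string[][] data = StepTimer.Time("split", () => getDataArray(tsv));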
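By using LINQ (needs using System.Linq;):
string[][] result = source.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries)
    .Select(line => line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries)) // null = split on any whitespace
    .ToArray(); // Select yields IEnumerable<string[]>; ToArray() produces the string[][]
LINQ makes the code shorter. It is not inherently faster than the equivalent for loop (it usually adds a little overhead), so measure both before relying on it for speed.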