
Parallel CSV processing with LINQ

I have a function that reads a CSV file and returns results manipulated with LINQ. I need to open each file twice because I slice the data very differently for different uses. With "A Fast CSV Reader" from CodeProject, which I'm using, it is faster to read the file twice and manipulate it with LINQ directly each time than to read it into a DataTable.

Individually, each function call (imppow or impfuel) takes just over 2 seconds.

A simple for loop making all six calls takes 13 seconds:

string[] pathstring = { @"C:\Temp\Hourly1.txt", @"C:\Temp\Hourly2.txt", @"C:\Temp\Hourly3.txt" };
string[] pathgran = { "M", "Q", "Y" };
for (int i=0; i < 3; i++)
{
     var respow = imppow(pathstring[i], pathgran[i]);
     Console.WriteLine(respow[0]);

     var resfuel = impfuel(pathstring[i], pathgran[i]);
     Console.WriteLine(resfuel[0]);
}

Parallelizing like this shaves off 3 seconds, but no more:

Parallel.For(0, 3, (i) =>
{
    var respow = imppow(pathstring[i], pathgran[i]);
    Console.WriteLine(respow[0]);

    var resfuel = impfuel(pathstring[i], pathgran[i]);
    Console.WriteLine(resfuel[0]);
});

As mentioned, one call takes roughly 2 seconds. Can I get the runtime down further by using multithreading or something similar? Thanks.

Below are the two functions:

static object[] impfuel(string filepath, string gran)
{  
    using (CsvReader csv =
           new CsvReader(new StreamReader(filepath), true))
    {
        csv.SupportsMultiline = false;
        var results = csv.Select(r => new { yr = r[1], qr = r[3], mt = r[4], tar = r[7], mac = r[8], fuel = r[9], rg = r[10], rt = r[11], fp = r[22], fi = r[24] })
                         .Where(a => a.rt == "F")
                         .GroupBy(a => new { a.rg, a.fuel, a.tar, a.mt })
                         .Select(g => new { Rpg = g.Select(a => a.rg).First(), Fue = g.Select(a => a.fuel).First(), Tari = g.Select(a => a.tar).First(), Mon = g.Select(a => a.mt).First(), AverageA = g.Average(a => double.Parse(a.fp)), SumA = g.Sum(a => double.Parse(a.fi)) })
                         .ToArray();
        return results;
    }
}

static object[] imppow(string filepath, string gran)
{
    using (CsvReader csv =
           new CsvReader(new StreamReader(filepath), true))
    {
        csv.SupportsMultiline = false;
        var results = csv.Select(r => new { yr = r[1], qr = r[3], mt = r[4], tar = r[7], mac = r[8], rg = r[10], rt = r[11], pp = r[17], pi = r[19] })
                         .Where(a => a.rt == "M")
                         .GroupBy(a => new { a.rg, a.tar, a.mt })
                         .Select(g => new { Rpg = g.Select(a => a.rg).First(), Tari = g.Select(a => a.tar).First(), Mon = g.Select(a => a.mt).First(), AverageA = g.Average(a => double.Parse(a.pp)), SumA = g.Sum(a => double.Parse(a.pi)) })
                         .ToArray();
        return results;
    }
}

You never mention the size of the files: is it a few kilobytes, or are we talking megabytes? Reading each file only once would cut down on the slow I/O.

I would read the file once and, while reading it, put the content into two different lists.

string[] pathstring = { @"C:\Temp\Hourly1.txt", @"C:\Temp\Hourly2.txt", @"C:\Temp\Hourly3.txt" };
for (int i=0; i < 3; i++)
{
     List<Content> powList = new List<Content>();
     List<Content> fuelList = new List<Content>();
     ReadFile(pathstring[i], ref powList, ref fuelList);
     var respow = imppow(powList);
     Console.WriteLine(respow[0]);

     var resfuel = impfuel(fuelList);
     Console.WriteLine(resfuel[0]);
}

static void ReadFile(string filepath, ref List<Content> powList, ref List<Content> fuelList)
{
    using (CsvReader csv = new CsvReader(new StreamReader(filepath), true))
    {
        csv.SupportsMultiline = false;
        foreach(Content content in csv.Select(r => new Content(){ yr = r[1], qr = r[3], mt = r[4], tar = r[7], mac = r[8], fuel = r[9], rg = r[10], rt = r[11], pp = r[17], pi = r[19], fp = r[22], fi = r[24] }))
        {
           if (content.rt == "F")
               fuelList.Add(content);
           else if (content.rt == "M")
               powList.Add(content);
        }
    }
}

static object[] impfuel(List<Content> fuelList)
{  
    var results = fuelList.GroupBy(a => new { a.rg, a.fuel, a.tar, a.mt })
                     .Select(g => new { Rpg = g.Select(a => a.rg).First(), Fue = g.Select(a => a.fuel).First(), Tari = g.Select(a => a.tar).First(), Mon = g.Select(a => a.mt).First(), AverageA = g.Average(a => double.Parse(a.fp)), SumA = g.Sum(a => double.Parse(a.fi)) })
                     .ToArray();
    return results;
}


You can write imppow and the Content class yourself.
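As one possible shape for those two missing pieces, here is a minimal sketch. It assumes Content simply holds each selected CSV column as a string (the property names are taken from the initializer in ReadFile), and that imppow mirrors the original file-based version minus the "M" filter, which ReadFile has already applied. The wrapper class name CsvSlices is hypothetical and only there so the snippet compiles on its own.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type: one string property per CSV column that ReadFile copies out.
public class Content
{
    public string yr { get; set; }
    public string qr { get; set; }
    public string mt { get; set; }
    public string tar { get; set; }
    public string mac { get; set; }
    public string fuel { get; set; }
    public string rg { get; set; }
    public string rt { get; set; }
    public string pp { get; set; }
    public string pi { get; set; }
    public string fp { get; set; }
    public string fi { get; set; }
}

// Hypothetical container class; in the answer's code the method would sit next to impfuel.
public static class CsvSlices
{
    // List-based counterpart of the original imppow. The rows are assumed to be
    // pre-filtered to rt == "M", so only grouping and aggregation remain.
    public static object[] imppow(List<Content> powList)
    {
        var results = powList
            .GroupBy(a => new { a.rg, a.tar, a.mt })
            .Select(g => new
            {
                // The group key already carries the grouping columns.
                Rpg = g.Key.rg,
                Tari = g.Key.tar,
                Mon = g.Key.mt,
                // double.Parse uses the current culture, as in the original code.
                AverageA = g.Average(a => double.Parse(a.pp)),
                SumA = g.Sum(a => double.Parse(a.pi))
            })
            .ToArray();
        return results;
    }
}
```

Called as imppow(powList) from the loop above, this returns one element per distinct (rg, tar, mt) combination, in the order those combinations first appear in the list.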
