简体   繁体   中英

C# Parallelizing CSV Parsing

Please look at the code below.

void func()
{
    for(;;)
    {
        var item = new Item();
    }
}

Item is a class in whose constructor I read several csv files, as follows

List<string> data = new List<string>();
Item()
{
    //read from csv into List<string>data
}

As is visible, the csv files are distinct and are read into unique variables. I would like to be able to parallelize this. All my data is on a network drive. I understand that the limitation in this case is the disk access. Can someone suggest what I can do to parallelize this?

if all your files are unique and stored in unique variables, take a look on Parallel.ForEach statement - take a look at http://msdn.microsoft.com/en-us/library/dd460720.aspx

As stated before Parallel.ForEach is the easiest way to run something in parallel, but if I recall correctly parallel.ForEach is a .net 4 method. so if you are using a different version you will have to find another method that uses locks.

If you are looking to read in data from a csv, ADO.net has a built in function that can read in csv files based on a schema file, it's one of the fastest ways in my experience to read in csv files.

quick link I found from google http://www.daniweb.com/web-development/aspnet/threads/38676

I've also had great success with this http://www.codeproject.com/KB/database/CsvReader.aspx . It's a little slower than the ado.net version but it's easier to use and you don't need schema file.

just a warning, if you use the ado.net and you large string numeric values like credit cards numbers and you are getting things that look like scientific notation, your schema file needs to be adjusted, I've had a lot of coders complain about this.

happy coding.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM