
Split one CSV file into multiple CSV files by column value with C#

I need to open a CSV file, then filter its rows and generate a separate output file for each distinct value.

◘ Example

•Input file = "full list.csv"

NAME        CITY
Mark        Venezia
John        New York
Lisa        San Miguel
Emily       New York
Amelia      New York
Nicolas     Venezia
Bill        San Miguel
Steve       Venezia

The output will be:

• file1 = "full list_Venezia.csv"

NAME        CITY
Mark        Venezia
Nicolas     Venezia
Steve       Venezia

• file2 = "full list_New York.csv"

NAME        CITY
John        New York
Emily       New York
Amelia      New York

• file3 = "full list_San Miguel.csv"

NAME        CITY
Lisa        San Miguel
Bill        San Miguel

I'm using C# in a Console Application in Visual Studio, and I started by reading the input file like this:

string inputFile = "full list.csv";
string outputFile;
string line;
string titles = File.ReadLines(inputFile).First();
System.IO.StreamReader file = new System.IO.StreamReader(inputFile);
while ((line = file.ReadLine()) != null)
{
}
file.Close();

System.IO.StreamWriter fileOut = new System.IO.StreamWriter(outputFile);
foreach (DatiOutput objOut in listOutput)
{
}
fileOut.Close();

Is there an algorithm that allows me to filter the data I need?

You have already written most of it yourself; now you just need to fill in the blanks. Breaking it down into steps:

  • Read the CSV into a collection
  • Group the collection by City
  • Write each group to a separate file

The first step, of course, is to read the input file. Note that the code below assumes the fields are separated by ';':

var listOutput = new List<DatiOutput>();
while ((line = file.ReadLine()) != null)
{
    var data = line.Split(new []{";"},StringSplitOptions.RemoveEmptyEntries);
    if(!data[0].Trim().Equals("NAME"))
        listOutput.Add(new DatiOutput{ Name = data[0].Trim(), City = data[1].Trim()});
}

I have assumed your DatiOutput class looks like the following, since it was not given.

public class DatiOutput
{
    public string City { get; set; }
    public string Name { get; set; }
}
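One thing to watch in the reading loop: it indexes data[1] directly, so a blank or malformed line would throw an IndexOutOfRangeException. Here is a minimal sketch of a more defensive parser. The names CsvRowParser and TryParse are hypothetical (they are not part of the original code), and the delimiter is passed in because the real separator of the file is an assumption. Skipping the header row is left to the caller.

```csharp
using System;

public class DatiOutput
{
    public string City { get; set; }
    public string Name { get; set; }
}

// Hypothetical helper, not part of the original answer.
public static class CsvRowParser
{
    // Returns false for blank lines and for lines that do not
    // contain exactly two fields; otherwise fills in 'row'.
    public static bool TryParse(string line, char delimiter, out DatiOutput row)
    {
        row = null;
        if (string.IsNullOrWhiteSpace(line))
            return false;

        var data = line.Split(delimiter);
        if (data.Length != 2)
            return false;

        row = new DatiOutput { Name = data[0].Trim(), City = data[1].Trim() };
        return true;
    }
}
```

Inside the while loop you could then write something like: if (CsvRowParser.TryParse(line, ';', out var row)) listOutput.Add(row);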

The next step is to group the collection by City and then write each group to a file. You can use LINQ to do the grouping.

listOutput.GroupBy(c=>c.City)

Once you have the result, you can build a file name with the corresponding city name appended and write the data to it.

foreach (var objOut in listOutput.GroupBy(c => c.City))
{
    // objOut.Key is the city this group was built from.
    var filePath = $"{Path.Combine(Path.GetDirectoryName(inputFile), Path.GetFileNameWithoutExtension(inputFile))}_{objOut.Key}.csv";

    // FileMode.Create truncates an existing file; OpenOrCreate would leave stale trailing content behind.
    using (System.IO.StreamWriter fileOut = new System.IO.StreamWriter(File.Open(filePath, FileMode.Create, FileAccess.Write)))
    {
        fileOut.WriteLine("NAME;CITY");
        foreach (var item in objOut)
        {
            fileOut.WriteLine($"{item.Name};{item.City}");
        }
    }
}

This gives you the desired result.

foreach (var g in File.ReadAllLines("full list.csv")
    .Skip(1)
    .Select(l => new {
        Name = l.Substring(0, l.IndexOf(',')),
        City = l.Substring(l.IndexOf(',') + 1) })
    .GroupBy(l => l.City))
{
    File.WriteAllLines($"full list_{g.Key}.csv", new[] { "NAME,CITY" }
        .Concat(g.Select(l => $"{l.Name},{l.City}")));
}

The key part your example was missing was GroupBy. It lets you split the data you have read into groups based on a criterion (in our case, City).

GroupBy is a powerful LINQ extension method that partitions a sequence into groups sharing a key; it does not filter anything out. The example above reads all the lines, skips the header, and uses Select to turn each line into an instance of an anonymous type holding the name and city. GroupBy then groups these instances by city, and each group is written to a new file.
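To make the grouping behaviour easier to see without touching any files, here is a small in-memory sketch. The class GroupByDemo and the method DescribeGroups are hypothetical names made up for illustration; only GroupBy, Select and string.Join are real library calls. Groups come back in the order in which each key first appears in the source, and elements keep their source order within a group.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class, for illustration only.
public static class GroupByDemo
{
    // Returns one line per city, e.g. "Venezia: Mark, Nicolas".
    public static List<string> DescribeGroups(IEnumerable<(string Name, string City)> rows)
    {
        return rows
            .GroupBy(r => r.City)                       // one group per distinct city
            .Select(g => g.Key + ": " +                 // g.Key is the city
                         string.Join(", ", g.Select(r => r.Name)))
            .ToList();
    }
}
```

With the rows from the question this should yield three lines, one each for Venezia, New York and San Miguel. If you only wanted a single city, Where(r => r.City == "Venezia") would be the filtering counterpart.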

I would take @TVOHM's answer in a slightly cleaner direction by keeping the same code style throughout the whole solution.

File.ReadAllLines("full list.csv")         // Read the input file
    .Skip(1)                               // Skip the header row
    .Select(row => row.Split(','))         // Split each row into an array of name and city
    .GroupBy(row => row[1], row => row[0]) // Group by city, selecting names
    .ToList()                              // To list, so .ForEach is possible
    .ForEach(group => File.WriteAllLines($"full list_{group.Key}.csv", group)); // Create a file for each group and write only the names (no header or CITY column)
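Note that the chain above writes only the names, whereas the expected output in the question keeps the header and both columns. A possible variant that preserves whole rows might look like the sketch below. CsvSplitter and SplitByCity are hypothetical names, and the code assumes comma-separated rows, a header on the first line, no quoted fields, and city names that are valid in file names.

```csharp
using System.IO;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class CsvSplitter
{
    // Writes one "<input name>_<city>.csv" file per distinct city,
    // next to the input file, each starting with the original header.
    public static void SplitByCity(string inputPath)
    {
        var lines = File.ReadAllLines(inputPath);
        if (lines.Length == 0)
            return;

        var header = lines[0];
        var directory = Path.GetDirectoryName(Path.GetFullPath(inputPath));
        var baseName = Path.GetFileNameWithoutExtension(inputPath);

        var groups = lines
            .Skip(1)                                    // skip the header row
            .Where(l => l.IndexOf(',') >= 0)            // ignore blank or malformed lines
            .GroupBy(l => l.Substring(l.IndexOf(',') + 1).Trim());

        foreach (var group in groups)
        {
            var outputPath = Path.Combine(directory, $"{baseName}_{group.Key}.csv");
            File.WriteAllLines(outputPath, new[] { header }.Concat(group));
        }
    }
}
```

Calling CsvSplitter.SplitByCity("full list.csv") would then produce one file per city in the same folder as the input.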

Here's a non-LINQy approach using a Dictionary to keep a reference to each output file based on the city name as the Key (there's nothing wrong with LINQ, though!):

string[] values;
string header;
string line, city, outputFileName;
string inputFile = "full list.csv";
Dictionary<string, System.IO.StreamWriter> outputFiles = new Dictionary<string, System.IO.StreamWriter>();
using (System.IO.StreamReader file = new System.IO.StreamReader(inputFile))
{
    header = file.ReadLine();
    while ((line = file.ReadLine()) != null)
    {
        values = line.Split(',');
        city = values[1];
        if (!outputFiles.ContainsKey(city))
        {
            outputFileName = "full list_" + city + ".csv";
            outputFiles.Add(city, new System.IO.StreamWriter(outputFileName));
            outputFiles[city].WriteLine(header);
        }
        outputFiles[city].WriteLine(line);
    }
}   
foreach(System.IO.StreamWriter outputFile in outputFiles.Values)
{
    outputFile.Close();
}
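One caveat with the loop above: if an exception is thrown while reading, the final foreach never runs and the writers stay open. A sketch of the same streaming idea with cleanup in a finally block follows. StreamingSplitter and Split are hypothetical names, and the comma delimiter is the same assumption as in the code above. Like the original, it keeps one open file handle per distinct city, which is fine for a handful of cities but could become a problem with thousands.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, for illustration only.
public static class StreamingSplitter
{
    public static void Split(string inputFile)
    {
        var directory = Path.GetDirectoryName(Path.GetFullPath(inputFile));
        var baseName = Path.GetFileNameWithoutExtension(inputFile);
        var writers = new Dictionary<string, StreamWriter>();
        try
        {
            using (var reader = new StreamReader(inputFile))
            {
                string header = reader.ReadLine();
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    var values = line.Split(',');
                    if (values.Length < 2)
                        continue; // skip blank or malformed lines

                    string city = values[1].Trim();
                    if (!writers.TryGetValue(city, out var writer))
                    {
                        // First row for this city: create its file and copy the header.
                        writer = new StreamWriter(Path.Combine(directory, baseName + "_" + city + ".csv"));
                        writers.Add(city, writer);
                        writer.WriteLine(header);
                    }
                    writer.WriteLine(line);
                }
            }
        }
        finally
        {
            // Runs even if reading fails, so no output file is left open.
            foreach (var writer in writers.Values)
                writer.Dispose();
        }
    }
}
```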


 