
Parsing a CSV formatted text file

I have a text file that looks like this:

1,Smith, 249.24, 6/10/2010
2,Johnson, 1332.23, 6/11/2010
3,Woods, 2214.22, 6/11/2010
1,Smith, 219.24, 6/11/2010

I need to be able to find the balance for a client on a given date.

I'm wondering if I should:

A. Start from the end and read each line into an Array, one at a time. Check the last name index to see if it is the client we're looking for. Then, display the balance index of the first match.

or

B. Use RegEx to find a match and display it.

I don't have much experience with RegEx, but I'll learn it if it's a no-brainer in a situation like this.

I would recommend using the open-source FileHelpers project: http://www.filehelpers.net/

Piece of cake:

Define your class:

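[DelimitedRecord(",")]
public class Customer
{
    public int CustId;

    public string Name;

    // The sample data has a space after each comma, so trim before converting.
    [FieldTrim(TrimMode.Both)]
    public decimal Balance;

    // Assumes the dates in the file (e.g. 6/10/2010) are month/day/year.
    [FieldTrim(TrimMode.Both)]
    [FieldConverter(ConverterKind.Date, "M/d/yyyy")]
    public DateTime AddedDate;
}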

Use it:

var engine = new FileHelperAsyncEngine<Customer>();

// Read
using(engine.BeginReadFile("TestIn.txt"))
{
   // The engine is IEnumerable 
   foreach(Customer cust in engine)
   {
      // your code here
      Console.WriteLine(cust.Name);

      // your condition >> add balance
   }
}

I think the cleanest way is to load the entire file into an array of custom objects and work with that. For 3 MB of data, this won't be a problem. If you wanted to do a completely different search later, you could reuse most of the code. I would do it this way:

class Record
{
  public int Id { get; protected set; }
  public string Name { get; protected set; }
  public decimal Balance { get; protected set; }
  public DateTime Date { get; protected set; }

  public Record (int id, string name, decimal balance, DateTime date)
  {
    Id = id;
    Name = name;
    Balance = balance;
    Date = date;
  }
}

…

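// Requires: using System.Globalization; using System.IO; using System.Linq;
// The whole query is parenthesized so that ToArray applies to the sequence.
Record[] records = (from line in File.ReadAllLines(filename)
                    let fields = line.Split(',')
                    select new Record(
                      int.Parse(fields[0]),
                      fields[1].Trim(),
                      decimal.Parse(fields[2], CultureInfo.InvariantCulture),
                      DateTime.Parse(fields[3], CultureInfo.InvariantCulture)
                    )).ToArray();

// Single throws if there is no match or more than one match.
Record wantedRecord = records.Single
                      (r => r.Name == clientName && r.Date == givenDate);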

This looks like a pretty standard CSV layout, which is easy enough to process. You can actually do it with ADO.NET and the Jet provider, but I think it is probably easier in the long run to process it yourself.

So first off, you want to process the actual text data. It seems reasonable to assume each record is separated by a newline, so you can use the ReadLine method to get each record:

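// The @ prefix makes this a verbatim string, so the backslashes are not escapes.
using (var reader = new StreamReader(@"C:\Path\To\file.txt"))
{
    string line;
    // ReadLine returns null at the end of the file.
    while ((line = reader.ReadLine()) != null)
    {
        if (line.Length == 0)
            continue; // skip blank lines
        // Process line
    }
}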

And then to process each line, you can split the string on comma, and store the values into a data structure. So if you use a data structure like this:

public class MyData
{
    public int Id { get; set; }
    public string Name { get; set; }
    public decimal Balance { get; set; }
    public DateTime Date { get; set; }
}

And you can process the line data with a method like this:

public MyData GetRecord(string line)
{
    var fields = line.Split(',');
    return new MyData()
    {
        Id = int.Parse(fields[0]),
        Name = fields[1],
        Balance = decimal.Parse(fields[2]),
        Date = DateTime.Parse(fields[3])
    };
}

Now, this is the simplest example, and doesn't account for cases where the fields may be empty, in which case you would either need to support NULL for those fields (using nullable types int?, decimal? and DateTime?), or define some default value that would be assigned to those values.
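To illustrate the nullable-type option, here is a minimal sketch. NullableData and TolerantParser are hypothetical names made up for this example, and it assumes the numbers and dates in the file can be read with the invariant culture (month/day/year dates, '.' as the decimal separator). A field that is empty or cannot be parsed is simply left as null:

```csharp
using System;
using System.Globalization;

// Hypothetical variant of MyData whose fields may be missing.
public class NullableData
{
    public int? Id { get; set; }
    public string Name { get; set; }
    public decimal? Balance { get; set; }
    public DateTime? Date { get; set; }
}

public static class TolerantParser
{
    // Returns null when the line does not have exactly four fields.
    public static NullableData ParseLine(string line)
    {
        var fields = line.Split(',');
        if (fields.Length != 4)
            return null;

        var culture = CultureInfo.InvariantCulture;
        var record = new NullableData { Name = fields[1].Trim() };

        // Each TryParse leaves the property null when the field is empty or invalid.
        int id;
        if (int.TryParse(fields[0].Trim(), NumberStyles.Integer, culture, out id))
            record.Id = id;

        decimal balance;
        if (decimal.TryParse(fields[2].Trim(), NumberStyles.Number, culture, out balance))
            record.Balance = balance;

        DateTime date;
        if (DateTime.TryParse(fields[3].Trim(), culture, DateTimeStyles.None, out date))
            record.Date = date;

        return record;
    }
}
```

Whether a missing balance should be null or some default such as 0 depends on what your calculations expect, so treat this as a starting point only.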

So once you have that you can store the collection of MyData objects in a list, and easily perform calculations based on that. So given your example of finding the balance on a given date you could do something like:

var data = customerDataList.First(d => d.Name == customerNameImLookingFor 
                                    && d.Date == dateImLookingFor);

Where customerDataList is the collection of MyData objects read from the file, customerNameImLookingFor is a variable containing the customer's name, and dateImLookingFor is a variable containing the date.

I've used this technique to process data in text files in the past for files ranging from a couple records, to tens of thousands of records, and it works pretty well.

Note that both your options will scan the file. That is fine if you only want to search in the file for 1 item.

If you need to search for multiple client/date combinations in the same file, you could parse the file into a Dictionary<string, Dictionary<DateTime, decimal>> first.
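A rough sketch of that nested dictionary, assuming the four-column layout from the question and invariant-culture parsing. BalanceIndex, Build and TryGetBalance are hypothetical names chosen for the example, and the sketch only finds exact name/date matches:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class BalanceIndex
{
    // Builds name -> (date -> balance). A later line for the same
    // name and date overwrites the earlier one.
    public static Dictionary<string, Dictionary<DateTime, decimal>> Build(IEnumerable<string> lines)
    {
        var index = new Dictionary<string, Dictionary<DateTime, decimal>>();
        foreach (var line in lines)
        {
            var fields = line.Split(',');
            var name = fields[1].Trim();
            var balance = decimal.Parse(fields[2].Trim(), CultureInfo.InvariantCulture);
            var date = DateTime.Parse(fields[3].Trim(), CultureInfo.InvariantCulture);

            Dictionary<DateTime, decimal> byDate;
            if (!index.TryGetValue(name, out byDate))
            {
                byDate = new Dictionary<DateTime, decimal>();
                index[name] = byDate;
            }
            byDate[date] = balance;
        }
        return index;
    }

    // Returns true and sets balance only when an exact name/date entry exists.
    public static bool TryGetBalance(
        Dictionary<string, Dictionary<DateTime, decimal>> index,
        string name, DateTime date, out decimal balance)
    {
        balance = 0m;
        Dictionary<DateTime, decimal> byDate;
        return index.TryGetValue(name, out byDate) && byDate.TryGetValue(date, out balance);
    }
}
```

You would build the index once, for example with BalanceIndex.Build(File.ReadLines(path)), and then each lookup is a pair of hash lookups instead of another scan of the file.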

A direct answer: for a one-off, a RegEx will probably be faster.
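For what it's worth, a one-off RegEx lookup might look roughly like the sketch below. RegexLookup and FindBalance are hypothetical names, the date is matched as literal text exactly as it appears in the file, and the pattern assumes the id,name,balance,date layout from the question. I haven't measured it against the other approaches:

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class RegexLookup
{
    // Returns the balance from the last line that matches the name and the
    // literal date text, or null when no line matches.
    public static decimal? FindBalance(string text, string name, string dateText)
    {
        // ^ and $ match at line boundaries because of RegexOptions.Multiline;
        // \r? tolerates Windows line endings.
        string pattern = @"^\d+," + Regex.Escape(name)
                       + @",[ \t]*(?<balance>\d+(\.\d+)?),[ \t]*"
                       + Regex.Escape(dateText) + @"[ \t]*\r?$";

        MatchCollection matches = Regex.Matches(text, pattern, RegexOptions.Multiline);
        if (matches.Count == 0)
            return null;

        string value = matches[matches.Count - 1].Groups["balance"].Value;
        return decimal.Parse(value, CultureInfo.InvariantCulture);
    }
}
```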

If you're only reading the file, I'd consider loading the whole thing into memory with StreamReader.ReadToEnd and searching it as one long string. When you find the record you want, look for the previous and next line breaks; the text between them is the transaction row.

If it's on a server or the file can be refreshed all the time this might not be a good solution though.
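The line-break trick can be sketched like this. TextSearch and FindLastRow are hypothetical names, and the sketch assumes the client name always sits between two commas, as in the sample data. It searches from the end, so it returns the most recent row for that client:

```csharp
using System;

public static class TextSearch
{
    // Returns the last line containing ",name," or null when there is none.
    public static string FindLastRow(string text, string name)
    {
        int hit = text.LastIndexOf("," + name + ",", StringComparison.Ordinal);
        if (hit < 0)
            return null;

        // Previous line break (start becomes 0 when the hit is on the first line).
        int start = text.LastIndexOf('\n', hit) + 1;

        // Next line break, or the end of the text for the final line.
        int end = text.IndexOf('\n', hit);
        if (end < 0)
            end = text.Length;

        // Drop the '\r' left over from Windows line endings.
        return text.Substring(start, end - start).TrimEnd('\r');
    }
}
```

The text argument would come from something like File.ReadAllText(path) or StreamReader.ReadToEnd; you still have to split the returned row to get at the balance.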

If it's all well-formatted CSV like this, then I'd use something like the Microsoft.VisualBasic.FileIO.TextFieldParser class or the Fast CSV class over on CodeProject to read it all in.
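With TextFieldParser, reading the rows might look something like this. FieldReader and ReadRows are hypothetical names; the parser itself lives in the Microsoft.VisualBasic assembly, which you need to reference even from C#:

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class FieldReader
{
    // Reads every row from the source and returns the fields of each row.
    public static List<string[]> ReadRows(TextReader source)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(source))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            // Strips the space that follows each comma in the sample data.
            parser.TrimWhiteSpace = true;

            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

Pass it a StreamReader for a file or a StringReader for text you already have in memory. The fields come back as strings, so the int, decimal and DateTime parsing is still up to you.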

The data type is a little tricky because I imagine not every client has a record for every day. That means you can't just use a nested dictionary for your lookups. Instead, you want to "index" by name first and then by date, but the form of the date record is a little different. I think I'd go for something like this as I read in each record:

Dictionary<string, SortedList<DateTime, double>>
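Here is one way a lookup against that structure could work, assuming "balance on a given date" means the most recent balance recorded on or before that date. BalanceHistory and BalanceOn are hypothetical names, and the sketch stores decimal rather than double for the money values, which is my substitution:

```csharp
using System;
using System.Collections.Generic;

public static class BalanceHistory
{
    // Returns the balance recorded on the given date or, failing that, the most
    // recent earlier one. Returns null when the client has no record that early.
    public static decimal? BalanceOn(
        Dictionary<string, SortedList<DateTime, decimal>> history,
        string name, DateTime date)
    {
        SortedList<DateTime, decimal> byDate;
        if (!history.TryGetValue(name, out byDate))
            return null;

        // SortedList keeps its keys in ascending order, so a binary search
        // finds the last key that is <= date.
        IList<DateTime> dates = byDate.Keys;
        int lo = 0, hi = dates.Count - 1, found = -1;
        while (lo <= hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (dates[mid] <= date)
            {
                found = mid;
                lo = mid + 1;
            }
            else
            {
                hi = mid - 1;
            }
        }
        return found < 0 ? (decimal?)null : byDate.Values[found];
    }
}
```

With the sample data, asking for Smith on a date after 6/11/2010 would give the 6/11 balance, since that is the latest record at or before the requested date.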

Hey, why not do it with the LINQ to CSV project on CodeProject? Way cool, and rock solid.
