Parsing a CSV formatted text file

I have a text file that looks like this:

1,Smith, 249.24, 6/10/2010
2,Johnson, 1332.23, 6/11/2010
3,Woods, 2214.22, 6/11/2010
1,Smith, 219.24, 6/11/2010

I need to be able to find the balance for a client on a given date.

I'm wondering if I should:

A. Start from the end and read each line into an array, one at a time. Check the last-name index to see if it is the client we're looking for. Then display the balance index of the first match.

or

B. Use RegEx to find a match and display it.

I don't have much experience with RegEx, but I'll learn it if it's a no-brainer in a situation like this.

I would recommend using the FileHelpers open source project: http://www.filehelpers.net/

Piece of cake:

Define your class:

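using System;
using FileHelpers;

[DelimitedRecord(",")]
public class Customer
{
    public int CustId;

    public string Name;

    // The sample data has a space after each comma, so trim before converting.
    [FieldTrim(TrimMode.Both)]
    public decimal Balance;

    // Matches dates such as 6/10/2010 in the sample file.
    [FieldTrim(TrimMode.Both)]
    [FieldConverter(ConverterKind.Date, "M/d/yyyy")]
    public DateTime AddedDate;
}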

Use it:

var engine = new FileHelperAsyncEngine<Customer>();

// Read
using(engine.BeginReadFile("TestIn.txt"))
{
   // The engine is IEnumerable 
   foreach(Customer cust in engine)
   {
      // your code here
      Console.WriteLine(cust.Name);

      // your condition >> add balance
   }
}

I think the cleanest way is to load the entire file into an array of custom objects and work with that. For 3 MB of data, this won't be a problem. If you wanted to do a completely different search later, you could reuse most of the code. I would do it this way:

class Record
{
  public int Id { get; protected set; }
  public string Name { get; protected set; }
  public decimal Balance { get; protected set; }
  public DateTime Date { get; protected set; }

  public Record (int id, string name, decimal balance, DateTime date)
  {
    Id = id;
    Name = name;
    Balance = balance;
    Date = date;
  }
}

…

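// Requires: using System; using System.Globalization; using System.IO; using System.Linq;
Record[] records = (from line in File.ReadAllLines(filename)
                    let fields = line.Split(',')
                    select new Record(
                      int.Parse(fields[0]),
                      fields[1].Trim(),
                      decimal.Parse(fields[2], CultureInfo.InvariantCulture),
                      DateTime.Parse(fields[3], CultureInfo.InvariantCulture)
                    )).ToArray();

// Single throws unless exactly one record matches; note == for comparison.
Record wantedRecord = records.Single
                      (r => r.Name == clientName && r.Date == givenDate);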

This looks like a pretty standard CSV layout, which is easy enough to process. You can actually do it with ADO.NET and the Jet provider, but I think it is probably easier in the long run to process it yourself.

So first off, you want to process the actual text data. I think it is reasonable to assume each record is separated by a newline, so you can use the ReadLine method to easily get each record:

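// Requires: using System.IO;
using (StreamReader reader = new StreamReader(@"C:\Path\To\file.txt"))
{
    while (true)
    {
        var line = reader.ReadLine();
        // ReadLine returns null at end of file; an empty line also ends the loop.
        if (string.IsNullOrEmpty(line))
            break;
        // Process Line
    }
}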

Then, to process each line, you can split the string on commas and store the values in a data structure. So if you use a data structure like this:

public class MyData
{
    public int Id { get; set; }
    public string Name { get; set; }
    public decimal Balance { get; set; }
    public DateTime Date { get; set; }
}

You can then process the line data with a method like this:

public MyData GetRecord(string line)
{
    var fields = line.Split(',');
    return new MyData()
    {
        Id = int.Parse(fields[0]),
        Name = fields[1],
        Balance = decimal.Parse(fields[2]),
        Date = DateTime.Parse(fields[3])
    };
}

Now, this is the simplest example, and it doesn't account for cases where the fields may be empty. In that case you would either need to support NULL for those fields (using the nullable types int?, decimal? and DateTime?) or define some default value to assign to them.
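A minimal sketch of the nullable variant might look like the following. The NullableData and LenientParser names are made up for illustration, and the M/d/yyyy date format and invariant culture are assumptions based on the sample data in the question.

```csharp
using System;
using System.Globalization;

// Hypothetical variant of MyData whose fields may be missing.
public class NullableData
{
    public int? Id { get; set; }
    public string Name { get; set; }
    public decimal? Balance { get; set; }
    public DateTime? Date { get; set; }
}

public static class LenientParser
{
    // Returns the trimmed field, or an empty string if the line is too short.
    private static string Field(string[] fields, int index) =>
        index < fields.Length ? fields[index].Trim() : "";

    // Parses one comma-separated line. Empty or malformed fields become null
    // instead of throwing, which is what int.Parse and friends would do.
    public static NullableData GetRecord(string line)
    {
        var fields = line.Split(',');
        var inv = CultureInfo.InvariantCulture;
        var name = Field(fields, 1);

        return new NullableData
        {
            Id = int.TryParse(Field(fields, 0), NumberStyles.Integer, inv, out var id)
                ? id : (int?)null,
            Name = name.Length > 0 ? name : null,
            Balance = decimal.TryParse(Field(fields, 2), NumberStyles.Number, inv, out var balance)
                ? balance : (decimal?)null,
            // Assumed format: month/day/year without leading zeros, e.g. 6/10/2010.
            Date = DateTime.TryParseExact(Field(fields, 3), "M/d/yyyy", inv,
                                          DateTimeStyles.None, out var date)
                ? date : (DateTime?)null
        };
    }
}
```

Callers then have to check HasValue (or compare against null) before using Balance or Date, which makes the "missing data" case explicit rather than hiding it behind a default.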

Once you have that, you can store the collection of MyData objects in a list and easily perform calculations on it. Given your example of finding the balance on a given date, you could do something like:

var data = customerDataList.First(d => d.Name == customerNameImLookingFor 
                                    && d.Date == dateImLookingFor);

Here customerDataList is the collection of MyData objects read from the file, customerNameImLookingFor is a variable containing the customer's name, and dateImLookingFor is a variable containing the date.

I've used this technique in the past to process text files ranging from a couple of records to tens of thousands of records, and it works pretty well.

Note that both of your options will scan the file. That is fine if you only want to search the file for one item.

If you need to search for multiple client/date combinations in the same file, you could parse the file into a Dictionary<string, Dictionary<DateTime, decimal>> first.
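As a rough sketch of that idea, the file can be read once into the nested dictionary and then queried repeatedly. The BalanceIndex class and its method names are hypothetical, and the M/d/yyyy date format is assumed from the sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class BalanceIndex
{
    // Builds name -> (date -> balance) from lines such as "1,Smith, 249.24, 6/10/2010".
    public static Dictionary<string, Dictionary<DateTime, decimal>> Build(IEnumerable<string> lines)
    {
        var index = new Dictionary<string, Dictionary<DateTime, decimal>>();
        foreach (var line in lines)
        {
            if (string.IsNullOrWhiteSpace(line)) continue;

            var fields = line.Split(',');
            var name = fields[1].Trim();
            var balance = decimal.Parse(fields[2], CultureInfo.InvariantCulture);
            var date = DateTime.ParseExact(fields[3].Trim(), "M/d/yyyy",
                                           CultureInfo.InvariantCulture);

            if (!index.TryGetValue(name, out var byDate))
            {
                byDate = new Dictionary<DateTime, decimal>();
                index[name] = byDate;
            }
            // A later line for the same client and date overwrites the earlier one.
            byDate[date] = balance;
        }
        return index;
    }

    // Returns false when the client or the exact date is not in the index.
    public static bool TryGetBalance(
        Dictionary<string, Dictionary<DateTime, decimal>> index,
        string name, DateTime date, out decimal balance)
    {
        balance = 0m;
        return index.TryGetValue(name, out var byDate)
            && byDate.TryGetValue(date, out balance);
    }
}
```

In practice Build could be fed File.ReadLines(path), so the scan happens once and each later lookup is a pair of hash lookups.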

A direct answer: for a one-off, a RegEx will probably be faster.
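For the one-off case, a RegEx lookup could be sketched roughly as follows. The RegexLookup class is hypothetical, the pattern assumes the id,name,balance,date layout from the question, and the date is assumed to be written as M/d/yyyy without leading zeros. Whether this actually beats a plain split is something to measure rather than take for granted.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class RegexLookup
{
    // Returns the balance on the first line matching the client and date,
    // or null when no line matches.
    public static decimal? FindBalance(string text, string clientName, DateTime date)
    {
        // Assumed on-disk date format, e.g. 6/11/2010.
        var dateText = date.ToString("M/d/yyyy", CultureInfo.InvariantCulture);

        // [ \t]* tolerates the spaces after commas without crossing line breaks;
        // \r?$ copes with both \n and \r\n line endings in Multiline mode.
        var pattern =
            @"^\d+[ \t]*,[ \t]*" + Regex.Escape(clientName) +
            @"[ \t]*,[ \t]*(?<balance>-?\d+(\.\d+)?)[ \t]*,[ \t]*" +
            Regex.Escape(dateText) + @"[ \t]*\r?$";

        var match = Regex.Match(text, pattern, RegexOptions.Multiline);
        if (!match.Success) return null;

        return decimal.Parse(match.Groups["balance"].Value, CultureInfo.InvariantCulture);
    }
}
```

Regex.Escape keeps names containing characters such as "." or "(" from being interpreted as pattern syntax.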

If you're just reading it, I'd consider reading the whole file into memory using StreamReader.ReadToEnd and then treating it as one long string to search through. When you find a record you want to look at, just look for the previous and next line break, and you have the transaction row you want.
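A small sketch of that approach, with hypothetical names (WholeFileSearch, FindRow), could look like this. It finds the first occurrence of the search text and widens the match to the surrounding line breaks.

```csharp
using System;
using System.IO;

public static class WholeFileSearch
{
    // Returns the full line containing the first occurrence of needle, or null.
    // Note this is a plain substring search, so "Smith" would also hit "Smithson".
    public static string FindRow(string text, string needle)
    {
        if (string.IsNullOrEmpty(needle)) return null;

        int hit = text.IndexOf(needle, StringComparison.Ordinal);
        if (hit < 0) return null;

        // Previous line break (or start of text when there is none).
        int start = text.LastIndexOf('\n', hit) + 1;

        // Next line break (or end of text when there is none).
        int end = text.IndexOf('\n', hit);
        if (end < 0) end = text.Length;

        // Drop a trailing \r left over from Windows line endings.
        return text.Substring(start, end - start).TrimEnd('\r');
    }

    // Reads the whole file with StreamReader.ReadToEnd, then searches it.
    public static string FindRowInFile(string path, string needle)
    {
        using (var reader = new StreamReader(path))
        {
            return FindRow(reader.ReadToEnd(), needle);
        }
    }
}
```

To get the most recent row for a client instead of the oldest, the first search could use LastIndexOf in place of IndexOf.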

If it's on a server, or the file can be refreshed at any time, this might not be a good solution though.

If it's all well-formatted CSV like this, then I'd use something like the Microsoft.VisualBasic.FileIO.TextFieldParser class or the Fast CSV class over on CodeProject to read it all in.
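For reference, reading the rows with TextFieldParser might look like the sketch below. The CsvRows wrapper is a made-up name, and the project is assumed to reference the Microsoft.VisualBasic assembly.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class CsvRows
{
    // Reads every record from the input as an array of trimmed fields.
    public static List<string[]> Read(TextReader input)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(input))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            // Strips the space that follows each comma in the sample file.
            parser.TrimWhiteSpace = true;

            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

Passing a StreamReader opened on the file gives the same result as the manual Split approach, with the added benefit that quoted fields containing commas are handled by the parser.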

The data type is a little tricky because I imagine not every client has a record for every day. That means you can't just have a nested dictionary for your lookups. Instead, you want to "index" by name first and then by date, but the form of the date record is a little different. I think I'd build something like this as I read in each record:

Dictionary<string, SortedList<DateTime, double>>
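One possible way to use that structure is sketched below. It assumes that "balance on a given date" means the most recent record on or before that date, which is an interpretation rather than something the question states; BalanceHistory and its methods are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class BalanceHistory
{
    // Records a balance for a client on a date, creating the client's list if needed.
    public static void Add(Dictionary<string, SortedList<DateTime, double>> index,
                           string name, DateTime date, double balance)
    {
        if (!index.TryGetValue(name, out var history))
        {
            history = new SortedList<DateTime, double>();
            index[name] = history;
        }
        history[date] = balance;
    }

    // Returns the latest balance recorded on or before the given date,
    // or null if the client is unknown or has no record that early.
    public static double? BalanceOn(Dictionary<string, SortedList<DateTime, double>> index,
                                    string name, DateTime date)
    {
        if (!index.TryGetValue(name, out var history)) return null;

        // Keys are kept in ascending order, so a binary search finds the
        // right-most key that is <= date.
        var keys = history.Keys;
        int lo = 0, hi = keys.Count - 1, found = -1;
        while (lo <= hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (keys[mid] <= date)
            {
                found = mid;
                lo = mid + 1;
            }
            else
            {
                hi = mid - 1;
            }
        }
        return found >= 0 ? history.Values[found] : (double?)null;
    }
}
```

With the sample data, asking for Smith on 6/12/2010 would fall back to the 6/11/2010 record, since that is the latest one not after the requested date.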

Hey, hey, hey! Why not do it with this great project on CodeProject, LINQ to CSV? Way cool, rock solid.
