
Reading a Multiple Header CSV file using CsvHelper

I have a large CSV file with multiple header rows, a sample of which is shown below. How can I read it with CsvHelper in C#?

The header row repeats periodically throughout the file, and there are also many separator rows that start with "+".

Here is a sample:

FAUF-Rückmeldungen aus SFC500:  4200 Sätze ausgegeben
+----+---------------+---------------+----+--------------+-------------+------------+
|Werk|Rückmeldenummer|Rückmeldezähler|AVO |Rückmeldedatum|Rückmeldezeit|Arbeitsplatz|
+----+---------------+---------------+----+--------------+-------------+------------+
|TR10|      410959107|              2|0800|26.07.2021    |00:01:24     |155164-B    |
|TR10|      411158037|             20|0900|26.07.2021    |00:02:33     |155217-A    |
|TR10|      410985740|             25|0900|26.07.2021    |00:02:39     |155196-A    |
|TR10|      410279717|             57|0900|26.07.2021    |00:02:40     |155196-A    |
|TR10|      410630007|              6|0900|26.07.2021    |00:02:41     |155196-B    |
|TR10|      411237292|             25|0900|26.07.2021    |00:02:41     |155196-A    |
|TR10|      410276088|             20|0900|26.07.2021    |00:06:56     |155217-A    |
|TR10|      410950998|              1|0900|26.07.2021    |00:06:57     |155217-A    |
|TR10|      411237292|             26|0900|26.07.2021    |00:06:57     |155196-A    |
|TR10|      410556669|              1|0900|26.07.2021    |00:06:58     |155217-A    |
|TR10|      411237292|             27|0900|26.07.2021    |00:06:58     |155196-A    |
|TR10|      410556669|              2|0900|26.07.2021    |00:06:59     |155217-A    |
|TR10|      410630007|              7|0900|26.07.2021    |00:07:00     |155196-B    |
|TR10|      411525402|              5|0900|26.07.2021    |00:07:00     |155114-A    |
|TR10|      411525402|              6|0900|26.07.2021    |00:07:01     |155114-A    |
|TR10|      411528024|              1|0900|26.07.2021    |00:07:02     |155114-A    |
|TR10|      411528024|              2|0900|26.07.2021    |00:07:03     |155114-A    |
|TR10|      411528929|             30|0900|26.07.2021    |00:07:04     |155114-A    |
|TR10|      411544500|              3|0900|26.07.2021    |00:07:05     |155114-A    |
|TR10|      411528928|              8|0905|26.07.2021    |00:10:19     |155123-C    |
|TR10|      410279717|             58|0900|26.07.2021    |00:11:48     |155196-A    |
|TR10|      411237292|             28|0900|26.07.2021    |00:11:49     |155196-A    |
|TR10|      410630007|              8|0900|26.07.2021    |00:11:50     |155196-B    |
|TR10|      411237293|              2|0990|26.07.2021    |00:14:14     |155164-A    |
|TR10|      410633488|              1|0600|26.07.2021    |00:14:52     |155163-0    |
|TR10|      410633212|              1|0600|26.07.2021    |00:14:59     |155163-0    |
|TR10|      411218828|              2|0600|26.07.2021    |00:15:08     |155163-0    |
|TR10|      411438190|              3|0910|26.07.2021    |00:15:14     |155163-E    |
|TR10|      411527748|              1|0910|26.07.2021    |00:15:19     |155163-B    |
|TR10|      411367433|              2|0910|26.07.2021    |00:16:17     |155163-D    |
|TR10|      411032464|              3|0910|26.07.2021    |00:16:26     |155163-D    |
|TR10|      411525402|              7|0900|26.07.2021    |00:16:49     |155114-A    |
|TR10|      411528024|              3|0900|26.07.2021    |00:16:50     |155114-A    |
|TR10|      411544500|              4|0900|26.07.2021    |00:16:51     |155114-A    |
|TR10|      410985740|             26|0900|26.07.2021    |00:16:55     |155196-A    |
|TR10|      410279717|             59|0900|26.07.2021    |00:16:56     |155196-A    |
|TR10|      411237292|             29|0900|26.07.2021    |00:16:57     |155196-A    |
|TR10|      410900407|              2|0040|26.07.2021    |00:17:46     |155135-D    |
|TR10|      409944144|              1|0910|26.07.2021    |00:18:47     |155163-C    |
|TR10|      411544499|              1|0905|26.07.2021    |00:19:42     |155123-C    |
|TR10|      411525401|              5|0905|26.07.2021    |00:19:56     |155123-C    |
|TR10|      410985740|             27|0900|26.07.2021    |00:21:47     |155196-A    |
|TR10|      410630007|              9|0900|26.07.2021    |00:21:48     |155196-B    |
|TR10|      411237292|             30|0900|26.07.2021    |00:21:48     |155196-A    |
|TR10|      411544437|              4|0900|26.07.2021    |00:22:22     |155114-A    |
|TR10|      411544436|              1|0905|26.07.2021    |00:22:41     |155123-C    |
|TR10|      411551402|              2|0005|26.07.2021    |00:24:00     |155115-B    |
|TR10|      411362459|              1|0005|26.07.2021    |00:24:52     |155115-B    |
|TR10|      411369893|              1|0060|26.07.2021    |00:25:25     |155112-G    |
|TR10|      411530629|              1|0005|26.07.2021    |00:25:37     |155115-B    |
|TR10|      411369897|              1|0063|26.07.2021    |00:25:40     |155112-F    |
|TR10|      411369894|              1|0070|26.07.2021    |00:25:54     |155518-0    |
|TR10|      411369897|              2|0063|26.07.2021    |00:26:02     |155112-F    |
|TR10|      411369894|              2|0070|26.07.2021    |00:26:10     |155518-0    |
|TR10|      411369897|              3|0063|26.07.2021    |00:26:21     |155112-F    |
|TR10|      411369894|              3|0070|26.07.2021    |00:26:28     |155518-0    |
|TR10|      411369897|              4|0063|26.07.2021    |00:26:37     |155112-F    |
|TR10|      411369894|              4|0070|26.07.2021    |00:26:43     |155518-0    |
|TR10|      410950998|              2|0900|26.07.2021    |00:26:45     |155217-A    |
+----+---------------+---------------+----+--------------+-------------+------------+
+----+---------------+---------------+----+--------------+-------------+------------+
|Werk|Rückmeldenummer|Rückmeldezähler|AVO |Rückmeldedatum|Rückmeldezeit|Arbeitsplatz|
+----+---------------+---------------+----+--------------+-------------+------------+
|TR10|      410279717|             60|0900|26.07.2021    |00:26:46     |155196-A    |
|TR10|      410950998|              3|0900|26.07.2021    |00:26:46     |155217-A    |
|TR10|      410630007|             10|0900|26.07.2021    |00:26:47     |155196-B    |
|TR10|      411369897|              5|0063|26.07.2021    |00:26:54     |155112-F    |
|TR10|      411369894|              5|0070|26.07.2021    |00:27:04     |155518-0    |
|TR10|      411369897|              6|0063|26.07.2021    |00:27:15     |155112-F    |
|TR10|      411369894|              6|0070|26.07.2021    |00:27:23     |155518-0    |
|TR10|      411086222|              1|0001|26.07.2021    |00:27:50     |155212-A    |
|TR10|      411086223|              1|0005|26.07.2021    |00:27:58     |155210-A    |
|TR10|      411520617|              7|0905|26.07.2021    |00:30:28     |155123-C    |
|TR10|      411872172|              1|0010|26.07.2021    |00:31:27     |155145-A    |
|TR10|      411872177|              1|0010|26.07.2021    |00:31:39     |155145-A    |
|TR10|      411528024|              4|0900|26.07.2021    |00:31:50     |155114-A    |
|TR10|      411872182|              1|0010|26.07.2021    |00:31:50     |155145-A    |
|TR10|      410985740|             28|0900|26.07.2021    |00:31:54     |155196-A    |
|TR10|      410279717|             61|0900|26.07.2021    |00:31:55     |155196-A    |
|TR10|      411872187|              1|0010|26.07.2021    |00:32:02     |155145-A    |
|TR10|      410699054|              1|0060|26.07.2021    |00:32:52     |155112-K    |
|TR10|      410699055|              1|0063|26.07.2021    |00:33:01     |155112-L    |
|TR10|      410699056|              1|0070|26.07.2021    |00:33:11     |155518-0    |
|TR10|      411434349|              2|0080|26.07.2021    |00:33:18     |155213-F    |
|TR10|      410850582|              1|0051|26.07.2021    |00:33:54     |155146-E    |
|TR10|      410850583|              1|0055|26.07.2021    |00:34:01     |155146-F    |
|TR10|      410850580|              1|0080|26.07.2021    |00:34:09     |155518-0    |
|TR10|      410774889|              1|0050|26.07.2021    |00:34:13     |155171-D    |
|TR10|      411243279|              2|0005|26.07.2021    |00:34:27     |155531-A    |
|TR10|      411243280|              3|0010|26.07.2021    |00:34:37     |155550-B    |
|TR10|      411243281|              1|0020|26.07.2021    |00:34:48     |155550-E    |
|TR10|      411228376|              1|0001|26.07.2021    |00:36:15     |155112-D    |
|TR10|      410985740|             29|0900|26.07.2021    |00:36:46     |155196-A    |
|TR10|      411525402|              8|0900|26.07.2021    |00:36:46     |155114-A    |
|TR10|      411237292|             31|0900|26.07.2021    |00:36:47     |155196-A    |
|TR10|      411533238|              1|0001|26.07.2021    |00:36:55     |155144-A    |
|TR10|      410898440|              2|0010|26.07.2021    |00:37:02     |155171-A    |
|TR10|      411533239|              1|0005|26.07.2021    |00:37:02     |155104-A    |
|TR10|      411874854|              1|0010|26.07.2021    |00:37:37     |FCM-E       |
|TR10|      411032291|              1|0060|26.07.2021    |00:40:09     |155112-G    |
|TR10|      411874855|              1|0010|26.07.2021    |00:40:21     |FCM-E       |
|TR10|      411032293|              1|0063|26.07.2021    |00:40:35     |155112-F    |
|TR10|      411032292|              1|0070|26.07.2021    |00:40:42     |155518-0    |
|TR10|      411032293|              2|0063|26.07.2021    |00:40:51     |155112-F    |
|TR10|      411032292|              2|0070|26.07.2021    |00:40:59     |155518-0    |
|TR10|      411032293|              3|0063|26.07.2021    |00:41:08     |155112-F    |
|TR10|      411032292|              3|0070|26.07.2021    |00:41:15     |155518-0    |
|TR10|      411032293|              4|0063|26.07.2021    |00:41:25     |155112-F    |
|TR10|      411032292|              4|0070|26.07.2021    |00:41:32     |155518-0    |
|TR10|      411032293|              5|0063|26.07.2021    |00:41:41     |155112-F    |
|TR10|      410556669|              3|0900|26.07.2021    |00:41:46     |155217-A    |
|TR10|      410279717|             62|0900|26.07.2021    |00:41:47     |155196-A    |
|TR10|      411237292|             32|0900|26.07.2021    |00:41:48     |155196-A    |
|TR10|      411032292|              5|0070|26.07.2021    |00:41:49     |155518-0    |
|TR10|      411032293|              6|0063|26.07.2021    |00:41:59     |155112-F    |
|TR10|      411032292|              6|0070|26.07.2021    |00:42:07     |155518-0    |
|TR10|      411535704|              1|0010|26.07.2021    |00:43:40     |155144-A    |
|TR10|      411875458|              1|0010|26.07.2021    |00:43:54     |155144-A    |
|TR10|      411528024|              5|0900|26.07.2021    |00:46:47     |155114-A    |
|TR10|      410985740|             30|0900|26.07.2021    |00:46:48     |155196-A    |
|TR10|      410279717|             63|0900|26.07.2021    |00:46:50     |155196-A    |
|TR10|      411525401|              6|0905|26.07.2021    |00:46:56     |155123-C    |
|TR10|      411528023|              1|0905|26.07.2021    |00:47:30     |155123-C    |
+----+---------------+---------------+----+--------------+-------------+------------+

I created the following program and SFC class:

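using System;
using System.Globalization;
using System.IO;
using CsvHelper;
using CsvHelper.Configuration;

namespace CsvHelper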
{
    class Program
    {
        static void Main(string[] args)
        {
            ReadCsv();
        }

        static void ReadCsv()
        {
            var config = new CsvConfiguration(CultureInfo.InvariantCulture)
            {
                Delimiter="|"
            };     
            using (var reader = new StreamReader("file.csv"))
            using (var csv = new CsvReader(reader, config))
            {
                var records = csv.GetRecords<SFC>();
            } 
        }

        public class SFC
        {
            public string Werk { get; set; }

            public string Rückmeldenummer { get; set; }

            public int Rückmeldezähler { get; set; }

            public int AVO { get; set; }

            public DateTime Rückmeldedatum { get; set; }

            public TimeSpan Rückmeldezeit { get; set; }

            public string Arbeitsplatz { get; set; }
        }
    }
}

How can I read this file into a List<SFC> with CsvHelper?

Your text file consists of the following repeating pattern of lines:

  • Zero or more initial lines to be ignored.
  • An initial delimiter line like +----+---------------+
  • A header like |Werk|Rückmeldenummer|.
  • Another delimiter line.
  • Data lines like |TR10|      410959107|.
  • A final delimiter line.

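You can read a file in this format with a small state machine: for each row, check whether the first field "looks like" a delimiter line (starts with +-), and use that together with the current state to decide whether the row should be ignored, read as a header, or read as data: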

using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using CsvHelper;
using CsvHelper.Configuration;

// Where the reader currently is within one table of the file.
enum ReadState
{
    Initial,             // outside a table, e.g. the title line
    InitialDelimiter,    // the +----+ line above a header
    Header,              // the |Werk|...| header line
    HeaderDataDelimiter, // the +----+ line between header and data
    Data,                // a data row
}

// Place this method inside a class, e.g. Program.
public static List<TRecord> ReadCsv<TRecord>(string filename, ClassMap<TRecord> map)
{
    List<TRecord> records = new ();
    var config = new CsvConfiguration(CultureInfo.InvariantCulture)
    {
        Delimiter = "|",                                    // fields are separated by vertical bars
        PrepareHeaderForMatch = args => args.Header.Trim(), // match "AVO " to the AVO property
        TrimOptions = TrimOptions.Trim,                     // strip the fixed-width padding around fields
    };
    using (var reader = new StreamReader(filename))
    using (var csv = new CsvReader(reader, config))
    {
        csv.Context.RegisterClassMap(map);
        var state = ReadState.Initial;
        while (csv.Read())
        {
            // A +----+ line contains no '|', so the whole line ends up in field 0.
            var isDelimiter = csv.GetField(0).StartsWith("+-", StringComparison.Ordinal);
            var newState = (isDelimiter, state) switch
            {
                (true, ReadState.Initial) => ReadState.InitialDelimiter,
                (true, ReadState.Header) => ReadState.HeaderDataDelimiter,
                //(true, ReadState.HeaderDataDelimiter) => ReadState.Initial, // Uncomment if your CSV file might contain empty tables with headers and delimiters but no data.
                (true, ReadState.Data) => ReadState.Initial,
                (false, ReadState.Initial) => ReadState.Initial,
                (false, ReadState.InitialDelimiter) => ReadState.Header,
                (false, ReadState.HeaderDataDelimiter) => ReadState.Data,
                (false, ReadState.Data) => ReadState.Data,
                _ => throw new ApplicationException(string.Format("Unexpected row on state {0}", state))
            };
            switch (newState)
            {
                case ReadState.Header: csv.ReadHeader(); break; // (re)load the header for name-based mapping
                case ReadState.Data: records.Add(csv.GetRecord<TRecord>()); break;
            }
            state = newState;
        }
    }
    return records;
}
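To see the state machine on its own, here is a minimal BCL-only sketch that applies the same transitions to raw text lines and returns the trimmed cells of each data row. TableLineReader and ReadDataRows are hypothetical names, not CsvHelper APIs, and the allowEmptyTables flag is an assumption standing in for the commented-out HeaderDataDelimiter transition:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: the same state machine as ReadCsv, applied to raw lines.
public static class TableLineReader
{
    enum State { Initial, InitialDelimiter, Header, HeaderDataDelimiter, Data }

    // Returns the trimmed cells of each data row; title, header and +----+ lines are skipped.
    public static List<string[]> ReadDataRows(IEnumerable<string> lines, bool allowEmptyTables = false)
    {
        var rows = new List<string[]>();
        var state = State.Initial;
        foreach (var line in lines)
        {
            if (string.IsNullOrWhiteSpace(line))
                continue; // ignore blank lines
            bool isDelimiter = line.TrimStart().StartsWith("+-", StringComparison.Ordinal);
            state = (isDelimiter, state) switch
            {
                (true, State.Initial) => State.InitialDelimiter,
                (true, State.Header) => State.HeaderDataDelimiter,
                (true, State.HeaderDataDelimiter) when allowEmptyTables => State.Initial,
                (true, State.Data) => State.Initial,
                (false, State.Initial) => State.Initial,
                (false, State.InitialDelimiter) => State.Header,
                (false, State.HeaderDataDelimiter) => State.Data,
                (false, State.Data) => State.Data,
                _ => throw new InvalidOperationException($"Unexpected line in state {state}: {line}")
            };
            if (state == State.Data)
            {
                // "|TR10|  410959107|" splits into "", "TR10", "  410959107", "":
                // drop the empty pieces outside the first and last '|'.
                var parts = line.Trim().Split('|');
                rows.Add(parts.Skip(1).Take(parts.Length - 2).Select(p => p.Trim()).ToArray());
            }
        }
        return rows;
    }
}
```

For example, TableLineReader.ReadDataRows(File.ReadLines("file.csv")) would yield one string array per data row. Blank lines are skipped, which mirrors CsvHelper's default of ignoring blank lines.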

Then define a class map for SFC as follows:

class SFCMap : ClassMap<SFC>
{
    public SFCMap() : this(new CsvConfiguration(CultureInfo.InvariantCulture)) {}
    public SFCMap(CsvConfiguration config)
    {
        AutoMap(config);
        // "MM" is the month; lowercase "mm" would be parsed as minutes.
        Map(m => m.Rückmeldedatum).TypeConverterOption.Format("dd.MM.yyyy").TypeConverterOption.DateTimeStyles(DateTimeStyles.AllowWhiteSpaces);
    }
}
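In .NET custom date format strings, MM is the month and mm is minutes, so the month part of the Rückmeldedatum format has to be uppercase. A quick BCL-only check, independent of CsvHelper; DateFormatCheck and its methods are hypothetical names, and the sketch assumes dates always look like 26.07.2021, possibly with trailing padding:

```csharp
using System;
using System.Globalization;

public static class DateFormatCheck
{
    // Hypothetical helper: parses a Rückmeldedatum value such as "26.07.2021    ".
    // "dd" = day, "MM" = month, "yyyy" = four-digit year.
    public static DateTime ParseConfirmationDate(string text) =>
        DateTime.ParseExact(text.Trim(), "dd.MM.yyyy", CultureInfo.InvariantCulture);

    // For comparison only: with lowercase "mm" the "07" is read as minutes, not the month.
    public static DateTime ParseWithMinutesByMistake(string text) =>
        DateTime.ParseExact(text.Trim(), "dd.mm.yyyy", CultureInfo.InvariantCulture);
}
```

The lowercase variant does not throw, which is what makes this mistake easy to miss: it returns a date whose Minute is 7 instead of a July date.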

public class SFC
{
    public string Werk { get; set; }

    public string Rückmeldenummer { get; set; }

    public int Rückmeldezähler { get; set; }

    public string AVO { get; set; } // string, not int, so leading zeros such as "0800" are kept

    public DateTime Rückmeldedatum { get; set; }

    public TimeSpan Rückmeldezeit { get; set; } // time of day, e.g. 00:01:24

    public string Arbeitsplatz { get; set; }
}
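A small BCL-only illustration of the two type choices above: why AVO is kept as a string, and how an hh:mm:ss value maps to a TimeSpan. FieldTypeDemo and its methods are hypothetical names, and TimeSpan.ParseExact with an escaped custom format is used here only for illustration; it is not necessarily what CsvHelper's default converter does internally:

```csharp
using System;
using System.Globalization;

public static class FieldTypeDemo
{
    // As int, "0800" becomes 800 and the leading zero is lost.
    public static string AvoViaInt(string raw) =>
        int.Parse(raw.Trim(), CultureInfo.InvariantCulture).ToString(CultureInfo.InvariantCulture);

    // As string (after trimming padding), "0800" is kept verbatim.
    public static string AvoViaString(string raw) => raw.Trim();

    // Rückmeldezeit such as "00:01:24     "; colons must be escaped in custom TimeSpan formats.
    public static TimeSpan ParseConfirmationTime(string raw) =>
        TimeSpan.ParseExact(raw.Trim(), @"hh\:mm\:ss", CultureInfo.InvariantCulture);
}
```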

You can then read your CSV file into a List<SFC> as follows:

var records = ReadCsv(filename, new SFCMap());

Notes:

  • You defined AVO as an int, but the values have leading zeros, e.g. 0800, so I changed its type to string to preserve them.

  • You need to specify the format "dd.MM.yyyy" when parsing Rückmeldedatum; I added the ClassMap<SFC> to provide it. Note the uppercase MM: in .NET custom format strings, lowercase mm means minutes.

  • Your "CSV" is really a fixed-width file rather than a CSV file. I assumed you want the padding spaces around the fields trimmed; if you don't, remove TrimOptions = TrimOptions.Trim.

  • If your file might contain empty tables, with header and delimiter lines but no data, like so:

     +----+---------------+---------------+----+--------------+-------------+------------+
     |Werk|Rückmeldenummer|Rückmeldezähler|AVO |Rückmeldedatum|Rückmeldezeit|Arbeitsplatz|
     +----+---------------+---------------+----+--------------+-------------+------------+
     +----+---------------+---------------+----+--------------+-------------+------------+

    then uncomment this line in ReadCsv():

     //(true, ReadState.HeaderDataDelimiter) => ReadState.Initial,
  • See also the CsvHelper documentation page "Reading Multiple Data Sets", which discusses a similar parsing problem.
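Since the file is essentially fixed-width, a different technique is also possible: cutting fields by character position instead of splitting on '|'. The following is a hypothetical alternative sketch, not what ReadCsv does; FixedWidthColumns, FromSeparator and Slice are illustrative names. It derives column boundaries from a +----+ separator line and assumes every data line is aligned with that separator:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical alternative: treat the file as fixed-width and cut fields by position.
public static class FixedWidthColumns
{
    // Each column lies between two consecutive '+' characters of a separator
    // line such as "+----+---------------+".
    public static List<(int Start, int Length)> FromSeparator(string separator)
    {
        var columns = new List<(int Start, int Length)>();
        int previous = separator.IndexOf('+');
        for (int i = previous + 1; i < separator.Length; i++)
        {
            if (separator[i] == '+')
            {
                columns.Add((previous + 1, i - previous - 1));
                previous = i;
            }
        }
        return columns;
    }

    // Cuts a data line into trimmed fields; a line that is too short yields "" for missing columns.
    public static string[] Slice(string line, List<(int Start, int Length)> columns)
    {
        var fields = new string[columns.Count];
        for (int c = 0; c < columns.Count; c++)
        {
            var (start, length) = columns[c];
            fields[c] = start >= line.Length
                ? ""
                : line.Substring(start, Math.Min(length, line.Length - start)).Trim();
        }
        return fields;
    }
}
```

This only works if the column boundaries fall on the same character positions in every line, which appears to be the case in the sample above.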

Demo fiddle here.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 