
How to extract header from CSV in C#

I'm loading a couple of CSV files in C# and splitting each one into two lists. Now I also need to extract the header from the first line, using ; as the delimiter. I'm trying to use .Skip(1), but that (obviously) only skips the header. I need to extract it and, once I'm done working with the rest of the data, add it back as the first line.

Here is what I have tried so far:

string[] fileNames = Directory.GetFiles(@"read\", "*.csv");
for (int i = 0; i < fileNames.Length; i++)
{
    string file = @"read\" + Path.GetFileName(fileNames[i]);
    var lines = File.ReadLines(file).Skip(1);
    (List<string> dataA, List<string> dataB) = SplitAllTodataAAnddataB(lines);
    var rowLog = 0;
    foreach (var line in dataA)
    {
       // Variables for lines
       string[] entries = line.Split(';');
       rowLog++;
       Helper.checkdataAString(entries[0].ToLower(), "abc", rowLog);
       Helper.checkdataAString(entries[1].ToLower(), "firstname", rowLog);
       Helper.checkdataAString(entries[2].ToLower(), "lastname", rowLog);
       Helper.checkdataAString(entries[4].ToLower(), "gender", rowLog);
       Helper.checkdataAString(entries[5].ToLower(), "id", rowLog);
       Helper.checkdataAString(entries[3], "date", rowLog);
       Helper.drawTextProgressBar("loaded rows", rowLog, dataA.Count());
    }
    Console.WriteLine("\nencrypting data");
    var output = new List<string>();
    var rowdataA = 0; // row counter used by the progress bar below
    foreach (var line in dataA)
    {
       try
       {
          string[] entries = line.Split(';');
          string abc = entries[0].ToLower();
          string firstName = koeln.GetPhonetics(entries[1]).ToLower();
          string lastName = koeln.GetPhonetics(entries[2]).ToLower();
          string date = entries[3];
          // The three preceding variables are concatenated here.
          string NVG = firstName + "_" + lastName + "_" + date;
          string gender = entries[4].ToLower();
          string age = Helper.Left(Convert.ToString(20171027 - Convert.ToInt32(entries[3])), 2);
          string zid = Guid.NewGuid().ToString();
          string fid = entries[5].ToLower();
          rowdataA++;
          output.Add($"{abc}; {NVG}; {gender}; {age}; {zid}; {fid}");
          Helper.drawTextProgressBar("encrypted rows.", rowdataA, dataA.Count());
       }
       catch { rowdataA++; }
    }
    File.WriteAllLines(fileTest, output);
}

I'm fairly new to development, so I'm mostly experimenting. Any help would be appreciated.

You can read the file this way:

string file = @"read\" + Path.GetFileName(fileNames[i]);
var content = File.ReadLines(file);

var header = content.ElementAt(0);
var lines = content.Skip(1);
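Keep in mind that File.ReadLines is lazy: content.ElementAt(0) and content.Skip(1) each enumerate it separately, so the file is opened and read twice. If the file fits comfortably in memory, here is a minimal sketch that reads it only once with File.ReadAllLines. The helper name ReadHeaderAndData and the demo column names are hypothetical, and the snippet is written as a C# 9+ top-level program so it can run on its own.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: read the file once, then split it into header and data lines.
// File.ReadAllLines loads everything into a string[], so taking the header and the
// remaining lines does not reopen the file (unlike enumerating File.ReadLines twice).
static (string Header, List<string> DataLines) ReadHeaderAndData(string path)
{
    string[] allLines = File.ReadAllLines(path);
    if (allLines.Length == 0)
        throw new InvalidDataException($"'{path}' is empty; expected at least a header line.");

    string header = allLines[0];                   // first line: the header
    List<string> data = allLines.Skip(1).ToList(); // everything after it
    return (header, data);
}

// Demo with a small temporary file (made-up column names).
string demoPath = Path.GetTempFileName();
File.WriteAllLines(demoPath, new[] { "abc;firstname;lastname", "x;Anna;Meier", "y;Ben;Koch" });

var (header, dataLines) = ReadHeaderAndData(demoPath);
Console.WriteLine(header);          // abc;firstname;lastname
Console.WriteLine(dataLines.Count); // 2

File.Delete(demoPath);
```

For very large files you would instead stream with File.ReadLines and handle the first line inside a single foreach loop.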

The answer

List<string> lines = File.ReadLines(file).ToList();

This contains all the lines from the file. We know that the first line is the header and the rest is the content.

List<string> contentLines = lines.Skip(1).ToList();

This is what you had in your code. It contains all lines except the first.

So how do we get only the header line?

string headerLine = lines.First();

There we go. Notice that this returns a single string, not a list of strings.
If you want a list of strings instead (e.g. if your header spans two or more lines), you can do:

List<string> headerLines  = lines.Take(amount_of_header_lines).ToList();
List<string> contentLines = lines.Skip(amount_of_header_lines).ToList();

Simply put, Take(X) takes the first X items, and Skip(X) takes everything except the first X items.
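To make Take and Skip concrete, here is a small sketch that uses an in-memory list in place of the file. The two-line header and headerLineCount are made-up values for illustration. The last step puts the header back in front of the content, which is what the question ultimately needs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory stand-in for File.ReadLines(file).ToList(); the header lines are made up.
List<string> lines = new List<string>
{
    "Export from system X",   // header line 1
    "abc;firstname;lastname", // header line 2
    "x;Anna;Meier",
    "y;Ben;Koch",
};

int headerLineCount = 2; // hypothetical: how many lines your header spans

// Take(n) yields the first n items; Skip(n) yields everything after them.
List<string> headerLines  = lines.Take(headerLineCount).ToList();
List<string> contentLines = lines.Skip(headerLineCount).ToList();

Console.WriteLine(string.Join(" | ", headerLines)); // Export from system X | abc;firstname;lastname
Console.WriteLine(contentLines.Count);              // 2

// Re-attach the header after processing: header lines first, then the content.
List<string> output = headerLines.Concat(contentLines).ToList();
Console.WriteLine(output.Count);                    // 4
```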


Footnotes

  • Notice that I put the lines (lines = File.ReadLines(file)) in a separate list variable first. If I had called File.ReadLines(file) for both the header lines and the content lines (instead of using the lines variable), I would have read the file twice. The same happens if lines is only the lazy IEnumerable<string> that File.ReadLines returns: every First(), Take() or Skip() call re-reads the file, so materialize it into a list once. That may not matter to you now, but it can lead to performance issues, and it's pointless work.
  • The logic for splitting the header line into parts is the same as the logic you use for splitting the content lines into parts.
  • I used First. You might want to use FirstOrDefault instead, which returns null rather than throwing when the file is empty (or you might not). But that ties into a different discussion that is not the focus here.
  • Your code only handles simple CSV structures, and CSV can get complicated very quickly.
    • If a cell value needs to contain a semicolon, the value is wrapped in quotes. For example, this data represents only three columns: ColumnA;"ColumnB;StillColumnB";ColumnC. Your code (line.Split(';')) does not account for that.
    • A single row of a table (in Excel) may be split over two lines when you look at the CSV file in a text editor. This happens if a cell's value contains a newline character. File.ReadLines() does not account for that.
    • When writing a parser for a seemingly simple data format, always check whether an existing library already does the job. Don't reinvent the wheel (unless it's for training purposes). There are many edge cases you are not thinking of yet, and they will eventually be the death of your initially simple code.
  • No offense intended, but your code isn't the cleanest. If you're interested in improving its quality, I suggest posting it to Code Review Stack Exchange (mention that you're a beginner so you don't get overwhelmed by complex explanations). Code Review only accepts working code, so you need to finish it before you post.
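To illustrate the quoting problem and the "use an existing parser" advice, here is a sketch using TextFieldParser from the Microsoft.VisualBasic.FileIO namespace, which ships with .NET (a NuGet library such as CsvHelper is another common choice). It assumes ; as the delimiter and double quotes around quoted fields; ParseSemicolonCsv is a hypothetical helper name. Note that the header row goes through exactly the same parsing as the data rows.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO; // TextFieldParser (Microsoft.VisualBasic.Core, part of .NET)

// Hypothetical helper: parse semicolon-separated text with a quote-aware parser
// instead of line.Split(';').
static List<string[]> ParseSemicolonCsv(TextReader reader)
{
    var rows = new List<string[]>();
    using var parser = new TextFieldParser(reader);
    parser.TextFieldType = FieldType.Delimited;
    parser.SetDelimiters(";");
    parser.HasFieldsEnclosedInQuotes = true; // "a;b" stays a single field
    while (!parser.EndOfData)
    {
        var fields = parser.ReadFields();
        if (fields != null) rows.Add(fields);
    }
    return rows;
}

string csv = "ColumnA;ColumnB;ColumnC\n" +
             "1;\"ColumnB;StillColumnB\";3\n";

List<string[]> parsed = ParseSemicolonCsv(new StringReader(csv));
string[] headerFields = parsed[0];      // header row, parsed like any other row
Console.WriteLine(headerFields.Length); // 3
Console.WriteLine(parsed[1][1]);        // ColumnB;StillColumnB

// For comparison, the naive split breaks the quoted field apart:
Console.WriteLine("1;\"ColumnB;StillColumnB\";3".Split(';').Length); // 4
```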

If I understood correctly, you need to read the whole file, process all the lines except the header, then write a different file containing the header and the processed lines, right?

If so, the following approach should work:

var allLines = File.ReadAllLines(originalFile);
var headerLine = allLines.First();
var dataLines = allLines.Skip(1);
var processedLines = ProcessLines(dataLines);
File.WriteAllLines(newFile, (new[] {headerLine}.Concat(processedLines)).ToArray());

The ProcessLines method would accept the original lines as a parameter and return a list with the processed lines:

IEnumerable<string> ProcessLines(IEnumerable<string> originalLines)
{
    var processedLines = new List<string>();
    foreach(var line in originalLines)
    {
        var processedLine = line; // TODO: generate your processed line here
        processedLines.Add(processedLine);
    }
    return processedLines;
}
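Putting the pieces of this answer together, a minimal end-to-end sketch might look like the following. ProcessLine (which just lower-cases every field) and RewriteWithHeader are hypothetical stand-ins for your own processing. The header is written back unchanged as the first line, and the snippet is a C# 9+ top-level program.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical per-line transformation: lower-case every field. Replace with your own logic.
static string ProcessLine(string line) =>
    string.Join(";", line.Split(';').Select(field => field.Trim().ToLowerInvariant()));

static IEnumerable<string> ProcessLines(IEnumerable<string> originalLines) =>
    originalLines.Select(ProcessLine);

// Reads originalFile, keeps its header untouched, processes the rest, writes newFile.
static void RewriteWithHeader(string originalFile, string newFile)
{
    string[] allLines = File.ReadAllLines(originalFile);
    if (allLines.Length == 0) { File.WriteAllLines(newFile, allLines); return; }

    string headerLine = allLines[0];
    IEnumerable<string> processed = ProcessLines(allLines.Skip(1));
    File.WriteAllLines(newFile, new[] { headerLine }.Concat(processed));
}

string input = Path.GetTempFileName();
string result = Path.GetTempFileName();
File.WriteAllLines(input, new[] { "ABC;FirstName;LastName", "X;Anna;Meier" });

RewriteWithHeader(input, result);
string[] written = File.ReadAllLines(result);
Console.WriteLine(written[0]); // ABC;FirstName;LastName (header unchanged)
Console.WriteLine(written[1]); // x;anna;meier

File.Delete(input);
File.Delete(result);
```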

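Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.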

 
© 2020-2024 STACKOOM.COM