
Cleaning up and extracting data from text files

I need to extract data from non-delimited text files using C#. Basically, I need to remove all unwanted characters, then mark the end of each line and add a line break. Once the data has been separated into individual lines, I need to loop through each line in turn and extract values using regular expressions. I have been doing this with Perl but now need to do it in C#. The raw file contains numerous line-break characters throughout the file, not just at the end of a line as you would expect. I will be able to extract values using Regex objects, but I am having trouble getting the file into a format that has each record on a line of its own.

You provided very little information, but this code will build a List of lines for you.

Note that ReadLine treats a line as a sequence of characters followed by a line feed ("\n"), a carriage return ("\r"), or a carriage return immediately followed by a line feed ("\r\n"). I am not sure whether this is the behaviour you expect.
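As a side illustration of the ReadLine rule just described, here is a minimal sketch that runs ReadLine over an in-memory string through StringReader. The class and method names (ReadLineDemo, SplitWithReadLine) are hypothetical, made up for this example only.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ReadLineDemo
{
    // Splits text into lines using the same rules as ReadLine:
    // "\n", "\r" and "\r\n" each end a line.
    public static List<string> SplitWithReadLine(string text)
    {
        var lines = new List<string>();
        using (var reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }
}
```

Calling ReadLineDemo.SplitWithReadLine("a\nb\rc\r\nd") yields four entries: a, b, c and d. So if the raw file has line breaks scattered inside records, ReadLine will cut those records into pieces rather than return one record per call.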

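    using System;
    using System.Collections.Generic;
    using System.IO;

    string fileName = "Text.txt";
    List<string> lines = new List<string>();

    // Read the file one line at a time; ReadLine returns null at end of file.
    using (StreamReader r = new StreamReader(fileName))
    {
        string line;
        while ((line = r.ReadLine()) != null)
        {
            lines.Add(line);
        }
    }

    foreach (string s in lines)
    {
        Console.WriteLine(s);
        // Apply your Regex to each line here.
    }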
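Since the question says the raw file has line breaks scattered inside records, reading it line by line may not give one record per line. The sketch below shows one possible alternative: remove every line break first, then split on an end-of-record marker, and only then apply a Regex. The question does not say what the records look like, so the ';' terminator, the key=value field pattern, and the names RecordCleaner, ToRecords and ExtractFields are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RecordCleaner
{
    // ASSUMPTION: every record ends with ';'. Replace with the real end-of-record marker.
    private const char RecordTerminator = ';';

    // ASSUMPTION: hypothetical "key=value" fields, used only to show the Regex step.
    private static readonly Regex FieldPattern = new Regex(@"(?<key>\w+)=(?<value>\w+)");

    // Removes all line breaks, then returns one string per record.
    public static List<string> ToRecords(string rawText)
    {
        // Stray "\r" and "\n" characters are treated as noise and deleted outright.
        string flattened = rawText.Replace("\r", "").Replace("\n", "");

        var records = new List<string>();
        foreach (string piece in flattened.Split(RecordTerminator))
        {
            string record = piece.Trim();
            if (record.Length > 0)
            {
                records.Add(record);
            }
        }
        return records;
    }

    // Extracts the assumed key=value pairs from a single record.
    public static Dictionary<string, string> ExtractFields(string record)
    {
        var fields = new Dictionary<string, string>();
        foreach (Match m in FieldPattern.Matches(record))
        {
            fields[m.Groups["key"].Value] = m.Groups["value"].Value;
        }
        return fields;
    }
}
```

The raw text could come from File.ReadAllText(fileName), and the cleaned records could be written back one per line with File.WriteAllLines. Deleting line breaks outright joins text that was split mid-word; if breaks in the real data separate words, replacing them with a space may be the better choice.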
