Best way to read BIG text files with a CRLF line delimiter

I have a very large comma-delimited text file. Each field is delimited by a comma and surrounded by quotes (all fields are strings). The problem is that some of the fields contain a lone CR (carriage return without LF) to spread the value over multiple lines. So when I call ReadLine, it stops at that CR. It would be nice if I could tell it to stop ONLY at CRLF combinations.

Does anyone have a snappy way to do this? The files can be very, very large.
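To see the problem in isolation, here is a small sketch (the class and method names are made up for illustration). `TextReader.ReadLine`, which both `StreamReader` and `StringReader` implement, ends a line at a lone `\r`, a lone `\n`, or `\r\n`, so a record with a CR inside a quoted field comes back in two pieces:

```csharp
using System.Collections.Generic;
using System.IO;

public static class ReadLineDemo {
  // Hypothetical helper: collects what ReadLine returns for the given text.
  public static List<string> LinesFromReadLine(string data) {
    var lines = new List<string>();
    using (var reader = new StringReader(data)) {
      string line;
      // ReadLine ends a line at "\r", "\n" or "\r\n", so a lone CR
      // inside a quoted field splits the record.
      while ((line = reader.ReadLine()) != null)
        lines.Add(line);
    }
    return lines;
  }
}
```

For example, `LinesFromReadLine("\"first\",\"line one\rline two\"\r\n")` returns two strings, `"first","line one` and `line two"`, instead of one record.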

If you want a specific ReadLine, why not implement it yourself?

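  using System;
  using System.Collections.Generic;
  using System.IO;
  using System.Text;

  public static class MyFileReader {
    // Returns the lines of a file, treating ONLY "\r\n" as a line break.
    // A lone '\r' (or '\n') inside a field is kept as part of the line.
    public static IEnumerable<String> ReadLineCRLF(String path) {
      StringBuilder sb = new StringBuilder();

      // True when the previous character was a '\r' that has not been
      // appended yet: its meaning depends on the next character.
      bool pendingCR = false;

      using (StreamReader reader = new StreamReader(path)) {
        int v;

        // Read one character at a time (StreamReader buffers internally)
        while ((v = reader.Read()) >= 0) {
          Char current = (Char) v;

          if (pendingCR) {
            pendingCR = false;

            if (current == '\n') {
              // CRLF: the line is complete
              yield return sb.ToString();
              sb.Clear();
              continue;
            }

            // Lone CR: it is data, keep it
            sb.Append('\r');
          }

          if (current == '\r')
            pendingCR = true;
          else
            sb.Append(current);
        }
      }

      // End of file: keep a trailing lone CR and return the last line
      // if it was not terminated by CRLF
      if (pendingCR)
        sb.Append('\r');

      if (sb.Length > 0)
        yield return sb.ToString();
    }
  }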

Then use it (lines are produced lazily, one record at a time, so the whole file is never held in memory):

  var lines = MyFileReader
    .ReadLineCRLF(@"C:\MyData.txt");
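Each record returned by `ReadLineCRLF` still has to be cut into fields. A minimal sketch, assuming, as the question describes, that every field is wrapped in double quotes and that no field contains the sequence `","` or escaped quotes (`""`); `RecordParser` and `SplitQuotedFields` are hypothetical names:

```csharp
using System;

public static class RecordParser {
  // Hypothetical helper: splits one record of the form "a","b","c" into fields.
  // Assumes every field is quoted and no field contains the sequence ","
  // (escaped quotes such as "" are not handled).
  public static string[] SplitQuotedFields(string record) {
    if (record.Length < 2 || record[0] != '"' || record[record.Length - 1] != '"')
      throw new FormatException("Record is not wrapped in quotes: " + record);

    // Drop the outer quotes, then split on the quote-comma-quote separator.
    string inner = record.Substring(1, record.Length - 2);
    return inner.Split(new[] { "\",\"" }, StringSplitOptions.None);
  }
}
```

Used together with the reader: `foreach (var line in MyFileReader.ReadLineCRLF(@"C:\MyData.txt")) { var fields = RecordParser.SplitQuotedFields(line); ... }`. Embedded CRs survive inside the returned fields. If the data can contain escaped quotes, a quote-aware parser is the safer choice.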

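How about reading the whole file at once:

  string text = File.ReadAllText("input.txt"); // Read the entire file into one string

Then split it on the carriage return/line feed pair only, so a lone CR inside a field does not end the record:

  var split = text.Split(new[] { "\r\n" }, StringSplitOptions.None);

and then process it line by line in a loop:

  foreach (var line in split) { ... }

Keep in mind that this holds the entire file (plus the split copies) in memory, so for very large files the streaming reader above scales better.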
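If the file does fit in memory, the whole-file approach can be wrapped up as below. One detail worth handling: when the file ends with CRLF, `Split` returns an extra empty string at the end, which this sketch drops. `WholeFileSplit.ReadRecords` is a hypothetical name:

```csharp
using System;
using System.IO;

public static class WholeFileSplit {
  // Hypothetical helper: reads the whole file and splits it into records
  // on CRLF only, so a lone '\r' inside a field stays in its record.
  public static string[] ReadRecords(string path) {
    string text = File.ReadAllText(path);
    string[] parts = text.Split(new[] { "\r\n" }, StringSplitOptions.None);

    // A trailing CRLF (or an empty file) leaves one empty element at the end.
    if (parts.Length > 0 && parts[parts.Length - 1].Length == 0)
      Array.Resize(ref parts, parts.Length - 1);

    return parts;
  }
}
```

Dropping only the final empty element keeps any genuinely empty lines in the middle of the file, which `StringSplitOptions.RemoveEmptyEntries` would discard.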
