
C# Determine EOL Character in flat files

I am trying to identify the EOL character(s) used in a given .txt or .csv flat file. Based on the EOL found in the first row of data, I want to process the file accordingly (I am using Bulk Load to create tables on SQL Server and need to pass the EOL to the bulk load command). As I understand it, ReadLine() strips the EOL automatically, so I can't search the string it returns for the EOL character. The code below shows what I am trying to do:

int EOLChar_CRLF = 0;
int EOLChar_LF = 0;
int EOLChar_CR = 0;
int EOLChar_Hex = 0;

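// file2 is an already-open StreamReader. ReadLine() strips the line terminator,
// so the searches below cannot find the terminator that ended the line.
string eol_line = file2.ReadLine();
MessageBox.Show(eol_line);
EOLChar_CRLF = eol_line.IndexOf("\r\n");
EOLChar_LF = eol_line.IndexOf("\n");
EOLChar_CR = eol_line.IndexOf("\r");
EOLChar_Hex = eol_line.IndexOf("\x0a");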

MessageBox.Show("CRLF is line feed if " + EOLChar_CRLF.ToString() + " <> -1");
MessageBox.Show("LF is line feed if " + EOLChar_LF.ToString() + " <> -1");
MessageBox.Show("CR is line feed if " + EOLChar_CR.ToString() + " <> -1");
MessageBox.Show("0x0a is line feed if " + EOLChar_Hex.ToString() + " <> -1");

Does anybody know of a way to determine the EOL using StreamReader.ReadLine(), or any other way of accomplishing this? I only want to read the first row of data and inspect it for the EOL, since some of these files have 20+ million rows.

The usual way to determine the end-of-line convention of a text file is to slurp a buffer of sufficient size from the start of the file and examine it. The right buffer size depends on the expected line length: you want to read enough data to cover a reasonable number of lines.

You are unlikely to encounter end-of-line conventions other than Windows (CR+LF), Unix/Linux/OS X (LF) or old-school Mac OS (CR). For speed, it would be hard to beat something like this:

public enum EndOfLineStyle
{
  Unknown = 0     ,
  CR      = 1     ,
  LF      = 2     ,
  CRLF    = CR|LF ,
  Unix    = LF    ,
  MacOs   = CR    ,
  Windows = CRLF  ,
}

// Requires: using System.IO; (for StreamReader and File)
const int BUFFER_SIZE = 8192 ;
public EndOfLineStyle DetermineEndOfLineStyle( string pathToFile )
{
  int    bufl  = 0 ;
  char[] buf   = new char[BUFFER_SIZE] ;

  using ( StreamReader reader = File.OpenText( pathToFile ) )
  {
    bufl = reader.ReadBlock( buf , 0 , buf.Length ) ;
  }

  int crlfs = 0 ;
  int crs   = 0 ;
  int lfs   = 0 ;

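  // Count each terminator style. Every branch must advance i; without the
  // final else, the loop never terminates on an ordinary character.
  for ( int i = 0 ; i < bufl ; )
  {
    if      ( buf[i] == '\r' && i < bufl-1 && buf[i+1] == '\n' ) { ++crlfs ; i+=2 ; }
    else if ( buf[i] == '\r'                                   ) { ++crs   ; i+=1 ; }
    else if ( buf[i] == '\n'                                   ) { ++lfs   ; i+=1 ; }
    else                                                         {           i+=1 ; }
  }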

  EndOfLineStyle style ;
  if      ( crlfs > crs   && crlfs > lfs ) style = EndOfLineStyle.Windows ;
  else if ( lfs   > crlfs && lfs   > crs ) style = EndOfLineStyle.Unix    ;
  else if ( crs   > crlfs && crs   > lfs ) style = EndOfLineStyle.MacOs   ;
  else                                     style = EndOfLineStyle.Unknown ;

  return style ;
}
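
If only the terminator of the first row is wanted, as the question asks, one alternative is to read characters one at a time until the first CR or LF appears. The following is a minimal sketch of that idea, not part of the answer above. The names FirstLineTerminator, Detect, ToHexLiteral and the maxChars cap are hypothetical. It also assumes the first row has no quoted field with an embedded line break, in which case sampling a larger buffer as shown above is the safer choice.

```csharp
using System;
using System.IO;
using System.Text;

public static class FirstLineTerminator
{
    // Returns "\r\n", "\n" or "\r" for the first terminator found, or null
    // when none appears within maxChars characters (an illustrative cap).
    public static string Detect(TextReader reader, int maxChars = 1 << 20)
    {
        for (int read = 0; read < maxChars; read++)
        {
            int c = reader.Read();
            if (c == -1) return null;      // end of input, no terminator seen
            if (c == '\n') return "\n";
            if (c == '\r')
            {
                // Consume one more character to tell CR+LF apart from a lone CR.
                int next = reader.Read();
                return next == '\n' ? "\r\n" : "\r";
            }
        }
        return null;
    }

    // Hypothetical helper: renders the terminator as a hex literal such as
    // "0x0d0a". Whether the bulk load command accepts this form is an
    // assumption to verify against the SQL Server documentation.
    public static string ToHexLiteral(string terminator)
    {
        if (terminator == null) throw new ArgumentNullException(nameof(terminator));
        var sb = new StringBuilder("0x");
        foreach (char ch in terminator) sb.Append(((int)ch).ToString("x2"));
        return sb.ToString();
    }
}
```

A possible call site would be `using (var reader = File.OpenText(pathToFile)) { string eol = FirstLineTerminator.Detect(reader); }`. Because only the first row is read, the cost does not depend on the size of the file, but a single row is a smaller sample than the majority count used above.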

