
C# File.ReadAllLines and StreamReader.ReadLine Splitting some lines

I am working on a project that reads and converts CSV files based on a set of arbitrary rules: you pick a file, tell the program how it should output the data based on the input, and it parses the file.

The problem is that when I read the lines from my input files, the program sometimes reads additional lines or splits a line partway through into two lines. I initially used ReadAllLines, then tested with this code:

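int testCount = 0;
// The using block disposes the reader even if an exception is thrown.
using (StreamReader sr = File.OpenText(_FilePath.Text))
{
    // ReadLine returns null once the end of the stream is reached.
    while (sr.ReadLine() != null)
    {
        testCount++;
    }
}

Console.WriteLine("Lines read: " + testCount);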

and found that a file that has 627 lines is being read as having 681 lines (both with ReadAllLines and when counting the lines with the code above).

I searched for people having the same issue and checked whether these methods impose a maximum length on a 'line', but nothing turned up on Google. The first line in the file that acts up is this one (the information in the line has been changed to protect privacy, but all special characters are present):

CODE, A/B Company Name, CONTACT NAME, ATTN  NAME A/B, 1234 CORPORATE CORP ST, Smithington, SM, 1234, , 123-456-7890, 123-456-7890, 12345 Plum ROAD, , Nowhere, NW, 12345, A/B Company Name2, Courier, , "Some A Info B For.Shipping Accnt. # 123456789 calendar days early^ 3 days late.", , 

The file itself was exported from an Excel spreadsheet to CSV. All commas in the original file were replaced with ^ (to prevent issues) and will be converted back to commas later.

So, does anyone know of a limit on the length of a line in ReadAllLines, or is something else going on behind the scenes? Since this was exported from Excel (originally a DBF file), I don't 'think' this is an issue with the file, but I could be wrong. Is there anything I can do to find out?

I guarantee that File.ReadAllLines() and StreamReader.ReadLine() are both behaving exactly as documented, with no hidden traps for you to stumble into.

Do note that neither method distinguishes between line-break styles. Within a single file, both will happily break a line on \r, \n, and \r\n. This means that a file which nominally uses the Windows convention of \r\n, but which contains stray \r and/or \n characters, will be interpreted as having extra line breaks. Note also that while \r\n is treated as a single line break, \n\r is treated as two line breaks.
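To make the behaviour described above concrete, here is a minimal sketch that counts lines the same way the question's loop does, but over in-memory strings. The class and method names (LineBreakDemo, CountLines) are made up for illustration and are not part of the original post.

```csharp
using System;
using System.IO;
using System.Text;

public static class LineBreakDemo
{
    // Hypothetical helper: counts the lines StreamReader.ReadLine returns for the given text.
    public static int CountLines(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        using (var reader = new StreamReader(new MemoryStream(bytes)))
        {
            int count = 0;
            while (reader.ReadLine() != null)
            {
                count++;
            }
            return count;
        }
    }

    public static void Main()
    {
        Console.WriteLine(CountLines("a\r\nb"));  // 2: \r\n is one line break
        Console.WriteLine(CountLines("a\rb\nc")); // 3: lone \r and lone \n each break a line
        Console.WriteLine(CountLines("a\n\rb"));  // 3: \n\r is two breaks, giving "a", "", "b"
    }
}
```

A stray \n inside a quoted CSV field would be counted in exactly the same way, which is one plausible way for a 627-row export to be read as more lines.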

The way to diagnose exactly what's going on is to look at the file as binary. First, check your output to see where the lines are being broken, and in particular find the first place where a line is broken that you believe should not have been.

Then, open the file in Visual Studio, but instead of just opening it, select the "Open With..." option (click the black triangle on the "Open" button), and choose "Binary Editor". Look through the file to find the text where the first unwanted line break occurred and check the hex values in the file at that location. You will find some combination of \r, \n, or \r\n there (\r is the hex value 0D and \n is 0A).
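If you would rather check this programmatically than in a hex editor, a rough sketch along these lines tallies each kind of line break in the raw bytes. It assumes a single-byte or UTF-8 encoded file (in UTF-16 each character occupies two bytes, so the scan would need adjusting), and the names LineBreakScanner and Count are hypothetical.

```csharp
using System;
using System.IO;

public static class LineBreakScanner
{
    // Returns the number of \r\n pairs, lone \r bytes, and lone \n bytes in the data.
    // Assumption: the file is ASCII/ANSI or UTF-8, so 0x0D and 0x0A only appear as line-break bytes.
    public static (int CrLf, int LoneCr, int LoneLf) Count(byte[] data)
    {
        int crlf = 0, loneCr = 0, loneLf = 0;
        for (int i = 0; i < data.Length; i++)
        {
            if (data[i] == 0x0D)
            {
                if (i + 1 < data.Length && data[i + 1] == 0x0A)
                {
                    crlf++;
                    i++; // skip the \n that belongs to this \r\n pair
                }
                else
                {
                    loneCr++;
                }
            }
            else if (data[i] == 0x0A)
            {
                loneLf++;
            }
        }
        return (crlf, loneCr, loneLf);
    }

    public static void Main(string[] args)
    {
        // args[0] is the path of the file to inspect.
        byte[] data = File.ReadAllBytes(args[0]);
        var (crlf, loneCr, loneLf) = Count(data);
        Console.WriteLine($"CRLF: {crlf}, lone CR: {loneCr}, lone LF: {loneLf}");
    }
}
```

If the file is meant to use \r\n throughout, any non-zero count of lone \r or lone \n points at the extra line breaks.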

Specify the file's encoding explicitly when you read it. File.OpenText assumes UTF-8 by default. Try this:

string[] lines = File.ReadAllLines(path, Encoding.Unicode); // Encoding.Unicode is UTF-16 LE; use Encoding.ASCII etc. to match the file

http://msdn.microsoft.com/en-us/library/bsy4fhsa(v=vs.110).aspx
