FileInfo.Length != sum of all line lengths
I'm trying to make a progress bar for reading a big file. I set the progress bar's maximum value to FileInfo.Length, read each line using StreamReader.ReadLine, and compute the sum of the line lengths (with String.Length) to set the progress bar's current value.
What I noticed is that there is a difference between the file's total length and the sum of the lengths of the lines. For example:
FileInfo.Length = 25577646
Sum of all line lengths = 25510563
Why is there such a difference?
Thanks for your help!
You aren't adding the end-of-lines. Each one can take from 1 to 4 bytes, depending on the encoding and on whether it is a \n, a \r, or a \r\n (1 byte = UTF-8 + \n, 4 bytes = UTF-16 + \r\n).
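
To make the 1-to-4-byte range concrete, here is a minimal sketch that asks the encoding how many bytes each terminator occupies. The class name LineBreakSizes and the helper ByteCount are hypothetical names chosen for illustration; Encoding.Unicode is .NET's little-endian UTF-16.

```csharp
using System;
using System.Text;

public static class LineBreakSizes
{
    // Hypothetical helper: number of bytes a line terminator occupies in an encoding.
    public static int ByteCount(Encoding encoding, string terminator)
        => encoding.GetByteCount(terminator);

    public static void Main()
    {
        Console.WriteLine(ByteCount(Encoding.UTF8, "\n"));       // 1
        Console.WriteLine(ByteCount(Encoding.UTF8, "\r\n"));     // 2
        Console.WriteLine(ByteCount(Encoding.Unicode, "\n"));    // 2
        Console.WriteLine(ByteCount(Encoding.Unicode, "\r\n"));  // 4
    }
}
```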
Note that with ReadLine it isn't possible to check which end-of-line (\n, \r, or \r\n) it encountered:
A line is defined as a sequence of characters followed by a line feed ("\n"), a carriage return ("\r"), or a carriage return immediately followed by a line feed ("\r\n")
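
As a small illustration of that point, the sketch below reads a string containing all three terminators through StringReader (used here in place of a file so the example stays self-contained). ReadLineDemo and ReadAllLines are hypothetical names; the returned strings carry no trace of which terminator ended them.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ReadLineDemo
{
    // Collects every line that ReadLine returns for the given text.
    public static List<string> ReadAllLines(string text)
    {
        var lines = new List<string>();
        using (var reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                lines.Add(line);
        }
        return lines;
    }

    public static void Main()
    {
        // "\n", "\r" and "\r\n" all end a line, and all are stripped.
        foreach (var line in ReadAllLines("a\nb\rc\r\nd"))
            Console.WriteLine($"[{line}] length={line.Length}");
    }
}
```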
Another problem: if your file is UTF-8, then the C# char length differs from the byte length: è is one char in C# (which uses UTF-16) but 2 bytes in UTF-8. You could count bytes instead:
int len = Encoding.UTF8.GetByteCount(line);
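
Putting the two corrections together, a rough sketch of a byte estimate might look like the following. EstimateBytes and its assumedTerminator parameter are hypothetical; because ReadLine hides the real terminator, the caller has to guess it, so the result is only approximate (a final line without a terminator, or a byte order mark at the start of the file, would throw the total off slightly).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ByteEstimate
{
    // Hypothetical helper: sums the encoded size of each line plus one
    // assumed terminator per line. The terminator is a guess, not a fact.
    public static long EstimateBytes(IEnumerable<string> lines, Encoding encoding, string assumedTerminator)
    {
        int terminatorBytes = encoding.GetByteCount(assumedTerminator);
        long total = 0;
        foreach (var line in lines)
            total += encoding.GetByteCount(line) + terminatorBytes;
        return total;
    }

    public static void Main()
    {
        // "\u00E8" is è: one char in a C# string, two bytes in UTF-8.
        var lines = new[] { "\u00E8", "ab" };
        Console.WriteLine(EstimateBytes(lines, Encoding.UTF8, "\r\n")); // 8
    }
}
```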
Two problems here:
- string.Length gives you the number of characters in each string, whereas FileInfo.Length gives you the number of bytes. Those can be very different things, depending on the characters and the encoding used.
- The line breaks (\n or \r\n) aren't included, as those are removed when reading lines with TextReader.ReadLine.

In terms of what to do about this...
- You could convert each line back into bytes using Encoding.GetBytes to account for that difference. It would be pretty wasteful to do this though.
- You could use Stream.Position to detect how far through the file you've actually read. That won't necessarily be the same as the amount of data you've processed though, as the StreamReader will have a buffer. (So you may well "see" that the Stream has read all the data even though you haven't processed all the lines yet.)

The last idea is probably the cleanest, IMO.
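
The Stream.Position idea could be sketched roughly as follows. PositionProgress, ReadWithProgress, and the report callback are hypothetical names, and a MemoryStream stands in for the file so the example is self-contained. In a real application the callback would update the progress bar, whose maximum would be the stream's Length.

```csharp
using System;
using System.IO;
using System.Text;

public static class PositionProgress
{
    // Reads every line and reports (bytes read from the stream, total bytes)
    // after each one. Returns the number of lines read.
    public static int ReadWithProgress(Stream stream, Encoding encoding, Action<long, long> report)
    {
        int lineCount = 0;
        long total = stream.Length;
        using (var reader = new StreamReader(stream, encoding))
        {
            while (reader.ReadLine() != null)
            {
                lineCount++;
                // Position reflects what the reader has buffered,
                // which can run ahead of the lines processed so far.
                report(stream.Position, total);
            }
        }
        return lineCount;
    }

    public static void Main()
    {
        byte[] data = Encoding.UTF8.GetBytes("first\r\nsecond\nthird");
        using (var stream = new MemoryStream(data))
        {
            ReadWithProgress(stream, Encoding.UTF8,
                (pos, len) => Console.WriteLine($"{pos}/{len} bytes"));
        }
    }
}
```

With an input this small the reader buffers the whole stream on the first read, which shows the caveat about buffering; on a large file the position advances in buffer-sized steps, which is usually fine for a progress bar.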