How do I get the size, in bytes, that a string will occupy when written to a file?
I've been reading answers that explain how to get the size of a string, either its size in memory or its size in a file:
How to know the size of the string in bytes?
Find size of object instance in bytes in c#
How to know the byte size of a string? - MSDN social
My intention is to determine the number of bytes that a string will occupy, in a specified encoding, when written to a file.
However, my function does not return the expected result when I check the size of a string with Encoding.UTF8, Encoding.Unicode (UTF-16) or Encoding.UTF32.
This is what I'm doing:
Imports System.Diagnostics
Imports System.Runtime.CompilerServices
Imports System.Text

' Extension methods must be declared inside a Module.
Public Module StringExtensions
    ''' ----------------------------------------------------------------------
    ''' <summary>
    ''' Gets the number of bytes a string will occupy when written to a file.
    ''' </summary>
    ''' ----------------------------------------------------------------------
    <DebuggerStepThrough>
    <Extension>
    Public Function SizeInFile(ByVal sender As String,
                               Optional ByVal encoding As Encoding = Nothing) As Integer
        If (encoding Is Nothing) Then
            encoding = System.Text.Encoding.Default
        End If
        Return encoding.GetByteCount(sender)
    End Function
End Module
This is how I'm testing it. In the code below, the function reports that the string size is 2 bytes, but when the string is written to a file, the file size is 4 bytes:
Dim str As String = "Ñ"
Console.WriteLine(String.Format("Size of String : {0}", str.SizeInFile(Encoding.Unicode)))
File.WriteAllText(".\Test.txt", str, Encoding.Unicode)
Console.WriteLine(String.Format("Size of txtfile: {0}", New FileInfo(".\Test.txt").Length))
What am I missing to evaluate the string size correctly?
An answer in C# or VB.NET is fine.
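A file may begin with a byte order mark (BOM) that helps the reader detect which encoding was used. Encoding.GetByteCount counts only the encoded characters, but File.WriteAllText(path, contents, encoding) writes the encoding's BOM before the text. For "Ñ" in UTF-16 that is 2 BOM bytes plus 2 character bytes, which is the 4-byte file you observed.
The BOM for UTF-8 is 3 bytes: EF BB BF.
For UTF-16 (Encoding.Unicode) it is 2 bytes: U+FEFF encoded as either big-endian (FE FF) or little-endian (FF FE) depending on the encoding. Encoding.Unicode is little-endian, so it writes FF FE.
For UTF-32 it is 4 bytes: U+FEFF encoded as 00 00 FE FF (big-endian) or FF FE 00 00 (little-endian, which is what Encoding.UTF32 writes).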
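A minimal C# sketch of the corrected calculation follows. It adds the length of Encoding.GetPreamble(), which returns the BOM bytes the encoding emits (an empty array for encodings configured without one), to GetByteCount. It assumes the whole file is produced by a BOM-emitting writer such as File.WriteAllText(path, text, encoding). The StringSizeExtensions and Program classes, the temp-file name, and the list of encodings are illustrative only.

```csharp
using System;
using System.IO;
using System.Text;

public static class StringSizeExtensions
{
    // Sketch: bytes a string occupies when written as a whole file with
    // `encoding`, ASSUMING the writer emits the encoding's preamble (BOM)
    // first, as File.WriteAllText(path, text, encoding) does.
    public static long SizeInFile(this string text, Encoding encoding = null)
    {
        encoding = encoding ?? Encoding.Default; // mirrors the original VB default
        // GetPreamble() returns an empty array for encodings that emit no BOM.
        return encoding.GetPreamble().Length + (long)encoding.GetByteCount(text);
    }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical scratch file used only for the comparison.
        string path = Path.Combine(Path.GetTempPath(), "SizeInFileTest.txt");
        string str = "\u00D1"; // "Ñ"

        Encoding[] encodings =
        {
            Encoding.UTF8,                     // UTF-8, BOM EF BB BF
            Encoding.Unicode,                  // UTF-16 LE, BOM FF FE
            Encoding.UTF32,                    // UTF-32 LE, BOM FF FE 00 00
            new UTF8Encoding(false),           // UTF-8 without BOM
            new UnicodeEncoding(false, false), // UTF-16 LE without BOM
        };

        foreach (Encoding enc in encodings)
        {
            File.WriteAllText(path, str, enc);
            long actual = new FileInfo(path).Length;
            Console.WriteLine(
                $"{enc.WebName} (BOM {enc.GetPreamble().Length} bytes): " +
                $"predicted {str.SizeInFile(enc)}, actual {actual}");
        }

        File.Delete(path);
    }
}
```

The predicted and actual sizes should agree for each encoding listed. If you don't want a BOM at all, the sketch also shows the BOM-less encodings (new UTF8Encoding(false), new UnicodeEncoding(false, false)), for which GetByteCount alone matches the file size. Appending to an already non-empty file normally does not add a second BOM, so in that case only GetByteCount applies.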