
How can I get the size, in bytes, that a string will occupy when written to a file?

I've been reading answers that explain how to get the size of a string, either in memory or in a file.

My intention is to determine the number of bytes a string will occupy, in a specified encoding, when written to a file.

However, my function does not return the expected result when I check the size of a string for Encoding.UTF8, Encoding.Unicode (UTF-16), or Encoding.UTF32.

This is what I'm doing:

Imports System.Diagnostics
Imports System.Runtime.CompilerServices
Imports System.Text

Module StringExtensions ' Extension methods must be declared in a Module.

    ''' ----------------------------------------------------------------------
    ''' <summary>
    ''' Gets the size, in bytes, of how much a string will occupy when written to a file.
    ''' </summary>
    ''' ----------------------------------------------------------------------
    <DebuggerStepThrough>
    <Extension>
    Public Function SizeInFile(ByVal sender As String,
                               Optional ByVal encoding As Encoding = Nothing) As Integer

        If (encoding Is Nothing) Then
            encoding = System.Text.Encoding.Default
        End If

        ' Counts the bytes produced by encoding the string's characters.
        Return encoding.GetByteCount(sender)

    End Function

End Module

This is how I'm testing it. In the code below, the function reports that the string is 2 bytes, but when it is written to a file, the file size is 4 bytes:

Imports System.IO
Imports System.Text

Module Program
    Sub Main()
        Dim str As String = "Ñ"
        Console.WriteLine(String.Format("Size of String : {0}", str.SizeInFile(Encoding.Unicode)))
        File.WriteAllText(".\Test.txt", str, Encoding.Unicode)
        Console.WriteLine(String.Format("Size of txtfile: {0}", New FileInfo(".\Test.txt").Length))
    End Sub
End Module

What am I missing to evaluate the string's size correctly?

Answers in C# or VB.NET are welcome.

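A file may begin with a byte order mark (BOM) that helps the reader detect which encoding was used. File.WriteAllText writes the encoding's BOM, if it has one, before the text, but Encoding.GetByteCount counts only the encoded characters. That is why the file is larger than the size your function reports.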
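The extra bytes can be checked directly by writing the question's string to a file and dumping what actually lands on disk. A minimal VB.NET sketch; the helper WrittenBytes and the temporary file name bom-test.txt are hypothetical, chosen only for this illustration:

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module BomDump
    ' Hypothetical helper: writes text to a temporary file with the given
    ' encoding, then returns the raw bytes that ended up on disk.
    Function WrittenBytes(text As String, enc As Encoding) As Byte()
        Dim filePath As String = Path.Combine(Path.GetTempPath(), "bom-test.txt")
        Try
            File.WriteAllText(filePath, text, enc)
            Return File.ReadAllBytes(filePath)
        Finally
            File.Delete(filePath)
        End Try
    End Function

    Sub Main()
        Dim str As String = "Ñ"
        ' Bytes on disk: the 2-byte BOM followed by the 2 encoded bytes.
        Console.WriteLine(BitConverter.ToString(WrittenBytes(str, Encoding.Unicode))) ' FF-FE-D1-00
        ' Bytes counted by GetByteCount: the encoded text only.
        Console.WriteLine(Encoding.Unicode.GetByteCount(str)) ' 2
    End Sub
End Module
```

For "Ñ" (U+00D1) with Encoding.Unicode, the first two bytes on disk (FF FE) are the BOM; only D1 00 is the text that GetByteCount measures.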

The BOM for UTF-8 is 3 bytes: EF BB BF.

For UTF-16 (Encoding.Unicode), it is 2 bytes: U+FEFF, written as FE FF (big-endian) or FF FE (little-endian) depending on the encoding. Encoding.Unicode is little-endian, so it writes FF FE.

For UTF-32, it is 4 bytes: 0000FEFF. Encoding.UTF32 is little-endian, so it writes FF FE 00 00.
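Taking the BOM into account, the question's extension can add the length of the encoding's preamble (Encoding.GetPreamble) to GetByteCount. A minimal sketch, assuming the string is written in a single call to a new or overwritten file, as File.WriteAllText does, so the BOM appears exactly once; the module name StringSizeExtensions is hypothetical:

```vbnet
Imports System
Imports System.Runtime.CompilerServices
Imports System.Text

Module StringSizeExtensions
    ''' <summary>
    ''' Gets the size, in bytes, that a string occupies when written to a new file,
    ''' including the byte order mark (preamble) the encoding emits, if any.
    ''' </summary>
    <Extension>
    Public Function SizeInFile(ByVal sender As String,
                               Optional ByVal encoding As Encoding = Nothing) As Integer
        If encoding Is Nothing Then
            encoding = System.Text.Encoding.Default
        End If
        ' GetPreamble returns the BOM bytes (empty for encodings that emit none);
        ' GetByteCount counts only the encoded characters.
        Return encoding.GetPreamble().Length + encoding.GetByteCount(sender)
    End Function

    Sub Main()
        Dim str As String = "Ñ"
        Console.WriteLine(str.SizeInFile(Encoding.Unicode))        ' 4 (FF FE + D1 00)
        Console.WriteLine(str.SizeInFile(Encoding.UTF8))           ' 5 (EF BB BF + C3 91)
        Console.WriteLine(str.SizeInFile(Encoding.UTF32))          ' 8 (FF FE 00 00 + D1 00 00 00)
        Console.WriteLine(str.SizeInFile(New UTF8Encoding(False))) ' 2 (no BOM)
    End Sub
End Module
```

With this version, "Ñ".SizeInFile(Encoding.Unicode) returns 4, matching the 4-byte Test.txt from the question. An encoding constructed without a BOM, such as New UTF8Encoding(False), has an empty preamble, so the result is just the GetByteCount value. Appending to a non-empty file typically does not write a second BOM, so for appended text this count overstates the growth by the preamble length.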
