
How do I truncate a string while converting to bytes in C#?

I would like to put a string into a byte array, but the string may be too big to fit. If it is too large, I would like to put as much of the string as possible into the array. Is there an efficient way to find out how many characters will fit?

To truncate a string to a UTF-8 byte array without splitting in the middle of a character, I use this:

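// Requires: using System.Text;
static string Truncate(string s, int maxLength) {
    // Fast path: the whole string already fits.
    if (Encoding.UTF8.GetByteCount(s) <= maxLength)
        return s;
    var cs = s.ToCharArray();
    int length = 0; // UTF-8 bytes accepted so far
    int i = 0;      // index of the next char to consider
    while (i < cs.Length){
        // A surrogate pair is two chars but one character; keep it together.
        int charSize = 1;
        if (i < (cs.Length - 1) && char.IsSurrogatePair(cs[i], cs[i + 1]))
            charSize = 2;
        int byteSize = Encoding.UTF8.GetByteCount(cs, i, charSize);
        if ((byteSize + length) <= maxLength){
            i = i + charSize;
            length += byteSize;
        }
        else
            break;
    }
    return s.Substring(0, i);
}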

The returned string can then be safely encoded into a byte array of length maxLength.
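A different way to get the same effect is to work on the encoded bytes rather than on the chars. The following is only a sketch of that idea, not the method from the answer above: it encodes the whole string, then backs the cut point up past any UTF-8 continuation bytes so the cut lands on a character boundary. The class name Utf8BoundarySketch and the helper FillUtf8 are made up for illustration.

```csharp
using System;
using System.Text;

static class Utf8BoundarySketch
{
    // Hypothetical helper: copies as much of s as fits into buffer (as UTF-8)
    // and returns the number of bytes written.
    public static int FillUtf8(string s, byte[] buffer)
    {
        byte[] all = Encoding.UTF8.GetBytes(s);
        int cut = Math.Min(all.Length, buffer.Length);

        // If the text is being cut short, back up while the first excluded
        // byte is a continuation byte (bit pattern 10xxxxxx), so that the cut
        // lands at the start of a character.
        if (cut < all.Length)
        {
            while (cut > 0 && (all[cut] & 0xC0) == 0x80)
                cut--;
        }

        Array.Copy(all, buffer, cut);
        return cut;
    }
}
```

This version allocates a temporary array proportional to the whole string, so it trades memory for simplicity; for very long strings and small buffers the char-by-char loop above is likely the better fit.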

You are using the Encoding class to do your conversion to a byte array, correct? All Encoding objects have an overridden method GetMaxCharCount, which will give you "The maximum number of characters produced by decoding the specified number of bytes." You should be able to use this value to trim your string and properly encode it. Note that the value is an upper bound: a string trimmed to that many characters can still need more bytes than the array holds if it contains multi-byte characters, so the exact size has to be checked afterwards.
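As a rough sketch of how that suggestion could be applied, the snippet below uses GetMaxCharCount only as a starting point and then shrinks the prefix until GetByteCount confirms that it fits. The class MaxCharCountSketch and the helper TrimToFit are hypothetical names, and the shrinking loop is an assumption about how to finish the job, not something the answer spells out.

```csharp
using System;
using System.Text;

static class MaxCharCountSketch
{
    // Hypothetical helper: returns a prefix of s whose encoded form fits in
    // maxBytes, starting from the GetMaxCharCount upper bound.
    public static string TrimToFit(string s, Encoding encoding, int maxBytes)
    {
        // A prefix of this length may still need more than maxBytes bytes,
        // but longer prefixes are not expected to fit.
        int count = Math.Min(s.Length, encoding.GetMaxCharCount(maxBytes));
        char[] chars = s.ToCharArray();

        // Shrink one char at a time until the exact byte count fits.
        while (count > 0 && encoding.GetByteCount(chars, 0, count) > maxBytes)
            count--;

        // Do not end on the first half of a surrogate pair.
        if (count > 0 && count < chars.Length && char.IsHighSurrogate(chars[count - 1]))
            count--;

        return s.Substring(0, count);
    }
}
```

Each pass of the loop re-measures the prefix, so this can be slow when the string is much longer than the buffer; it is meant to show the upper-bound nature of GetMaxCharCount rather than to be fast.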

An efficient way is to find how many bytes you will need per character in the worst case with

Encoding.GetMaxByteCount(1);

then dividing the size of your byte array by the result, and converting that many characters with

public virtual int Encoding.GetBytes (
 string s,
 int charIndex,
 int charCount,
 byte[] bytes,
 int byteIndex
)

If you want to use less memory, use

Encoding.GetByteCount(string);

but that is a much slower method, because it has to scan the string.
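The pessimistic approach could look roughly like the sketch below. The class PessimisticSketch and the helper EncodeSafePrefix are invented for this example, and the surrogate check is an addition that the answer does not mention.

```csharp
using System;
using System.Text;

static class PessimisticSketch
{
    // Hypothetical helper: encodes a prefix of s that is sure to fit into
    // buffer, without measuring the string first. Returns the bytes written.
    public static int EncodeSafePrefix(string s, Encoding encoding, byte[] buffer)
    {
        // Worst-case number of bytes for a single char in this encoding.
        int worstCasePerChar = encoding.GetMaxByteCount(1);

        // This many chars should fit regardless of what they are.
        int charCount = Math.Min(s.Length, buffer.Length / worstCasePerChar);

        // Do not end on the first half of a surrogate pair.
        if (charCount > 0 && charCount < s.Length && char.IsHighSurrogate(s[charCount - 1]))
            charCount--;

        return encoding.GetBytes(s, 0, charCount, buffer, 0);
    }
}
```

The price of skipping the measurement is wasted space: the worst-case figure is usually well above what typical text needs, so a large part of the buffer can stay unused.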

The Encoding class in .NET has a method called GetByteCount, which can take a string or a char[]. If you pass in one character, it will tell you how many bytes that character needs in whichever encoding you are using.

The method GetMaxByteCount is faster, but it does a worst-case calculation, which could return a higher number than is actually needed.
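To see the difference between the two methods, a small sketch such as the following can be used. ByteCountComparison and Measure are hypothetical names; the comments describe how the built-in encodings are generally understood to behave.

```csharp
using System;
using System.Text;

static class ByteCountComparison
{
    // Hypothetical helper: returns the exact and worst-case byte counts for s.
    public static (int Exact, int WorstCase) Measure(string s, Encoding encoding)
    {
        int exact = encoding.GetByteCount(s);               // examines every char
        int worstCase = encoding.GetMaxByteCount(s.Length); // based on the length only
        return (exact, worstCase);
    }
}
```

For example, calling Measure("héllo", Encoding.UTF8) gives an exact count of 6 bytes, while the worst-case value is larger.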
