Limiting the UTF-8 encoded byte length of a string

Question

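I need to limit the length of the output byte[] produced by UTF-8 encoding. For example, the byte[] length must be less than or equal to 1000. At first, I wrote the following code.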

            int maxValue = 1000;

            if (text.Length > maxValue)
                text = text.Substring(0, maxValue);
            var textInBytes = Encoding.UTF8.GetBytes(text);

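This works fine if the string uses only ASCII characters, because each character is 1 byte. But characters outside that range can take 2, 3, or even 4 bytes each in UTF-8, so a string of 1000 characters can encode to far more than 1000 bytes. That is the problem with the code above. To solve it, I wrote this.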
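
To make those byte counts concrete, here is a minimal sketch that reports how many UTF-8 bytes each character of a string needs. The class and method names (`Utf8WidthDemo`, `BytesPerCharacter`) are illustrative and not part of the original post. It assumes well-formed input; a .NET `char` is a UTF-16 code unit, so a character outside the Basic Multilingual Plane occupies two `char` values and has to be encoded as a pair.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class Utf8WidthDemo
{
    // Hypothetical helper: returns the UTF-8 byte count of each character in text,
    // treating a surrogate pair (two chars) as a single character.
    public static List<int> BytesPerCharacter(string text)
    {
        var widths = new List<int>();
        for (int i = 0; i < text.Length; i++)
        {
            // Encode both halves of a surrogate pair together; otherwise one char.
            int len = (i + 1 < text.Length && char.IsSurrogatePair(text[i], text[i + 1])) ? 2 : 1;
            widths.Add(Encoding.UTF8.GetByteCount(text.Substring(i, len)));
            i += len - 1; // skip the low surrogate that was already counted
        }
        return widths;
    }
}
```

Calling `Utf8WidthDemo.BytesPerCharacter("a\u00E9\u4E2D\U0001F600")` should yield the widths 1, 2, 3 and 4, which is why truncating by `text.Length` alone does not bound the byte count.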

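            List<byte> textInBytesList = new List<byte>();
            char[] textInChars = text.ToCharArray();
            for (int a = 0; a < textInChars.Length; a++)
            {
                // Encode a surrogate pair as one unit; a lone surrogate would not encode correctly.
                int count = (a + 1 < textInChars.Length
                             && char.IsSurrogatePair(textInChars[a], textInChars[a + 1])) ? 2 : 1;
                byte[] valueInBytes = Encoding.UTF8.GetBytes(textInChars, a, count);
                // Stop before the first character that would push the output past the limit.
                if ((textInBytesList.Count + valueInBytes.Length) > maxValue)
                    break;

                textInBytesList.AddRange(valueInBytes);
                a += count - 1; // skip the low surrogate that was already consumed
            }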

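I haven't tested the code yet, but I'm fairly sure it will work the way I want. Still, I don't like the way it's done. Is there a better way to do this? Is there something I'm missing or don't know about?

Thanks.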

Answer 1

My first post on Stack Overflow, so be gentle! This method should take care of things pretty quickly for you.

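    public static byte[] GetBytes(string text, int maxArraySize, Encoding encoding) {
        if (string.IsNullOrEmpty(text)) return null;

        // Every char encodes to at least one byte, so at most maxArraySize chars can fit.
        int tail = Math.Min(text.Length, maxArraySize);
        // Do not start with a cut between the two halves of a surrogate pair.
        if (tail > 0 && tail < text.Length && char.IsSurrogatePair(text[tail - 1], text[tail])) --tail;

        int size = encoding.GetByteCount(text.Substring(0, tail));
        while (tail > 0 && size > maxArraySize) {
            // Walk back from the end, dropping a whole surrogate pair at once.
            int step = (tail >= 2 && char.IsSurrogatePair(text[tail - 2], text[tail - 1])) ? 2 : 1;
            size -= encoding.GetByteCount(text.Substring(tail - step, step));
            tail -= step;
        }

        return encoding.GetBytes(text.Substring(0, tail));
    }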

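It's similar to what you're doing, but without the extra overhead of the List and without having to count from the beginning of the string every time. I start from the other end of the string, on the assumption, of course, that every character takes at least one byte. So there is no need to look any further into the string than maxArraySize characters (or the total length of the string, if that is shorter).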

You can then call the method like this.

        byte[] bytes = GetBytes(text, 1000, Encoding.UTF8);
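
If the encoding is always UTF-8, a different technique is also possible: encode once, then cut the byte array at a character boundary. The sketch below is not the method from this answer; it swaps in a byte-level approach, and the names `Utf8Truncate` and `GetUtf8BytesTruncated` are hypothetical. It relies on the fact that UTF-8 continuation bytes always have the bit pattern `10xxxxxx`, and it assumes the input is well-formed UTF-16.

```csharp
using System;
using System.Text;

public static class Utf8Truncate
{
    // Hypothetical helper: encodes text as UTF-8 and keeps at most maxBytes bytes,
    // cutting only at a character boundary so no partial sequence is emitted.
    public static byte[] GetUtf8BytesTruncated(string text, int maxBytes)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (maxBytes < 0) throw new ArgumentOutOfRangeException(nameof(maxBytes));

        byte[] all = Encoding.UTF8.GetBytes(text);
        if (all.Length <= maxBytes) return all;

        // If the first dropped byte is a continuation byte (10xxxxxx), the cut falls
        // inside a multi-byte sequence, so back up to that sequence's lead byte.
        int cut = maxBytes;
        while (cut > 0 && (all[cut] & 0xC0) == 0x80) cut--;

        byte[] result = new byte[cut];
        Array.Copy(all, result, cut);
        return result;
    }
}
```

The trade-off is that the whole string is encoded even when only a small prefix is kept, so for very long inputs the character-based trimming above may allocate less.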

Solution 1
1 2013-11-08 05:42:07
