How to split a string containing non-ASCII characters based on a byte size limit?

Question

How can I split a string containing non-ASCII characters based on a byte size limit? I want to split the string below into pieces and add them to a List, with each piece limited by size, e.g. 3 bytes.

The problem is that each of these non-ASCII characters takes 2 bytes in UTF-8, so after the split the data becomes junk, as shown in the actual output.

What I want is the expected output given below; it's OK to write only 2 bytes when we come across a non-ASCII character. Please let me know how to resolve it.

Problem:

String words = "Hello woræd  æåéøòôóâ";
List<String> payloads = new ArrayList<>();
try (ByteArrayOutputStream outStream = new ByteArrayOutputStream();) {
    byte[] chars = words.getBytes(StandardCharsets.UTF_8);
    for (byte ch : chars) {
        outStream.write(ch);
        if (outStream.size() >= 3) {
            String s = outStream.toString("UTF-8");
            payloads.add(s);
            outStream.flush();
            outStream.reset();
        }
    }
    payloads.add(outStream.toString("UTF-8"));
    outStream.flush();
    System.out.println(payloads);
} catch (IOException e) {
    e.printStackTrace();
}

Actual output: [Hel, lo, wor, æd, �, �å, é�, �ò, ô�, �â, ]

Expected output: [Hel, lo, wor, æd, ,æ, å, é, ø, ò, ô, ó, â]

Answer 1

It's UTF-8. UTF-8 is designed so that you can easily detect character boundaries.

So: convert the String to UTF-8 bytes.

Then, starting at the byte limit, backtrack until the first excluded byte is a legitimate 'first byte', i.e. not of the form 10xxxxxx. You are now positioned at a character boundary.
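
The backtracking step could be sketched in Java roughly as follows. This is only an illustration under a few assumptions: the class name `Utf8Splitter` and the methods `isContinuation` and `splitByBytes` are made up here and are not part of the answer, and the sketch throws if a single character needs more bytes than the limit allows (for example a 4-byte character with a 3-byte limit).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, named for illustration only.
class Utf8Splitter {

    // A UTF-8 continuation byte has the bit pattern 10xxxxxx.
    static boolean isContinuation(byte b) {
        return (b & 0xC0) == 0x80;
    }

    // Splits text into pieces of at most maxBytes UTF-8 bytes each,
    // without cutting through a multi-byte character.
    static List<String> splitByBytes(String text, int maxBytes) {
        if (maxBytes < 1) {
            throw new IllegalArgumentException("maxBytes must be positive");
        }
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        List<String> parts = new ArrayList<>();
        int start = 0;
        while (start < bytes.length) {
            // Tentative cut: bytes[end] is the first excluded byte.
            int end = Math.min(start + maxBytes, bytes.length);
            // Back up while the first excluded byte is a continuation byte.
            while (end > start && end < bytes.length && isContinuation(bytes[end])) {
                end--;
            }
            if (end == start) {
                // The character starting at 'start' does not fit in maxBytes.
                throw new IllegalArgumentException(
                        "a character needs more than " + maxBytes + " bytes");
            }
            parts.add(new String(bytes, start, end - start, StandardCharsets.UTF_8));
            start = end;
        }
        return parts;
    }

    public static void main(String[] args) {
        // Same input and limit as in the question.
        System.out.println(splitByBytes("Hello woræd  æåéøòôóâ", 3));
    }
}
```

Because every cut lands on a lead byte, each piece decodes on its own without producing replacement characters, and a piece may be shorter than the limit when the next character would not fit.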

Solution 1
0 2021-11-30 17:55:56
