Is there an easy way to get a substring of a UTF-8 encoded string whose repr is shorter than N in Python?

Question

For example, I have a string, and I am hoping to find an easy way to get a substring of it, encoded in UTF-8, such that the length of the substring's repr is <= N. Of course, I could try a substring of length N/3 and then grow it to N/3+1, N/3+2, ..., but is there an easier way?

word = u"this is a ship, and some other words".encode("utf-8")
# some way to get a substring
substring = func(word, N)
#assert len(repr(substring)) <= N

Thanks!

Answer 1

A possible approach:

Take the first N-1 bytes of the repr of the whole string.
Examine the last 3 bytes to see whether you cut an escape sequence in half, and cut off the partial escape if necessary.
Append a closing quote, keeping in mind that it may be ' or ".
Eval the repr back to a UTF-8 byte string.
Examine the last few bytes to see whether you cut the string in the middle of a Unicode code point, and cut off the partial bytes if necessary. You can tell leading bytes from continuation bytes by their bit pattern (continuation bytes always have the form 10xxxxxx).
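The steps above can be sketched roughly as follows. This is only an illustration under some assumptions that are not in the original post: it targets Python 3, where the encoded string is a bytes object and its repr carries a leading b (so the shortest possible repr, b'', is 3 characters long). The names repr_limited_prefix and trim_partial_utf8 are made up for this example. Instead of inspecting the last three characters by hand, the sketch uses a regular expression to keep only complete escape sequences, and it uses ast.literal_eval rather than eval to turn the repr back into bytes.

```python
import ast
import re

# One "token" of a bytes repr body: a \xNN escape, a two-character
# escape such as \\ or \', or a single ordinary character.
_TOKENS = re.compile(r"(?:\\x[0-9a-f]{2}|\\[^x]|[^\\])*")


def trim_partial_utf8(data):
    """Drop a trailing, incomplete multi-byte UTF-8 sequence, if any.

    Assumes data is a prefix of valid UTF-8.
    """
    end = len(data)
    start = end
    # Walk back over continuation bytes (bit pattern 10xxxxxx).
    while start > 0 and (data[start - 1] & 0xC0) == 0x80:
        start -= 1
    if start == 0:
        return b""  # empty input, or nothing but continuation bytes
    lead = data[start - 1]
    if lead >= 0xF0:
        needed = 3  # 11110xxx starts a 4-byte sequence
    elif lead >= 0xE0:
        needed = 2  # 1110xxxx starts a 3-byte sequence
    elif lead >= 0xC0:
        needed = 1  # 110xxxxx starts a 2-byte sequence
    else:
        needed = 0  # ASCII byte
    trailing = end - start
    return data if trailing >= needed else data[:start - 1]


def repr_limited_prefix(word, n):
    """Return a prefix of the UTF-8 bytes `word` with len(repr(prefix)) <= n."""
    if n < 3:
        raise ValueError("even repr(b'') is 3 characters long")
    full = repr(word)
    if len(full) <= n:
        return word
    quote = full[-1]                      # may be ' or "
    body = full[:n - 1][2:]               # first N-1 chars, minus the b and opening quote
    body = _TOKENS.match(body).group(0)   # drop a broken escape at the end
    data = ast.literal_eval("b" + quote + body + quote)
    return trim_partial_utf8(data)


# Usage, mirroring the question (the non-ASCII character is added for illustration).
word = u"this is a ship, \u8239, and some other words".encode("utf-8")
substring = repr_limited_prefix(word, 30)
assert len(repr(substring)) <= 30
```

The repr of the returned prefix can come out shorter than the truncated text it was built from, for instance when Python picks a different quote character for the prefix than for the whole string, but it should not come out longer, so the length bound still holds.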

Answer 1: accepted, 2013-04-29 07:14:47
