Substring a multibyte character safely in C#

I'm trying to take a substring of a string that contains multibyte characters, and I'm not getting the results I expect. I am trying to take substrings of strings like 😂test. The first character is a 4-byte character (in UTF-16, a surrogate pair of two chars), so calling ToCharArray on this string returns:

  • 55357 #high surrogate: bytes 1 and 2 of the first character
  • 56834 #low surrogate: bytes 3 and 4 of the first character
  • 116 #t
  • 101 #e
  • 115 #s
  • 116 #t
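To see where those numbers come from, here is a small sketch that dumps the string both as UTF-16 code units (what ToCharArray and Substring work with) and as Unicode code points. It assumes .NET Core 3.0 or later for System.Text.Rune and C# 9 for top-level statements, and it writes the emoji as the escape "\U0001F602" (the same character as 😂) so the source file's encoding does not matter.

```csharp
using System;
using System.Text;

// Same string as "😂test"; U+1F602 lies outside the Basic Multilingual
// Plane, so UTF-16 stores it as a surrogate pair (two chars).
string s = "\U0001F602test";

// String.Length counts UTF-16 code units, not visible characters.
Console.WriteLine(s.Length); // 6

// These are the values ToCharArray returns, one per code unit.
foreach (char c in s)
{
    Console.WriteLine($"{(int)c,5}  high={char.IsHighSurrogate(c)}  low={char.IsLowSurrogate(c)}");
}

// Rune (.NET Core 3.0+) decodes a surrogate pair into one code point.
int runeCount = 0;
foreach (Rune r in s.EnumerateRunes())
{
    Console.WriteLine($"U+{r.Value:X4}  ({r.Utf16SequenceLength} code unit(s))");
    runeCount++;
}
Console.WriteLine(runeCount); // 5
```

The string has 6 chars but only 5 code points, which is exactly why char-based indexing lands in the middle of the emoji.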

So when I call .Substring(1) on this string, it returns an invalid string that starts with the low surrogate (the third and fourth bytes of the first character) instead of 'test'. Is there any way to get .Substring and other string operations to treat that character as a single unit?
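The failure can be reproduced directly, and a code-point-aware index fixes this particular case. The sketch below uses a hypothetical helper, CharIndexOfCodePoint (not a BCL API), that converts a code-point index into a char index by stepping over surrogate pairs; it assumes C# 9 top-level statements.

```csharp
using System;

string s = "\U0001F602test"; // same string as "😂test"

// Plain Substring counts chars, so index 1 falls between the two halves
// of the surrogate pair and the result begins with a lone low surrogate.
string broken = s.Substring(1);
Console.WriteLine(char.IsLowSurrogate(broken[0])); // True

// Hypothetical helper (not part of .NET): map a code-point index to a
// char index, treating a well-formed surrogate pair as one unit.
static int CharIndexOfCodePoint(string text, int codePointIndex)
{
    int i = 0;
    for (int n = 0; n < codePointIndex; n++)
    {
        if (i >= text.Length)
            throw new ArgumentOutOfRangeException(nameof(codePointIndex));
        // Skip two chars for a surrogate pair, otherwise one.
        i += char.IsSurrogatePair(text, i) ? 2 : 1;
    }
    return i;
}

string safe = s.Substring(CharIndexOfCodePoint(s, 1));
Console.WriteLine(safe); // test
```

Counting code points is still not the same as counting what a user sees as one character: an accented letter built from a base letter plus a combining mark is two code points, so this helper would split it. Handling that requires text elements.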

You want to use StringInfo (in the System.Globalization namespace):

    using System.Globalization;

    var yourstring = "😂test";
    var si = new StringInfo(yourstring);
    var substring = si.SubstringByTextElements(1); // "test"
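StringInfo works in text elements rather than chars, so it also keeps combining sequences together. The sketch below is illustrative only: it adds an "é" written as 'e' plus U+0301 COMBINING ACUTE ACCENT to the string from the question, and assumes C# 9 top-level statements. As far as I know, on .NET 5 and later text elements follow the Unicode extended grapheme cluster rules (UAX #29); older runtimes group surrogate pairs and combining marks but may split some emoji sequences such as skin-tone modifiers.

```csharp
using System;
using System.Globalization;

// 😂, then "é" as two code points ('e' + combining acute), then "test".
string s = "\U0001F602" + "e\u0301" + "test";

var si = new StringInfo(s);

// Text elements: 😂, é, t, e, s, t
Console.WriteLine(si.LengthInTextElements); // 6
Console.WriteLine(s.Length);                // 8

// Skip the first text element; the surrogate pair is removed as one unit.
string afterEmoji = si.SubstringByTextElements(1);

// Take one text element starting at element 1: the whole "é", mark included.
string accented = si.SubstringByTextElements(1, 1);

// TextElementEnumerator walks the string one text element at a time;
// ElementIndex is the char offset where each element starts.
TextElementEnumerator enumerator = StringInfo.GetTextElementEnumerator(s);
while (enumerator.MoveNext())
{
    Console.WriteLine($"{enumerator.ElementIndex}: {enumerator.GetTextElement().Length} char(s)");
}
```

Because every index here is in text elements, the same approach covers Length-style and Substring-style operations without ever producing a lone surrogate or a detached combining mark.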
