How do I remove hidden characters from an NSString?
After copying and pasting text from the web into an NSTextArea in my Mac app, I see:
EE
If I copy these two letters into a browser, I see:
E?E
If I copy them into Google Translate, I get:
E 'E
I cannot identify the character between the two E's. But the question is: how do I remove these hidden characters from my NSString?
In your uploaded file, the hex code of the hidden character is 0x18 (found via Hex Fiend).
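If a hex editor is not at hand, the same check can be sketched in code by listing the UTF-16 code units of the string. The helper name `HexDumpOfString` is made up for this illustration and is not part of Foundation.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original post): lists every UTF-16
// code unit of the string in "U+XXXX" form, separated by spaces.
static NSString *HexDumpOfString(NSString *string) {
    NSMutableArray<NSString *> *parts = [NSMutableArray array];
    for (NSUInteger i = 0; i < string.length; i++) {
        unichar unit = [string characterAtIndex:i];
        [parts addObject:[NSString stringWithFormat:@"U+%04X", unit]];
    }
    return [parts componentsJoinedByString:@" "];
}
```

Calling it on a string made of E, the 0x18 character, and E should make the hidden character show up as `U+0018` between two `U+0045` entries.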
This character, along with others, is part of the 'control character set'. The set also contains characters such as tab (0x09) and newline (0x0A), which we obviously don't want to remove.
In Objective-C, we can combine NSCharacterSet's controlCharacterSet with the inverse of whitespaceAndNewlineCharacterSet to get only the control characters that are not whitespace, i.e. the invisible ones with no rendered width:
NSMutableCharacterSet* zeroWidthCharacterSet = [[NSCharacterSet controlCharacterSet] mutableCopy];
[zeroWidthCharacterSet formIntersectionWithCharacterSet:[[NSCharacterSet whitespaceAndNewlineCharacterSet] invertedSet]];
Then we can simply use the good old split-by-character-set method and join the pieces back together:
string = [[string componentsSeparatedByCharactersInSet:zeroWidthCharacterSet] componentsJoinedByString:@""];
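Put together, the two snippets above might be wrapped in a single function along these lines. This is only a sketch: the function name `StringByStrippingControlCharacters` is hypothetical, and the body just repeats the calls already shown.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper name; the body combines the two snippets above.
static NSString *StringByStrippingControlCharacters(NSString *string) {
    // Start from all control characters known to NSCharacterSet.
    NSMutableCharacterSet *toStrip =
        [[NSCharacterSet controlCharacterSet] mutableCopy];
    // Keep tab, newline and other whitespace by intersecting with the
    // inverse of the whitespace-and-newline set.
    [toStrip formIntersectionWithCharacterSet:
        [[NSCharacterSet whitespaceAndNewlineCharacterSet] invertedSet]];
    // Split on every character to strip, then join the pieces again.
    return [[string componentsSeparatedByCharactersInSet:toStrip]
            componentsJoinedByString:@""];
}
```

With this, a string containing E, 0x18, E, a tab and a newline should come back with only the 0x18 removed; the tab and newline are preserved.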
Note that if a character that takes more than one UTF-8 code unit to represent (such as an emoji) contained 0x18, stripping it would break that character sequence.
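Because control characters like 0x18 are single-byte ASCII code points, they never appear inside the multi-byte encoding of an emoji. One caveat: controlCharacterSet is documented as covering Unicode categories Cc and Cf, and Cf includes the zero-width joiner (U+200D) that some emoji sequences rely on, so stripping the whole set can split those sequences.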
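If emoji sequences matter, one possible variation is to strip only the C0/C1 control codes (Unicode category Cc) and leave format characters alone. Apple documents controlCharacterSet as covering categories Cc and Cf, and Cf includes the zero-width joiner U+200D used in some emoji sequences. The sketch below is an assumption-laden alternative, not the method from the answer above, and the function name `StringByStrippingCcOnly` is invented for illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: strips only C0/C1 control codes (U+0000-U+001F and
// U+007F-U+009F), leaving format characters such as U+200D untouched.
static NSString *StringByStrippingCcOnly(NSString *string) {
    // U+0000..U+001F (0x20 code points).
    NSMutableCharacterSet *toStrip =
        [NSMutableCharacterSet characterSetWithRange:NSMakeRange(0x0000, 0x20)];
    // U+007F..U+009F (0x21 code points).
    [toStrip addCharactersInRange:NSMakeRange(0x007F, 0x21)];
    // Keep tab, newline and other whitespace, as in the answer above.
    [toStrip formIntersectionWithCharacterSet:
        [[NSCharacterSet whitespaceAndNewlineCharacterSet] invertedSet]];
    return [[string componentsSeparatedByCharactersInSet:toStrip]
            componentsJoinedByString:@""];
}
```

The trade-off is that other invisible format characters (for example zero-width joiners outside emoji) also survive, so which set to strip depends on the text being cleaned.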