
How do I capture a Unicode character with Swift regex

I have a String in Swift which looks as follows in the Xcode debugger:

Random Text: \\u{e2}specificText:

When I print the text in the Xcode console, it looks like this:

Random Text: ‎ specificText:

If I paste the text in question into some editors, the character looks like a bold dot.

Which regular expression do I have to use to capture just \\u{e2} in the above text? Which Unicode character is that?

I am using the following String extension to get the captured groups:

import Foundation

extension String {
  // Returns the capture groups of the first match, or nil if the
  // pattern is invalid, nothing matches, or the pattern has no groups.
  func capturedGroups(forRegex regex: String) -> [String]? {
    guard let expression = try? NSRegularExpression(pattern: regex) else { return nil }
    // NSRange counts UTF-16 code units, so use NSString's length here.
    let nsString = self as NSString
    let matches = expression.matches(in: self, options: [], range: NSRange(location: 0, length: nsString.length))
    guard let match = matches.first else { return nil }
    let lastRangeIndex = match.numberOfRanges - 1
    guard lastRangeIndex >= 1 else { return nil }
    var results = [String]()
    for i in 1...lastRangeIndex {
        let capturedGroupRange = match.range(at: i)
        let matchedString = nsString.substring(with: capturedGroupRange)
        results.append(matchedString)
    }
    return results
  }
}

I have tried the following, but it did not work:

snippet.capturedGroups(forRegex: "(\\u00e2)")

I debugged the strings containing \\u{e2} in Xcode with the following code:

snippet.forEach { character in
    print(character)
}

After setting a breakpoint at the print line, I found that although the Xcode debugger displays these Unicode characters as \\u{e2} when inspecting the string, the characters I was actually dealing with were:

https://unicode-table.com/en/200E/ (U+200E, LEFT-TO-RIGHT MARK)

https://unicode-table.com/en/202A/ (U+202A, LEFT-TO-RIGHT EMBEDDING)

https://unicode-table.com/en/202C/ (U+202C, POP DIRECTIONAL FORMATTING)
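
The same code points can probably be found without a breakpoint by iterating over the string's Unicode scalars instead of its characters. The sketch below is only an illustration: the helper names `codePoints(of:)` and `utf8Bytes(of:)` and the sample string are made up for this example. It also prints the UTF-8 bytes, because all three characters above start with the byte 0xE2 in UTF-8, which may be why the debugger showed e2 (that part is a guess, not something confirmed here).

```swift
// Hypothetical helper: formats every Unicode scalar of a string as "U+XXXX".
func codePoints(of text: String) -> [String] {
    return text.unicodeScalars.map { scalar in
        let hex = String(scalar.value, radix: 16, uppercase: true)
        // Left-pad to at least four hex digits.
        let padding = String(repeating: "0", count: max(0, 4 - hex.count))
        return "U+" + padding + hex
    }
}

// Hypothetical helper: returns the UTF-8 bytes of a string.
func utf8Bytes(of text: String) -> [UInt8] {
    return Array(text.utf8)
}

// Assumed sample: a LEFT-TO-RIGHT MARK between two ASCII letters.
let scalarSample = "a\u{200E}b"
print(codePoints(of: scalarSample))  // ["U+0061", "U+200E", "U+0062"]
print(utf8Bytes(of: "\u{200E}"))     // [226, 128, 142], i.e. E2 80 8E
```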

I could capture the Unicode characters with the following code, using the extension from the question above:

snippet.capturedGroups(forRegex: "([\\u200E]{1})")
snippet.capturedGroups(forRegex: "([\\u202A]{1})")
snippet.capturedGroups(forRegex: "([\\u202C]{1})")
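
The attempt with "(\\u00e2)" looks for U+00E2 (â), which is a different character from U+200E, U+202A and U+202C, so it finds nothing. If the goal is to detect or strip all three marks at once, a single character class should be enough. The following is a minimal sketch under that assumption; `directionalMarks`, `directionalMarkCount(in:)` and `removingDirectionalMarks(from:)` are illustrative names, not part of the original code.

```swift
import Foundation

// Assumed pattern: one character class covering U+200E, U+202A and U+202C.
let directionalMarks = "[\\u200E\\u202A\\u202C]"

// Counts how many of the three marks occur in the text.
func directionalMarkCount(in text: String) -> Int {
    guard let regex = try? NSRegularExpression(pattern: directionalMarks) else { return 0 }
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.numberOfMatches(in: text, options: [], range: fullRange)
}

// Returns a copy of the text with the three marks removed.
func removingDirectionalMarks(from text: String) -> String {
    return text.replacingOccurrences(of: directionalMarks,
                                     with: "",
                                     options: .regularExpression)
}

// Assumed sample modelled on the string in the question.
let markSample = "Random Text: \u{200E}specificText:"
print(directionalMarkCount(in: markSample))      // 1
print(removingDirectionalMarks(from: markSample)) // Random Text: specificText:
```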
