Regular expression doesn't work in Swift, but works in other languages

I know that NSRegularExpression works on Unicode code points and that (normal) JavaScript regex works on UTF-16 code units, but I don't know what I should change in my regex.

Regex: <text[^>]+>([^<]+)<\/text>

Works here: regex101

My parsing method:

func parseCaptions(text: String) -> String? {
    let textRange = NSRange(location: 0, length: text.count)
    let regex = try! NSRegularExpression(pattern: "<text[^>]+>([^<]+)<\\/text>")
    let matches = regex.matches(in: text, range: textRange)

    var result: String?

    for match in matches {
        let range = match.range

        let first = text.index(text.startIndex, offsetBy: range.location)
        let last = text.index(text.startIndex, offsetBy: range.location + range.length)

        var string = String(text[first...last])

        string = string.replacingOccurrences(of: "\n", with: " ")
        string = string.replacingOccurrences(of: "&amp;#39;", with: "'")
        string = string.replacingOccurrences(of: "&amp;quot;", with: "\"")
        string.append("\n")

        result = string
    }

    return result
}

The regex isn't the issue; the issue is what you do with the matches.

You do:

var result: String?

for match in matches {
    let range = match.range
    let first = text.index(text.startIndex, offsetBy: range.location)
    let last = text.index(text.startIndex, offsetBy: range.location + range.length)

    var string = String(text[first...last])
    ...
    result = string
}
return result

So on each iteration you overwrite result with the current match, and only the last match is returned.

A solution:

func parseCaptions(text: String) -> String {
    // NSRange is based on NSString, which counts UTF-16 code units, while Swift's String.count counts Characters, so `text.count` might be wrong
    let textRange = NSRange(location: 0, length: text.utf16.count)
    let regex = try! NSRegularExpression(pattern: "<text[^>]+>([^<]+)<\\/text>")
    let matches = regex.matches(in: text, range: textRange)

    var result: String = ""
    for match in matches {
        let textNSRange = match.range(at: 1)
        let textRange = Range(textNSRange, in: text)!
        var string = String(text[textRange])
        string = string.replacingOccurrences(of: "\n", with: " ")
        string = string.replacingOccurrences(of: "&#39;", with: "'")
        string = string.replacingOccurrences(of: "&amp;quot;", with: "\"")
        string.append("\n")
        result.append(string)
    }
    return result
}
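
To see why the length passed to NSRange matters, here is a small sketch. The `sample` string is made up for illustration: it contains an emoji, which is a single Character but two UTF-16 code units, so a range built from `count` stops one unit short and cuts off the final `>` of the closing tag.

```swift
import Foundation

// Made-up input: the emoji is 1 Character but 2 UTF-16 code units.
let sample = "<text start=\"0\">Hi 😀</text>"
let regex = try! NSRegularExpression(pattern: "<text[^>]+>([^<]+)<\\/text>")

// Range built from the Character count: one unit too short for this string.
let shortRange = NSRange(location: 0, length: sample.count)
// Range built from UTF-16 code units: covers the whole string.
let fullRange = NSRange(location: 0, length: sample.utf16.count)

print(regex.numberOfMatches(in: sample, range: shortRange)) // 0
print(regex.numberOfMatches(in: sample, range: fullRange))  // 1

// The same full range, computed by Foundation instead of counted by hand.
let sameRange = NSRange(sample.startIndex..<sample.endIndex, in: sample)
print(NSEqualRanges(sameRange, fullRange)) // true
```

For pure-ASCII input both counts agree, which is probably why the mistake is easy to miss.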

So, with input:

This XML file does not appear to have any style information associated with it. The document tree is shown below.
<transcript>
<text start="9.462" dur="1.123">Aaaah</text>
<text start="70.507" dur="5.51">So guys, apparently we control Rewind this year.</text>
<text start="76.017" dur="4.842">
Y&#39;all we can do whatever we want. What do we do?
</text>
</transcript>

We get:

Aaaah
So guys, apparently we control Rewind this year.
 Y'all we can do whatever we want. What do we do? 
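
The third line keeps a leading and a trailing space because the newlines around that caption are replaced with spaces. If that is unwanted, one possible variation is sketched below. It is not part of the solution above, and the name `parseCaptionLines` is hypothetical. It trims each caption and skips a match whose range fails to convert instead of force-unwrapping it.

```swift
import Foundation

// Hypothetical variant of parseCaptions: returns one trimmed caption per match.
func parseCaptionLines(text: String) -> [String] {
    // Let Foundation compute the UTF-16 based range for the whole string.
    let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)
    guard let regex = try? NSRegularExpression(pattern: "<text[^>]+>([^<]+)<\\/text>") else {
        return []
    }
    return regex.matches(in: text, range: fullRange).compactMap { match -> String? in
        // Capture group 1 is the caption without the surrounding tags.
        guard let range = Range(match.range(at: 1), in: text) else { return nil }
        return String(text[range])
            .replacingOccurrences(of: "\n", with: " ")
            .replacingOccurrences(of: "&#39;", with: "'")
            .trimmingCharacters(in: .whitespaces)
    }
}

// Shortened version of the input shown above.
let sample = """
<transcript>
<text start="9.462" dur="1.123">Aaaah</text>
<text start="76.017" dur="4.842">
Y&#39;all we can do whatever we want. What do we do?
</text>
</transcript>
"""
print(parseCaptionLines(text: sample).joined(separator: "\n"))
```

Returning an array leaves it to the caller to decide how the captions are joined.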
