Regular expression doesn't work in Swift, but works in other languages
I know that NSRegularExpression works on Unicode code points and (normal) JavaScript regex works on UTF-16 code units, but I don't know what I should change in my regex.
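For reference, Swift exposes three different "lengths" for the same string, and NSRange (like NSString) measures in UTF-16 code units. The minimal sketch below uses an arbitrary sample string, chosen only so that all three counts differ.

```swift
import Foundation

// "café 😀" with a decomposed é (e + U+0301 COMBINING ACUTE ACCENT) and an emoji
// outside the Basic Multilingual Plane. The sample is arbitrary.
let sample = "cafe\u{301} \u{1F600}"

// String.count counts Characters (extended grapheme clusters): e + U+0301 is one.
let characterCount = sample.count
// unicodeScalars counts Unicode code points: e and U+0301 are two.
let scalarCount = sample.unicodeScalars.count
// utf16 counts UTF-16 code units: the emoji needs a surrogate pair.
let utf16Count = sample.utf16.count

// NSString lengths, and therefore NSRange locations/lengths, are UTF-16 counts.
let nsLength = NSString(string: sample).length
let fullRange = NSRange(sample.startIndex..., in: sample)

print(characterCount, scalarCount, utf16Count, nsLength, fullRange.length)
```

Only the UTF-16 count is safe to use when building an NSRange by hand.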
Regex: <text[^>]+>([^<]+)<\/text>
Works here: regex101
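When the pattern is copied from regex101 into Swift source, every backslash has to be escaped in a normal string literal. A raw string literal (Swift 5 and later) avoids that, so the pattern reads exactly as it does on the website. This is a small sketch; the one-line sample input is taken from the transcript shown further down.

```swift
import Foundation

// Escaped form, as in the question: "\\/" in the literal becomes "\/" in the pattern.
let escapedPattern = "<text[^>]+>([^<]+)<\\/text>"
// Raw string literal: backslashes are taken literally, no double escaping needed.
let rawPattern = #"<text[^>]+>([^<]+)<\/text>"#

// Both literals produce the same pattern string.
print(escapedPattern == rawPattern)

// The pattern is known to be valid, so try! mirrors the original code.
let regex = try! NSRegularExpression(pattern: rawPattern)
let line = #"<text start="9.462" dur="1.123">Aaaah</text>"#
let match = regex.firstMatch(in: line, range: NSRange(line.startIndex..., in: line))

// Extract the first capture group, if there was a match.
let captured: String? = match
    .flatMap { Range($0.range(at: 1), in: line) }
    .map { String(line[$0]) }
print(captured ?? "no match")
```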
My parsing method:
func parseCaptions(text: String) -> String? {
    let textRange = NSRange(location: 0, length: text.count)
    let regex = try! NSRegularExpression(pattern: "<text[^>]+>([^<]+)<\\/text>")
    let matches = regex.matches(in: text, range: textRange)
    var result: String?
    for match in matches {
        let range = match.range
        let first = text.index(text.startIndex, offsetBy: range.location)
        let last = text.index(text.startIndex, offsetBy: range.location + range.length)
        var string = String(text[first...last])
        string = string.replacingOccurrences(of: "\n", with: " ")
        string = string.replacingOccurrences(of: "&#39;", with: "'")
        string = string.replacingOccurrences(of: "&quot;", with: "\"")
        string.append("\n")
        result = string
    }
    return result
}
The regex isn't the issue; it's what you do with the matches.
You do:
var result: String?
for match in matches {
    let range = match.range
    let first = text.index(text.startIndex, offsetBy: range.location)
    let last = text.index(text.startIndex, offsetBy: range.location + range.length)
    var string = String(text[first...last])
    ...
    result = string
}
return result
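So you overwrite result on every iteration, and only the last match is returned. There are smaller problems too: match.range covers the whole <text ...>...</text> element rather than the capture group (match.range(at: 1)), text[first...last] is a closed range and so also takes the character just after the match, and the offsets are applied to Characters while NSRange locations and lengths count UTF-16 code units.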
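One way to avoid the overwrite is to collect one element per match into an array and join at the end. The sketch below is an alternative to the solution that follows, not the answer's own code; captionLines(in:) is a hypothetical name, and the entity replacements are left out to keep it short.

```swift
import Foundation

// Hypothetical helper: returns every captured caption instead of only the last one.
func captionLines(in text: String) -> [String] {
    let regex = try! NSRegularExpression(pattern: #"<text[^>]+>([^<]+)</text>"#)
    // Build the NSRange from String indices so its length is in UTF-16 units.
    let fullRange = NSRange(text.startIndex..., in: text)
    // compactMap yields one element per match rather than reassigning one variable.
    return regex.matches(in: text, range: fullRange).compactMap { (match) -> String? in
        // range(at: 1) is the first capture group: the text between the tags.
        guard let range = Range(match.range(at: 1), in: text) else { return nil }
        return String(text[range])
    }
}

let xml = """
<transcript>
<text start="9.462" dur="1.123">Aaaah</text>
<text start="70.507" dur="5.51">So guys, apparently we control Rewind this year.</text>
</transcript>
"""
let lines = captionLines(in: xml)
print(lines.joined(separator: "\n"))
```

Returning [String] leaves the choice of separator to the caller; joined(separator:) also avoids a trailing newline.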
A solution:
import Foundation

func parseCaptions(text: String) -> String {
    // NSRange, like NSString, counts UTF-16 code units, while String.count counts
    // Characters (grapheme clusters), so `text.count` might be wrong.
    let textRange = NSRange(location: 0, length: text.utf16.count)
    let regex = try! NSRegularExpression(pattern: "<text[^>]+>([^<]+)<\\/text>")
    let matches = regex.matches(in: text, range: textRange)
    var result: String = ""
    for match in matches {
        // range(at: 1) is the capture group (the caption), not the whole <text> element.
        let captionNSRange = match.range(at: 1)
        // Convert the UTF-16-based NSRange into a Range<String.Index>.
        let captionRange = Range(captionNSRange, in: text)!
        var string = String(text[captionRange])
        string = string.replacingOccurrences(of: "\n", with: " ")
        string = string.replacingOccurrences(of: "&#39;", with: "'")
        string = string.replacingOccurrences(of: "&quot;", with: "\"")
        string.append("\n")
        // Append instead of overwriting, so every match is kept.
        result.append(string)
    }
    return result
}
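The switch from match.range to match.range(at: 1) is what strips the tags. A short sketch on a single element from the transcript shows the difference: index 0 is always the entire match, and each following index is one capture group.

```swift
import Foundation

let element = #"<text start="9.462" dur="1.123">Aaaah</text>"#
let regex = try! NSRegularExpression(pattern: #"<text[^>]+>([^<]+)</text>"#)

// The input is fixed and known to match, so force unwrapping is acceptable here.
let match = regex.firstMatch(in: element, range: NSRange(element.startIndex..., in: element))!

// match.range is the same as match.range(at: 0): the entire match, tags included.
let whole = String(element[Range(match.range, in: element)!])
// match.range(at: 1): only the first capture group, ([^<]+).
let inner = String(element[Range(match.range(at: 1), in: element)!])

print(whole)
print(inner)
print(match.numberOfRanges)   // the whole match plus one capture group
```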
So, with this input:
<transcript>
<text start="9.462" dur="1.123">Aaaah</text>
<text start="70.507" dur="5.51">So guys, apparently we control Rewind this year.</text>
<text start="76.017" dur="4.842">
Y'all we can do whatever we want. What do we do?
</text>
</transcript>
We get:
Aaaah
So guys, apparently we control Rewind this year.
Y'all we can do whatever we want. What do we do?