Swift extract regex matches

I want to extract the substrings of a string that match a regex pattern.

So I'm looking for something like this:

func matchesForRegexInText(regex: String!, text: String!) -> [String] {
   ???
}

This is what I have so far:

func matchesForRegexInText(regex: String!, text: String!) -> [String] {

    var regex = NSRegularExpression(pattern: regex, 
        options: nil, error: nil)

    var results = regex.matchesInString(text, 
        options: nil, range: NSMakeRange(0, countElements(text))) 
            as Array<NSTextCheckingResult>

    /// ???

    return ...
}

The problem is that matchesInString returns an array of NSTextCheckingResult, where NSTextCheckingResult.range is of type NSRange.

NSRange is incompatible with Range<String.Index>, so it prevents me from using text.substringWithRange(...).

Any idea how to achieve this simple thing in Swift without too many lines of code?

Even though the matchesInString() method takes a String as its first argument, it works internally with NSString, and the range parameter must be given using the NSString length, not the Swift string length. Otherwise it will fail for "extended grapheme clusters" such as "flags".
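To see why the two lengths differ, here is a minimal sketch using the flag string from the examples in this answer. The variable names are only illustrative.

```swift
import Foundation

let text = "🇩🇪€4€9"

// Swift counts user-perceived characters (extended grapheme clusters).
let characterCount = text.count

// NSString counts UTF-16 code units; the flag alone takes four of them.
let utf16Length = NSString(string: text).length

print(characterCount, utf16Length, text.utf16.count)
// 5 8 8
```

A range built from text.count would therefore cover only the first five UTF-16 units and stop before the digits are reached.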

As of Swift 4 (Xcode 9), the Swift standard library provides functions to convert between Range<String.Index> and NSRange.

func matches(for regex: String, in text: String) -> [String] {

    do {
        let regex = try NSRegularExpression(pattern: regex)
        let results = regex.matches(in: text,
                                    range: NSRange(text.startIndex..., in: text))
        return results.map {
            String(text[Range($0.range, in: text)!])
        }
    } catch let error {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}

Example:

let string = "🇩🇪€4€9"
let matched = matches(for: "[0-9]", in: string)
print(matched)
// ["4", "9"]

Note: The forced unwrap Range($0.range, in: text)! is safe because the NSRange refers to a substring of the given string text. However, if you want to avoid it, use

        // compactMap (Swift 4.1+) drops the nil results; earlier versions spelled this flatMap.
        return results.compactMap {
            Range($0.range, in: text).map { String(text[$0]) }
        }

instead.
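The optional initializer matters more once capture groups are involved: an optional group that took no part in the match is reported with location NSNotFound, and Range(_:in:) returns nil for it. A minimal sketch of that behaviour, using a made-up pattern and input:

```swift
import Foundation

let text = "12"
let regex = try! NSRegularExpression(pattern: "(prefix)?([0-9]+)")
let match = regex.firstMatch(in: text, range: NSRange(text.startIndex..., in: text))!

// Group 1 did not participate in the match, so its location is NSNotFound.
let unmatched = match.range(at: 1)
print(unmatched.location == NSNotFound)   // true

// The failable initializer turns that into nil instead of trapping.
print(Range(unmatched, in: text) == nil)  // true

// Group 2 converts normally.
let digits = Range(match.range(at: 2), in: text).map { String(text[$0]) }
print(digits ?? "none")                   // 12
```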


(Older answer for Swift 3 and earlier:)

In older Swift versions you should convert the given Swift string to an NSString and then extract the ranges. The result will be converted to a Swift string array automatically.

(The code for Swift 1.2 can be found in the edit history.)

Swift 2 (Xcode 7.3.1):

func matchesForRegexInText(regex: String, text: String) -> [String] {

    do {
        let regex = try NSRegularExpression(pattern: regex, options: [])
        let nsString = text as NSString
        let results = regex.matchesInString(text,
                                            options: [], range: NSMakeRange(0, nsString.length))
        return results.map { nsString.substringWithRange($0.range)}
    } catch let error as NSError {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}

Example:

let string = "🇩🇪€4€9"
let matches = matchesForRegexInText("[0-9]", text: string)
print(matches)
// ["4", "9"]

Swift 3 (Xcode 8):

func matches(for regex: String, in text: String) -> [String] {

    do {
        let regex = try NSRegularExpression(pattern: regex)
        let nsString = text as NSString
        let results = regex.matches(in: text, range: NSRange(location: 0, length: nsString.length))
        return results.map { nsString.substring(with: $0.range)}
    } catch let error {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}

Example:

let string = "🇩🇪€4€9"
let matched = matches(for: "[0-9]", in: string)
print(matched)
// ["4", "9"]

My answer builds on the answers already given but makes regex matching more robust by adding the following:

  • Returns not only the matches but also all capture groups for each match (see examples below)
  • Supports optional matches instead of returning an empty array
  • Avoids do/catch and printing to the console by using the guard construct
  • Adds matchingStrings as an extension to String

Swift 4.2

//: Playground - noun: a place where people can play

import Foundation

extension String {
    func matchingStrings(regex: String) -> [[String]] {
        guard let regex = try? NSRegularExpression(pattern: regex, options: []) else { return [] }
        let nsString = self as NSString
        let results  = regex.matches(in: self, options: [], range: NSMakeRange(0, nsString.length))
        return results.map { result in
            (0..<result.numberOfRanges).map {
                result.range(at: $0).location != NSNotFound
                    ? nsString.substring(with: result.range(at: $0))
                    : ""
            }
        }
    }
}

"prefix12 aaa3 prefix45".matchingStrings(regex: "fix([0-9])([0-9])")
// Prints: [["fix12", "1", "2"], ["fix45", "4", "5"]]

"prefix12".matchingStrings(regex: "(?:prefix)?([0-9]+)")
// Prints: [["prefix12", "12"]]

"12".matchingStrings(regex: "(?:prefix)?([0-9]+)")
// Prints: [["12", "12"]], other answers return an empty array here

// Safely accessing the capture of the first match (if any):
let number = "prefix12suffix".matchingStrings(regex: "fix([0-9]+)su").first?[1]
// Prints: Optional("12")

Swift 3

//: Playground - noun: a place where people can play

import Foundation

extension String {
    func matchingStrings(regex: String) -> [[String]] {
        guard let regex = try? NSRegularExpression(pattern: regex, options: []) else { return [] }
        let nsString = self as NSString
        let results  = regex.matches(in: self, options: [], range: NSMakeRange(0, nsString.length))
        return results.map { result in
            (0..<result.numberOfRanges).map {
                result.rangeAt($0).location != NSNotFound
                    ? nsString.substring(with: result.rangeAt($0))
                    : ""
            }
        }
    }
}

"prefix12 aaa3 prefix45".matchingStrings(regex: "fix([0-9])([0-9])")
// Prints: [["fix12", "1", "2"], ["fix45", "4", "5"]]

"prefix12".matchingStrings(regex: "(?:prefix)?([0-9]+)")
// Prints: [["prefix12", "12"]]

"12".matchingStrings(regex: "(?:prefix)?([0-9]+)")
// Prints: [["12", "12"]], other answers return an empty array here

// Safely accessing the capture of the first match (if any):
let number = "prefix12suffix".matchingStrings(regex: "fix([0-9]+)su").first?[1]
// Prints: Optional("12")

Swift 2

extension String {
    func matchingStrings(regex: String) -> [[String]] {
        guard let regex = try? NSRegularExpression(pattern: regex, options: []) else { return [] }
        let nsString = self as NSString
        let results  = regex.matchesInString(self, options: [], range: NSMakeRange(0, nsString.length))
        return results.map { result in
            (0..<result.numberOfRanges).map {
                result.rangeAtIndex($0).location != NSNotFound
                    ? nsString.substringWithRange(result.rangeAtIndex($0))
                    : ""
            }
        }
    }
}

The fastest way to return all matches and capture groups in Swift 5:

extension String {
    func match(_ regex: String) -> [[String]] {
        let nsString = self as NSString
        return (try? NSRegularExpression(pattern: regex, options: []))?.matches(in: self, options: [], range: NSMakeRange(0, nsString.length)).map { match in
            (0..<match.numberOfRanges).map { match.range(at: $0).location == NSNotFound ? "" : nsString.substring(with: match.range(at: $0)) }
        } ?? []
    }
}

Returns a two-dimensional array of strings:

"prefix12suffix fix1su".match("fix([0-9]+)su")

returns:

[["fix12su", "12"], ["fix1su", "1"]]

// First element of sub-array is the match
// All subsequent elements are the capture groups

If you want to extract the actual substrings from a String (including emoji), not just their positions, the following may be a simpler solution.

extension String {
  func regex (pattern: String) -> [String] {
    do {
      let regex = try NSRegularExpression(pattern: pattern, options: NSRegularExpressionOptions(rawValue: 0))
      let nsstr = self as NSString
      let all = NSRange(location: 0, length: nsstr.length)
      var matches : [String] = [String]()
      regex.enumerateMatchesInString(self, options: NSMatchingOptions(rawValue: 0), range: all) {
        (result : NSTextCheckingResult?, _, _) in
        if let r = result {
          let result = nsstr.substringWithRange(r.range) as String
          matches.append(result)
        }
      }
      return matches
    } catch {
      return [String]()
    }
  }
} 

Example usage:

"someText 👿🏅👿⚽️ pig".regex("👿⚽️")

This returns:

["👿⚽️"]

Note that using "\w+" may produce an unexpected, seemingly empty string:

"someText 👿🏅👿⚽️ pig".regex("\\w+")

This returns the following String array:

["someText", "️", "pig"]
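One likely explanation for the seemingly empty element (an assumption on my part, not something the answer states) is that the soccer ball emoji is followed by the invisible VARIATION SELECTOR-16 (U+FE0F). That scalar is a nonspacing mark, and the ICU engine behind NSRegularExpression counts marks as word characters. A minimal sketch that isolates this, with the emoji written as escapes so the selector is visible in the source:

```swift
import Foundation

// Soccer ball (U+26BD) followed by VARIATION SELECTOR-16 (U+FE0F).
let ball = "\u{26BD}\u{FE0F}"

// One user-perceived character, but two Unicode scalars.
print(ball.count, ball.unicodeScalars.count)  // 1 2

// The selector's general category is "nonspacing mark".
let selector = ball.unicodeScalars.last!
let isMark = selector.properties.generalCategory == .nonspacingMark
print(isMark)  // true

// \w+ skips the ball itself (a symbol) and matches only the selector.
let regex = try! NSRegularExpression(pattern: "\\w+")
let ns = NSString(string: ball)
let found = regex.matches(in: ball, range: NSRange(location: 0, length: ns.length))
    .map { ns.substring(with: $0.range) }
print(found.map { Array($0.unicodeScalars).map { $0.value } })
```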

I found that the accepted answer's solution unfortunately does not compile on Swift 3 for Linux. Here's a modified version that does:

import Foundation

func matches(for regex: String, in text: String) -> [String] {
    do {
        let regex = try RegularExpression(pattern: regex, options: [])
        let nsString = NSString(string: text)
        let results = regex.matches(in: text, options: [], range: NSRange(location: 0, length: nsString.length))
        return results.map { nsString.substring(with: $0.range) }
    } catch let error {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}

The main differences are:

  1. Swift on Linux seems to require dropping the NS prefix on Foundation objects for which there is no Swift-native equivalent. (See Swift evolution proposal #86.)

  2. Swift on Linux also requires specifying the options arguments for both the RegularExpression initialization and the matches method.

  3. For some reason, coercing a String into an NSString doesn't work in Swift on Linux, but initializing a new NSString with a String as the source does work.

This version also works with Swift 3 on macOS / Xcode, with the sole exception that you must use the name NSRegularExpression instead of RegularExpression.

Swift 4 without NSString:

extension String {
    func matches(regex: String) -> [String] {
        guard let regex = try? NSRegularExpression(pattern: regex, options: [.caseInsensitive]) else { return [] }
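        // NSRange is measured in UTF-16 units, so build it from the string itself rather than from self.count.
        let matches  = regex.matches(in: self, options: [], range: NSRange(startIndex..., in: self))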
        return matches.map { match in
            return String(self[Range(match.range, in: self)!])
        }
    }
}

@p4bloch if you want to capture results from a series of capture parentheses, then you need to use the rangeAtIndex(index) method of NSTextCheckingResult instead of range. Here's @MartinR's method for Swift 2 from above, adapted for capture parentheses. In the array that is returned, the first result [0] is the entire match, and the individual capture groups begin at [1]. I commented out the map operation (so it's easier to see what I changed) and replaced it with nested loops.

func matches(for regex: String!, in text: String!) -> [String] {

    do {
        let regex = try NSRegularExpression(pattern: regex, options: [])
        let nsString = text as NSString
        let results = regex.matchesInString(text, options: [], range: NSMakeRange(0, nsString.length))
        var match = [String]()
        for result in results {
            for i in 0..<result.numberOfRanges {
                match.append(nsString.substringWithRange( result.rangeAtIndex(i) ))
            }
        }
        return match
        //return results.map { nsString.substringWithRange( $0.range )} //rangeAtIndex(0)
    } catch let error as NSError {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}

An example use case: say you want to split a string of the form title year, e.g. "Finding Dory 2016". You could do this:

print ( matches(for: "^(.+)\\s(\\d{4})" , in: "Finding Dory 2016"))
// ["Finding Dory 2016", "Finding Dory", "2016"]

Most of the solutions above only return the full match and ignore the capture groups, e.g. ^\d+\s+(\d+).

To get the capture group matches as expected, you need something like this (Swift 4):

public extension String {
    public func capturedGroups(withRegex pattern: String) -> [String] {
        var results = [String]()

        var regex: NSRegularExpression
        do {
            regex = try NSRegularExpression(pattern: pattern, options: [])
        } catch {
            return results
        }
        // NSRange is measured in UTF-16 units, so self.count would be wrong for emoji and similar characters.
        let matches = regex.matches(in: self, options: [], range: NSRange(startIndex..., in: self))

        guard let match = matches.first else { return results }

        let lastRangeIndex = match.numberOfRanges - 1
        guard lastRangeIndex >= 1 else { return results }

        for i in 1...lastRangeIndex {
            let capturedGroupIndex = match.range(at: i)
            let matchedString = (self as NSString).substring(with: capturedGroupIndex)
            results.append(matchedString)
        }

        return results
    }
}

This is how I did it; I hope it offers a new perspective on how this works in Swift.

In the example below I extract every substring enclosed in [ ] (brackets included):

var sample = "this is an [hello] amazing [world]"

var regex = NSRegularExpression(pattern: "\\[.+?\\]"
, options: NSRegularExpressionOptions.CaseInsensitive 
, error: nil)

var matches = regex?.matchesInString(sample, options: nil
, range: NSMakeRange(0, (sample as NSString).length)) as Array<NSTextCheckingResult>

for match in matches {
   let r = (sample as NSString).substringWithRange(match.range)//cast to NSString is required to match range format.
    println("found= \(r)")
}

This is a very simple solution that returns an array of strings with the matches.

Swift 3:

import Foundation

extension String {
    // Returns every substring of self that matches the given pattern,
    // or an empty array if the pattern is invalid.
    internal func stringsMatching(regularExpressionPattern: String, options: NSRegularExpression.Options = []) -> [String] {
        guard let regex = try? NSRegularExpression(pattern: regularExpressionPattern, options: options) else {
            return []
        }

        let nsString = self as NSString
        let results = regex.matches(in: self, options: [], range: NSMakeRange(0, nsString.length))

        return results.map {
            nsString.substring(with: $0.range)
        }
    }
}

Basic phone number matching:

let phoneNumbers = ["+79990001101", "+7 (800) 000-11-02", "+34 507 574 147 ", "+1-202-555-0118"]

let match: (String) -> String = {
    $0.replacingOccurrences(of: #"[^\d+]"#, with: "", options: .regularExpression)
}

print(phoneNumbers.map(match))
// ["+79990001101", "+78000001102", "+34507574147", "+12025550118"]
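The same .regularExpression option also works with range(of:options:) when the goal is to extract the first match rather than strip characters. A minimal sketch, with a made-up input string and illustrative variable names:

```swift
import Foundation

let message = "Call +1-202-555-0118 before noon"

// range(of:options:) returns the Range<String.Index> of the first match, or nil.
let firstNumber = message
    .range(of: #"\+[\d\-]+"#, options: .regularExpression)
    .map { String(message[$0]) }

print(firstNumber ?? "no match")
// +1-202-555-0118
```

Because the result is already a Range<String.Index>, no NSRange conversion is needed here.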

Big thanks to Lars Blumberg for his answer on capturing groups and full matches with Swift 4, which helped me out a lot. I also made an addition to it for people who do want an error.localizedDescription response when their regex is invalid:

extension String {
    func matchingStrings(regex: String) -> [[String]] {
        do {
            let regex = try NSRegularExpression(pattern: regex)
            let nsString = self as NSString
            let results  = regex.matches(in: self, options: [], range: NSMakeRange(0, nsString.length))
            return results.map { result in
                (0..<result.numberOfRanges).map {
                    result.range(at: $0).location != NSNotFound
                        ? nsString.substring(with: result.range(at: $0))
                        : ""
                }
            }
        } catch let error {
            print("invalid regex: \(error.localizedDescription)")
            return []
        }
    }
}

For me, having the localizedDescription as the error message helped me understand what went wrong with escaping, since it displays the final regex that Swift tries to use.

An update of @Mike Chirico's answer to Swift 5:

extension String{



  func regex(pattern: String) -> [String]?{
    do {
        let regex = try NSRegularExpression(pattern: pattern, options: NSRegularExpression.Options(rawValue: 0))
        let all = NSRange(location: 0, length: utf16.count) // NSRange is measured in UTF-16 units
        var matches = [String]()
        regex.enumerateMatches(in: self, options: NSRegularExpression.MatchingOptions(rawValue: 0), range: all) {
            (result : NSTextCheckingResult?, _, _) in
              if let r = result {
                    let nsstr = self as NSString
                    let result = nsstr.substring(with: r.range) as String
                    matches.append(result)
              }
        }
        return matches
    } catch {
        return nil
    }
  }
}

Update for iOS 16: Regex, RegexBuilder 👷‍♀️

Xcode previously supported regex only in the Find and Search tab. Many found the Swift API of Apple's NSRegularExpression verbose and unwieldy, so Apple released Regex literal support and RegexBuilder with Swift 5.7 in 2022.

Going forward, the API has been simplified to tidy up complex String range-based parsing logic in iOS 16 / macOS 13 and to improve performance.

Regex literals in Swift 5.7

func parseLine(_ line: Substring) throws -> MailmapEntry {

    let regex = /\h*([^<#]+?)??\h*<([^>#]+)>\h*(?:#|\Z)/

    guard let match = line.prefixMatch(of: regex) else {
        throw MailmapError.badLine
    }

    return MailmapEntry(name: match.1, email: match.2)
}

The example above uses prefixMatch; together with wholeMatch it finds a single, anchored match. The released Swift 5.7 standard library also provides firstMatch(of:) and matches(of:) for finding one or all matches anywhere in the string.
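For the original task of collecting every match, a minimal sketch using the standard library's matches(of:) could look like the following. It assumes Swift 5.7 or later (iOS 16 / macOS 13 on Apple platforms); the function name mirrors the earlier answers and is otherwise arbitrary.

```swift
// Returns all substrings of `text` that match `pattern`.
// Throws if the pattern cannot be parsed.
func matches(for pattern: String, in text: String) throws -> [String] {
    let regex = try Regex(pattern)
    // Each match carries a Range<String.Index>, so no NSRange conversion is needed.
    return text.matches(of: regex).map { String(text[$0.range]) }
}

let matched = try matches(for: "[0-9]", in: "🇩🇪€4€9")
print(matched)
// ["4", "9"]
```

Building the Regex from a string keeps the pattern a runtime value, as in the NSRegularExpression versions; a regex literal would instead be checked at compile time.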

RegexBuilder in Swift 5.7

RegexBuilder is a new API released by Apple aimed at making regex code easier to write in Swift. We can translate the Regex literal /\h*([^<#]+?)??\h*<([^>#]+)>\h*(?:#|\Z)/ from above into a more declarative form using RegexBuilder if we want more readability.

Note that we can use raw strings in a RegexBuilder and also interleave Regex literals in the builder if we want to balance readability with conciseness.

import RegexBuilder

let regex = Regex {
    ZeroOrMore(.horizontalWhitespace)
    Optionally {
        Capture(OneOrMore(.noneOf("<#")))
    }
        .repetitionBehavior(.reluctant)
    ZeroOrMore(.horizontalWhitespace)
    "<"
    Capture(OneOrMore(.noneOf(">#")))
    ">"
    ZeroOrMore(.horizontalWhitespace)
    /#|\Z/
}

The Regex literal /#|\Z/ is equivalent to:

ChoiceOf {
   "#"
   Anchor.endOfSubjectBeforeNewline
}

Composable RegexComponent

RegexBuilder syntax is also similar to SwiftUI in terms of composability, because we can reuse RegexComponents within other RegexComponents:

struct MailmapLine: RegexComponent {
    @RegexComponentBuilder
    var regex: Regex<(Substring, Substring?, Substring)> {
        ZeroOrMore(.horizontalWhitespace)
        Optionally {
            Capture(OneOrMore(.noneOf("<#")))
        }
            .repetitionBehavior(.reluctant)
        ZeroOrMore(.horizontalWhitespace)
        "<"
        Capture(OneOrMore(.noneOf(">#")))
        ">"
        ZeroOrMore(.horizontalWhitespace)
        ChoiceOf {
           "#"
            Anchor.endOfSubjectBeforeNewline
        }
    }
}

You can use matching(regex:) on the string like this:

let array = try "Your String To Search".matching(regex: ".")

using this simple extension:

public extension String {
    func matching(regex: String) throws -> [String] {
        let regex = try NSRegularExpression(pattern: regex)
        let results = regex.matches(in: self, range: NSRange(startIndex..., in: self))
        return results.map { String(self[Range($0.range, in: self)!]) }
    }
}
