
iOS - regex to match word boundary, including underscore

I have a regex that I'm trying to run to match a variety of search terms. For example:

The search "old" should match:
-> age_old
-> old_age
but not:
-> bold (as "old" is not at the start of the word)

To do this, I was using a word boundary (\b). However, \b treats the underscore as a word character, so it finds no boundary between "_" and "old". There are workarounds available in other languages. Unfortunately, with NSRegularExpression, this doesn't look possible. Is there any other way to get a word boundary to work? Or are there other options?
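To make the problem concrete, here is a minimal sketch that counts matches of the plain word-boundary pattern against the sample string. The helper name `matchCount` and the variable names are made up for this illustration; it assumes the default NSRegularExpression options (no Unicode word-boundary option enabled).

```swift
import Foundation

// Hypothetical helper: counts how many times `pattern` matches in `text`.
// Returns 0 if the pattern fails to compile.
func matchCount(_ pattern: String, in text: String) -> Int {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else {
        return 0
    }
    let range = NSRange(text.startIndex..., in: text)
    return regex.numberOfMatches(in: text, options: [], range: range)
}

let sample = "age_old -> old_age but not -> bold"

// "_" counts as a word character for \b, so there is no boundary
// between "_" and "old" on either side.
let plainBoundaryCount = matchCount("\\bold\\b", in: sample)
print(plainBoundaryCount) // 0
```

Neither "age_old" nor "old_age" is found, which is the behavior the question describes.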

Swift and Objective-C support the ICU regex flavor (through NSRegularExpression). This flavor supports look-behinds of fixed or bounded width. From the ICU documentation:

(?= ... ) Look-ahead assertion. True if the parenthesized pattern matches at the current input position, but does not advance the input position.

(?! ... ) Negative look-ahead assertion. True if the parenthesized pattern does not match at the current input position. Does not advance the input position.

(?<= ... ) Look-behind assertion. True if the parenthesized pattern matches text preceding the current input position, with the last character of the match being the input character just before the current position. Does not alter the input position. The length of possible strings matched by the look-behind pattern must not be unbounded (no * or + operators).

(?<! ... ) Negative look-behind assertion. True if the parenthesized pattern does not match text preceding the current input position. Does not alter the input position. The same length restriction as for the look-behind assertion applies.
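The assertions above can be tried in isolation. The following is a small sketch, not part of the original answer: `matchLocations` is a hypothetical helper that returns the UTF-16 offset of each match, so it is easy to see which occurrence of "old" each assertion accepts.

```swift
import Foundation

// Hypothetical helper: returns the start offset (in UTF-16 units) of every match.
// Returns an empty array if the pattern fails to compile.
func matchLocations(_ pattern: String, in text: String) -> [Int] {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else {
        return []
    }
    let range = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, options: [], range: range).map { $0.range.location }
}

// Offsets of "old": 4 (age_old), 8 (old_age), 17 (bold)
let text = "age_old old_age bold"

// Look-behind: "old" only when directly preceded by "_".
let afterUnderscore = matchLocations("(?<=_)old", in: text)

// Look-ahead: "old" only when directly followed by "_".
let beforeUnderscore = matchLocations("old(?=_)", in: text)

// Negative look-behind: "old" not directly preceded by a letter.
let notAfterLetter = matchLocations("(?<!\\p{L})old", in: text)

print(afterUnderscore, beforeUnderscore, notAfterLetter) // [4] [8] [4, 8]
```

Because the assertions are zero-width, the matched text is always just "old"; only the accepted positions change.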

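So, you can use negative look-arounds that reject only a letter or a digit on either side of the term. The underscore is not in [\p{L}\d], so it no longer counts as part of the word:

 let regex = "(?<![\\p{L}\\d])old(?![\\p{L}\\d])"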


Here is a Swift code snippet extracting all "old"s:

import Foundation

// Returns every substring of `text` matched by the pattern `regex`.
// An invalid pattern is reported and yields an empty array.
func matchesForRegexInText(_ regex: String, text: String) -> [String] {
    do {
        let regex = try NSRegularExpression(pattern: regex, options: [])
        // NSRegularExpression works on UTF-16 offsets, so go through NSString.
        let nsString = text as NSString
        let results = regex.matches(in: text,
            options: [], range: NSRange(location: 0, length: nsString.length))
        return results.map { nsString.substring(with: $0.range) }
    } catch {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}

let s = "age_old -> old_age but not -> bold"
let rx = "(?<![\\p{L}\\d])old(?![\\p{L}\\d])"
let matches = matchesForRegexInText(rx, text: s)
print(matches) // => ["old", "old"]
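Since the question mentions a variety of search terms, the same look-arounds can be wrapped around any term. This is only a sketch under assumptions: `boundaryPattern(for:)` and `countMatches(of:in:)` are hypothetical names, and escaping the term with `NSRegularExpression.escapedPattern(for:)` is an addition so that terms containing regex metacharacters are matched literally.

```swift
import Foundation

// Hypothetical helper: wraps an arbitrary search term in the letter/digit
// look-arounds, escaping any regex metacharacters inside the term.
func boundaryPattern(for term: String) -> String {
    let escaped = NSRegularExpression.escapedPattern(for: term)
    return "(?<![\\p{L}\\d])" + escaped + "(?![\\p{L}\\d])"
}

// Hypothetical helper: counts occurrences of `term` in `text` using the pattern above.
func countMatches(of term: String, in text: String) -> Int {
    guard let regex = try? NSRegularExpression(pattern: boundaryPattern(for: term),
                                               options: []) else {
        return 0
    }
    let range = NSRange(text.startIndex..., in: text)
    return regex.numberOfMatches(in: text, options: [], range: range)
}

let oldCount = countMatches(of: "old", in: "age_old -> old_age but not -> bold")
let cppCount = countMatches(of: "c++", in: "my_c++_notes")
print(oldCount, cppCount) // 2 1
```

Note that the trailing look-ahead also rejects words such as "older". If matching at the start of a word is all that is needed, dropping the `(?![\\p{L}\\d])` part would allow those as well.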


 