
Updating an NSRegularExpression to be a specific pattern

I have an NSString pattern like so:

NSString *pattern = @"@[A-Za-z0-9]+";
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:nil];

This pattern matches every substring that starts with @ and has at least one alphanumeric character after it.

How do I adapt this so that, after the @, a match may contain alphanumeric characters, _ or -, but must start and end with an alphanumeric character?

Some examples are:

@a
@0
@a-z
@hello
@ab_z9

Some edge cases are:

If the input is @Liam_O'Flaherty, I want it to match @Liam_O.
Or, if the input is @a-, I want it to match @a.

Try this regex:

@"@[a-zA-Z0-9](?:(?:[A-Za-z0-9-_]*[a-zA-Z0-9])|)"

The first bracket expression matches a single alphanumeric character, the second matches alphanumerics plus - and _, and the last matches the alphanumeric character that ends the word. The * allows zero or more characters from the second bracket expression, the (?:) parentheses group without creating capture groups or backreferences, and the | means OR. So we match one alphanumeric character, followed either by zero or more alphanumerics, - and _ ending in another alphanumeric, or by nothing (the alternative after the | is empty). The empty alternative simply makes the group optional, so (?:...)? would work equally well.

P.S. It isn't clear from your question whether you need the opening @. If not, remove it from the pattern.
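
To check this pattern against the examples from the question, here is a minimal sketch. AllMatches is a hypothetical helper name, not part of Foundation; it relies only on NSRegularExpression's matchesInString:options:range:.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original answer): returns every substring
// of `text` that matches `pattern`, in order of appearance.
static NSArray<NSString *> *AllMatches(NSString *pattern, NSString *text) {
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        // Report a malformed pattern instead of silently ignoring it.
        NSLog(@"Invalid pattern: %@", error);
        return @[];
    }

    NSMutableArray<NSString *> *results = [NSMutableArray array];
    NSRange whole = NSMakeRange(0, text.length);
    for (NSTextCheckingResult *match in [regex matchesInString:text
                                                       options:0
                                                         range:whole]) {
        [results addObject:[text substringWithRange:match.range]];
    }
    return results;
}
```

Calling AllMatches(@"@[a-zA-Z0-9](?:(?:[A-Za-z0-9_-]*[a-zA-Z0-9])|)", @"@a- @ab_z9 @Liam_O'Flaherty") should return @a, @ab_z9 and @Liam_O. The hyphen is written last in the character class here ([A-Za-z0-9_-]), an equivalent spelling that avoids any ambiguity with range syntax.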

I would consider something like the following:

@(?=[A-Za-z0-9])[A-Za-z0-9-_]+(?<=[A-Za-z0-9])

The constituent parts of this are:

  • The @ followed by [A-Za-z0-9-_]+ is the heart of the search, matching @ and then one or more alphanumeric characters, hyphens, or underscores.

  • The look-ahead assertion at the start, (?=[A-Za-z0-9]) , means "but it must start with alphanumeric."

  • The look-behind assertion at the end, (?<=[A-Za-z0-9]) , means "and it must end with alphanumeric."
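
As a rough illustration of how the look-around pattern above behaves, here is a small sketch. FirstMention is a hypothetical helper name introduced for this example, and the hyphen is written last in the character class, which is an equivalent spelling.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the first mention found in `text`,
// or nil when the text contains no match.
static NSString *FirstMention(NSString *text) {
    // Look-ahead: the first character after @ must be alphanumeric.
    // Look-behind: the last matched character must be alphanumeric.
    NSString *pattern = @"@(?=[A-Za-z0-9])[A-Za-z0-9_-]+(?<=[A-Za-z0-9])";

    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return nil;
    }

    NSRange found = [regex rangeOfFirstMatchInString:text
                                             options:0
                                               range:NSMakeRange(0, text.length)];
    if (found.location == NSNotFound) {
        return nil;
    }
    return [text substringWithRange:found];
}
```

With this helper, FirstMention(@"@a-") yields @a, because the greedy + first takes "a-" and then backtracks until the look-behind is satisfied. FirstMention(@"@-a") yields nil, because the look-ahead rejects the leading hyphen.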

This raises a few edge-case questions, namely:

  • What do you want to do with accents? If you want to handle accented characters, such as @naïve or @resumé , you might use \p{L} rather than A-Za-z . (If you put this in a string literal in your code, you need to escape the backslash, so it would be written as \\p{L} .)

  • What do you want to do if there is a non-alphanumeric character in the string, for example @this.is.wrong or @Liam_O'Flaherty ? Or what do you want to do if it does not end in an alphanumeric character, e.g. @a- ? The above regex (as well as the regex presented in other answers) would match up to the invalid character (e.g. @this , @Liam_O , and @a , respectively). That doesn't strike me as the right way to handle this scenario. Personally, I would be inclined to further qualify the regex to exclude these cases, but without a broader description of your business problem, it's hard to say what's right in this case.

    Having said that, I would wager you might not be concerned with this exception, so this flaw in the regex might not matter to you. But if you are, let us know what the edge cases are and how you'd like to handle them, and we can be more specific in our answers.
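
If you did want both refinements, one possible sketch follows. It assumes a rule the question does not state: a mention counts only when it is followed by whitespace or the end of the string. StrictMentions is a hypothetical helper name, and the (?!\S) look-ahead is just one way to express that assumed rule.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns only the mentions that are followed by
// whitespace or the end of the string (an assumed rule, see above).
static NSArray<NSString *> *StrictMentions(NSString *text) {
    // \p{L} accepts any Unicode letter; the backslash is doubled because
    // this is an Objective-C string literal. (?!\S) rejects a mention that
    // is immediately followed by a non-whitespace character.
    NSString *pattern =
        @"@(?=[\\p{L}0-9])[\\p{L}0-9_-]+(?<=[\\p{L}0-9])(?!\\S)";

    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:0
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return @[];
    }

    NSMutableArray<NSString *> *results = [NSMutableArray array];
    [regex enumerateMatchesInString:text
                            options:0
                              range:NSMakeRange(0, text.length)
                         usingBlock:^(NSTextCheckingResult *result,
                                      NSMatchingFlags flags,
                                      BOOL *stop) {
        if (result != nil) {
            [results addObject:[text substringWithRange:result.range]];
        }
    }];
    return results;
}
```

Under that assumption, @a- , @this.is.wrong and @Liam_O'Flaherty are skipped entirely rather than truncated, while @ab_z9 and @naïve (with a precomposed ï) are kept. Whether skipping or truncating is correct depends on the requirements, and the question as asked wants truncation.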

