
JavaScript regular expression to find a string inside a word containing multiple special (non-space) characters

The requirement is to determine whether a search string is present in a given string, subject to the conditions below.

Condition 1: The search string should be found at the beginning of a word, i.e. with no special characters preceding it.

  • abc should match any word that begins with abc, such as abcdef, anywhere in the sentence.

  • abc should NOT match in xabcdef, as that word does not start with 'abc'.

Condition 2: If the search string is preceded by a special character, there must also be some text before the special characters.

  • abc should match in test_abcdef, as 'abc' is preceded by 'test_'.

  • abc should NOT match in _abcdef, as the word starts with '_' without any text before the _.

The regular expression below does not find abc when the string has multiple special characters, e.g. in test@_abcdef or test__abcdef.

I am not sure how to add a quantifier in (?<=[A-Za-z0-9][^A-Za-z0-9])abc, where [^A-Za-z0-9] checks for a SINGLE non-alphanumeric character.

What is the syntax for allowing 0 or more special characters inside the lookbehind (?<=...)?

Regular expression tried in an online regex tester:

/^(?<![^A-Za-z0-9])abc|(?<=[A-Za-z0-9][^A-Za-z0-9])abc|(?<=\ )abc/g

Sample Text :

abcdef abcdef _abcdef xabcdef test_abcdef test__abcdef abc
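To make the problem concrete, here is a minimal sketch that runs the pattern above against the sample text and lists the words in which abc was found. The helper name wordsWithMatch is made up for illustration, and the snippet assumes an engine with lookbehind support (for example a recent Node.js).

```javascript
// The pattern from the question, copied unchanged.
const attempted = /^(?<![^A-Za-z0-9])abc|(?<=[A-Za-z0-9][^A-Za-z0-9])abc|(?<=\ )abc/g;

// The sample text from the question.
const sample = "abcdef abcdef _abcdef xabcdef test_abcdef test__abcdef abc";

// Hypothetical helper: for every match, return the space-delimited word around it.
function wordsWithMatch(pattern, text) {
  const words = [];
  for (const m of text.matchAll(pattern)) {
    const start = text.lastIndexOf(" ", m.index) + 1; // 0 when no space precedes
    const nextSpace = text.indexOf(" ", m.index);
    const end = nextSpace === -1 ? text.length : nextSpace;
    words.push(text.slice(start, end));
  }
  return words;
}

console.log(wordsWithMatch(attempted, sample));
// → [ 'abcdef', 'abcdef', 'test_abcdef', 'abc' ]
```

test__abcdef is missing from the result because the second lookbehind only accepts exactly one non-alphanumeric character before abc.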

You can apply all the assertions without alternation here:

/(?<![a-z0-9])(?<!^[^a-z0-9])(?<!\s[^a-z0-9])abc/igm

RegEx Demo

This regex has 3 assertions before matching abc :

  1. (?<![a-z0-9]) : Fail the match when the previous character is alphanumeric
  2. (?<!\s[^a-z0-9]) : Fail the match when the previous character is non-alphanumeric and is itself preceded by whitespace, i.e. there is no text before it
  3. (?<!^[^a-z0-9]) : Fail the match when the previous character is a non-alphanumeric character at line start
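As a quick check of the three assertions, the sketch below runs this regex against the sample text from the question. The helper wordsWithMatch is a hypothetical name used only for this illustration, and the snippet assumes an engine that supports lookbehind.

```javascript
// The regex from this answer, copied unchanged.
const pattern = /(?<![a-z0-9])(?<!^[^a-z0-9])(?<!\s[^a-z0-9])abc/igm;

// The sample text from the question.
const sample = "abcdef abcdef _abcdef xabcdef test_abcdef test__abcdef abc";

// Hypothetical helper: for every match, return the space-delimited word around it.
function wordsWithMatch(regex, text) {
  const words = [];
  for (const m of text.matchAll(regex)) {
    const start = text.lastIndexOf(" ", m.index) + 1; // 0 when no space precedes
    const nextSpace = text.indexOf(" ", m.index);
    const end = nextSpace === -1 ? text.length : nextSpace;
    words.push(text.slice(start, end));
  }
  return words;
}

console.log(wordsWithMatch(pattern, sample));
// → [ 'abcdef', 'abcdef', 'test_abcdef', 'test__abcdef', 'abc' ]
```

_abcdef is rejected by the \s assertion and xabcdef by the alphanumeric assertion, while test__abcdef now matches.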

Also note that lookbehind support in JavaScript is still limited to newer browsers.
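On the quantifier part of the question: in engines that implement ES2018 lookbehind, a quantifier can be placed inside the lookbehind itself. The sketch below is not taken from any of the answers; it is one possible variant, with made-up names buildPattern and wordsWithMatch, that builds the pattern through the RegExp constructor so that an engine without lookbehind support fails gracefully instead of throwing a syntax error at parse time.

```javascript
// Hypothetical helper: returns the regex, or null if the engine lacks lookbehind.
function buildPattern() {
  try {
    // Either abc starts a word (no non-space character before it), or it is
    // preceded by an alphanumeric character plus one or more special characters.
    return new RegExp("(?<!\\S)abc|(?<=[A-Za-z0-9][^A-Za-z0-9\\s]+)abc", "g");
  } catch (e) {
    return null;
  }
}

// Hypothetical helper: for every match, return the space-delimited word around it.
function wordsWithMatch(regex, text) {
  const words = [];
  for (const m of text.matchAll(regex)) {
    const start = text.lastIndexOf(" ", m.index) + 1;
    const nextSpace = text.indexOf(" ", m.index);
    const end = nextSpace === -1 ? text.length : nextSpace;
    words.push(text.slice(start, end));
  }
  return words;
}

const quantified = buildPattern();
const text = "abcdef _abcdef __abcdef xabcdef test_abcdef test@_abcdef test__abcdef";
if (quantified !== null) {
  console.log(wordsWithMatch(quantified, text));
  // → [ 'abcdef', 'test_abcdef', 'test@_abcdef', 'test__abcdef' ]
}
```

Whitespace is excluded from the special-character class so that a run such as " __abc" cannot borrow text from the previous word.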

In engines that don't allow variable-length lookbehind assertions, I don't think you can match just 'abc' and at the same time discard things like " _abc", " __abc", " ___abc", " ____abc", etc.

I would suggest doing it in 2 steps:

First, match all the required cases with a regexp that is not limited to matching just 'abc':

(?:(?!abc[^a-zA-Z0-9\s]+)[a-zA-Z0-9]+[^a-zA-Z0-9\s]+|^|\s)(abc)

https://regex101.com/r/bAo05D/3

Then, just recalculate abc index with: abc_index = whole_regexp_index + length(regexp_matched_string) - length(abc)
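The two steps can be sketched in JavaScript as follows. The function name findAbcIndexes is made up for this illustration; the regex is the one given above, with abc in capture group 1, and it uses no lookbehind, so it should also run on older engines.

```javascript
// Step 1: the wider regex from this answer; group 1 captures "abc".
const wide = /(?:(?!abc[^a-zA-Z0-9\s]+)[a-zA-Z0-9]+[^a-zA-Z0-9\s]+|^|\s)(abc)/g;

// Hypothetical helper: return the index of every accepted "abc" in text.
function findAbcIndexes(text) {
  const indexes = [];
  wide.lastIndex = 0; // reset state left over from any earlier scan
  let m;
  while ((m = wide.exec(text)) !== null) {
    // Step 2: abc_index = whole match index + whole match length - length of abc
    indexes.push(m.index + m[0].length - m[1].length);
  }
  return indexes;
}

console.log(findAbcIndexes("abcdef abcdef _abcdef xabcdef test_abcdef test__abcdef abc"));
// → [ 0, 7, 35, 48, 55 ]
```

Indexes 35 and 48 point at the abc inside test_abcdef and test__abcdef; _abcdef and xabcdef produce no entry.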
