
How to match '+abc' but not '++abc' without lookbehind?

In a sentence similar to:

Lorem ipsum +dolor ++sit amet.

I'd like to match the +dolor but not the ++sit. I can do it with a lookbehind, but since JavaScript does not support it, I'm struggling to build a pattern for it.

So far I've tried it with:

(?:\+(.+?))(?=[\s\.!\!]) - but it matches both words
(?:\+{1}(.+?))(?=[\s\.!\!]) - the same here - both words are matched

and to my surprise a pattern like:

(?=\s)(?:\+(.+?))(?=[\s\.!\!])

doesn't match anything. I thought I could trick it by putting \s, or later also ^, before the + sign, but it doesn't seem to work like that.


EDIT - background information:

It's not strictly part of the question, but sometimes it helps to know what this is all for, so here is a short explanation to address some of your questions and comments:

  • any word, in any order, can be marked by either a + or a ++
  • each word and its marking will be replaced by a <span> later
  • cases like lorem+ipsum are considered invalid, because that would be like splitting a word (ro+om) or writing two words together as one word (myroom), so it has to be corrected anyway (the pattern may match this, but it's not an error); it should, however, at least match the normal cases like in the example above
  • I use a lookahead like (?=[\s\.!\!]) so that I can match words in any language and not only \w characters

One way would be to match one additional character and ignore that (by putting the relevant part of the match into a capturing group):

(?:^|[^+])(\+[^\s+.!]+)

However, this breaks down if potential matches could be directly adjacent to each other.

Test it live on regex101.com.

Explanation:

(?:         # Match (but don't capture)
 ^          # the position at the start of the string
|           # or
 [^+]       # any character except +.
)           # End of group
(           # Match (and capture in group 1)
 \+         # a + character
 [^\s+.!]+  # one or more characters except [+.!] or whitespace.
)           # End of group
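
As a minimal sketch of how the pattern above might be used in JavaScript, the loop below collects capturing group 1 from every match. The function name singlePlusWords is made up for this illustration and is not part of the original answer. The second call shows the adjacency limitation mentioned above.

```javascript
// Hypothetical helper: returns every word marked with exactly one '+'.
function singlePlusWords(text) {
    // The leading (?:^|[^+]) consumes one extra character so that the '+'
    // captured in group 1 cannot be the second '+' of a '++' marker.
    var re = /(?:^|[^+])(\+[^\s+.!]+)/g;
    var found = [];
    var m;
    while ((m = re.exec(text)) !== null) {
        found.push(m[1]);
    }
    return found;
}

console.log(singlePlusWords('Lorem ipsum +dolor ++sit amet.')); // one entry: '+dolor'
console.log(singlePlusWords('+a+b'));                           // one entry: '+a'
```

In the second call '+b' is missed: the 'a' that would have to serve as its extra leading character was already consumed by the first match.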

\+\+|(\+\S+)

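Grab the content from capturing group 1: the first alternative consumes every ++ without capturing anything, so only a word marked with a single + ends up in the group. The regex uses the trick described in this answer.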

Demo on regex101

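var re = /\+\+|(\+\S+)/g;
var str = 'Lorem ipsum +dolor ++sit ame';
var m;
var o = []; // collected single-plus words

while ((m = re.exec(str)) !== null) {
    // Guard against an infinite loop on zero-length matches.
    if (m.index === re.lastIndex) {
        re.lastIndex++;
    }

    // Group 1 is only set when the second alternative (single +) matched.
    if (m[1] != null) {
        o.push(m[1]);
    }
}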

If your input can contain something like +++donor, use:

\+\++|(\+\S+)
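
To make the difference between the two patterns concrete, here is a small sketch that runs both against the same string. The helper name collectGroup1 and the sample string are assumptions made for this illustration only.

```javascript
// Hypothetical helper: collects group 1 from every match of a global regex.
function collectGroup1(re, text) {
    var out = [];
    var m;
    while ((m = re.exec(text)) !== null) {
        // Group 1 is undefined when the non-capturing alternative matched.
        if (m[1] !== undefined) {
            out.push(m[1]);
        }
    }
    return out;
}

var sample = 'a +b ++c +++d';
console.log(collectGroup1(/\+\+|(\+\S+)/g, sample));  // '+b' and '+d'
console.log(collectGroup1(/\+\++|(\+\S+)/g, sample)); // only '+b'
```

With the first pattern, the ++ alternative eats only two of the three plus signs in +++d, so the remaining +d is captured. The \+\++ variant swallows the whole run of plus signs.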

The following regex seems to be working for me:

var re = / (\+[a-zA-Z0-9]+)/;  // Note the space at the start of the pattern, right after the opening '/'

Demo

https://www.regex101.com/r/uQ3wE7/1

I think this is what you need.

(?:^|\s)(\+[^+\s.!]*)(?=[\s.!])
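
Since the question mentions replacing each marked word with a <span> later, here is one possible sketch of that step. It is an adaptation, not the pattern exactly as posted: the leading (^|\s) is captured so the whitespace can be put back, the word is captured without its + sign, and * is changed to + so a lone + is not matched. The function name markSingle and the class name "single" are hypothetical.

```javascript
// Hypothetical helper: wraps every single-plus word in a <span>.
function markSingle(text) {
    // lead = start of string or the whitespace before the '+'
    // word = the marked word without its '+' sign
    return text.replace(/(^|\s)\+([^+\s.!]+)(?=[\s.!])/g, function (match, lead, word) {
        return lead + '<span class="single">' + word + '</span>';
    });
}

console.log(markSingle('Lorem ipsum +dolor ++sit amet.'));
// Lorem ipsum <span class="single">dolor</span> ++sit amet.
```

Note that the lookahead requires whitespace, a dot, or an exclamation mark after the word, so a marked word at the very end of the string (with nothing after it) is left untouched.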

Try the following regex:

(^|\s)\+\w+
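
A quick sketch of how this pattern behaves, with two caveats worth checking against the question's requirements: the match includes the leading whitespace (trimmed below), and \w in JavaScript only covers ASCII letters, digits, and underscore. The sample text is made up, and the Unicode-aware variant is an assumption that needs an engine with ES2018 property escapes.

```javascript
var text = '+lorem +caf\u00e9 ++sit'; // '\u00e9' is the letter e with an acute accent

function trimAll(list) {
    return list.map(function (s) { return s.trim(); });
}

// Pattern as posted, with the g flag added to find every match.
var ascii = trimAll(text.match(/(^|\s)\+\w+/g));
console.log(ascii);   // '+lorem' and '+caf' (the accented letter is cut off)

// Assumed alternative: Unicode letters and digits via property escapes.
var unicode = trimAll(text.match(/(^|\s)\+[\p{L}\p{N}_]+/gu));
console.log(unicode); // '+lorem' and the full '+caf\u00e9'
```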

