regex lookbehind alternative for parser (js)

Question

Good morning

(I saw this topic has a LOT of answers but I couldn't find one that fits)

I am writing a small parser in JavaScript that splits text into sections like this:

var tex = "hello   this :word is apart"

var parsed = [
  "hello",
  "   ",
  "this",
  " ",
  // ":word" should not appear here, and neither should "word"
  " ",
  "is",
  " ",
  "apart"
]

A regex that does this is:

/((?!:[a-z]+)([ ]+|(?<= |^)[a-z]*(?= |$)))/g

But it uses a positive lookbehind, which, as far as I have read, was only added to JavaScript in 2018 (ES2018), so I expect browser compatibility problems, and I would like to keep at least some compatibility.

I considered:

using a non-capturing group (?:...), but it consumes the preceding space
just removing the space checks, but then ":word" comes through as "word"
parsing the text twice, once for words and once for spaces, but I fear that putting the results back in the right order would be a pain

To be clear, I NEED the words AND ALL the spaces, and I need to exclude some words. I am open to other methods, including ones that don't use regex.

My last option:

removing the space checks and arranging the alternatives of my regex in the right order, hoping that ":word" is caught by the "special words" group before anything else.

My question:

Would that work in JavaScript, and would it be reliable?

I tried

/((:[a-z]+)|([ ]+)|([a-z]*))/g

on https://regexr.com/ and it seems to work. Will it work in every case?

Answer 1

You said you're open to non-regex solutions, but I can give you one that combines both. Since you can't rely on lookbehind being supported, just capture everything and then filter out what you don't want: the words preceded by a colon.

 const text = 'hello this :word is apart';
 const regex = /(\w+)|(:\w+)|(\s+)/g;
 const parsed = text.match(regex).filter(word => !word.includes(':'));
 console.log(parsed);
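As a hedged sketch of the same capture-and-filter idea, applied to the question's original input (with the run of three spaces): the version below puts the ":word" alternative first and uses an exec loop, so the first capture group decides what to drop instead of a string search for ':'. The helper name tokenize and the use of [a-z]+ rather than [a-z]* (to avoid zero-length matches) are assumptions made for this example, not part of the answer above.

```javascript
// Hypothetical helper: split text into words and space runs, dropping ":word" tokens.
function tokenize(text) {
  // Alternatives are tried left to right at each position, so ":word" is
  // claimed by group 1 before the plain-word alternative can see "word".
  var regex = /(:[a-z]+)|([ ]+)|([a-z]+)/g;
  var tokens = [];
  var match;
  while ((match = regex.exec(text)) !== null) {
    if (match[1] === undefined) { // group 1 unset: not an excluded ":word"
      tokens.push(match[0]);
    }
  }
  return tokens;
}

console.log(tokenize("hello   this :word is apart"));
// ["hello", "   ", "this", " ", " ", "is", " ", "apart"]
```

This avoids lookbehind and arrow functions entirely, so it should also run in older engines. Characters that match none of the three alternatives (uppercase letters, digits, punctuation) are silently skipped in this sketch.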

Answer 2

I would use two regexes. The first one matches the words you DON'T want, so that you can replace them with an empty string. This is the simple regex:

/:\w+/g

Replace the matches with an empty string. You are left with a string that can be parsed with this regex:

/([ ]+)|([a-z]*)/g

which is a simplified version of your second regex, since the forbidden words are already gone.
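A minimal sketch of the two passes described above, under a couple of assumptions: the helper name parseTwoPass is hypothetical, and [a-z]+ is used in place of [a-z]* so that the result contains no empty strings.

```javascript
// Hypothetical helper illustrating the two-pass idea from this answer.
function parseTwoPass(text) {
  // Pass 1: delete the excluded ":word" tokens.
  var cleaned = text.replace(/:\w+/g, "");
  // Pass 2: collect space runs and lowercase words. "+" is used instead of "*"
  // (an assumption) so that no empty strings end up in the result.
  return cleaned.match(/[ ]+|[a-z]+/g) || [];
}

console.log(parseTwoPass("hello   this :word is apart"));
// ["hello", "   ", "this", "  ", "is", " ", "apart"]
```

Note that once ":word" is removed, the spaces on either side of it merge into a single two-space token, which differs from the two separate " " entries in the question's expected output. If that distinction matters, a single-pass capture-and-filter approach may be a better fit.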

2 answers

solution1
1 ACCEPTED 2018-11-17 03:44:00

solution2
1 2018-11-17 03:58:39
