Regex lookbehind alternative for a parser (JS)

Question

Good morning

(I saw this topic has a LOT of answers but I couldn't find one that fits)

I am writing a little parser in JavaScript that would cut the text into sections like this:

var tex = "hello   this :word is apart"

var parsed = [
  "hello",
  "   ",
  "this",
  " ",
  // ":word" should not be there, neither should "word"
  " ",
  "is",
  " ",
  "apart"
]

The perfect regex for this is:

/((?!:[a-z]+)([ ]+|(?<= |^)[a-z]*(?= |$)))/g

But it has a positive lookbehind which, as I read, was only implemented in JavaScript in 2018, so I guess there will be many browser compatibility conflicts, and I would like it to have at least a little compatibility.

I considered:

trying non-capturing groups (?:), but that consumes the space before the word
just removing the spaces check, but then ":word" comes in as "word"
parsing the text twice, once for words and once for spaces, but I fear putting them back in the right order would be a pain

Understand, I NEED words AND ALL spaces, and to exclude some words. I am open to other methods, like not using regex.

My last option:

Removing the spaces check and organising my whole regex in the right order, praying that ":word" would be caught by the "special words" group before anything else.

My question:

Would that work in JavaScript, and be reliable?

I tried

/((:[a-z]+)|([ ]+)|([a-z]*))/g

on https://regexr.com/ and it seems to work. Will it work in every case?
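For reference, the ordered-alternation idea can be sketched as below. This is only an illustration under a few assumptions: the helper name `tokenize` is hypothetical, `[a-z]+` is used instead of `[a-z]*` so the pattern never produces an empty match, and anything other than lowercase letters, spaces, and colon-prefixed words is silently skipped. JavaScript tries the alternatives left to right at each position, so ":word" is consumed whole by the first group and "word" is never seen on its own.

```javascript
// Hypothetical helper, not part of the original post.
// Group 1 = special ":word", group 2 = run of spaces, group 3 = plain word.
function tokenize(text) {
  var re = /(:[a-z]+)|([ ]+)|([a-z]+)/g;
  var tokens = [];
  var m;
  // exec() with the g flag resumes from re.lastIndex on every call.
  while ((m = re.exec(text)) !== null) {
    if (m[1] !== undefined) {
      continue; // special word: consumed by the regex, but not kept
    }
    tokens.push(m[0]);
  }
  return tokens;
}

var parsed = tokenize("hello   this :word is apart");
console.log(parsed);
// ["hello", "   ", "this", " ", " ", "is", " ", "apart"]
```

Because no lookbehind is involved, this should run on engines that predate the 2018 addition.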

Answer 1

You said you're open to non-regex solutions, but I can give you one that includes both. Since you can't rely on lookbehind being supported, just capture everything and filter out what you don't want: words preceded by a colon.

const text = 'hello this :word is apart';
// Match plain words, colon-prefixed words, and whitespace runs
const regex = /(\w+)|(:\w+)|(\s+)/g;
// Drop the colon-prefixed words
const parsed = text.match(regex).filter(word => !word.includes(':'));
console.log(parsed);
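Since the question asks for browser compatibility, here is a possible variant of the same capture-then-filter idea that avoids arrow functions and `String.prototype.includes`, both of which arrived with ES2015. It is a sketch, not a tested drop-in: the variable names are illustrative, and it assumes an unwanted token always starts with the colon.

```javascript
var input = "hello   this :word is apart";

// match() returns null when nothing matches, so fall back to an empty array.
var matches = input.match(/(\w+)|(:\w+)|(\s+)/g) || [];

var kept = [];
for (var i = 0; i < matches.length; i++) {
  // Keep every token except the colon-prefixed ones.
  if (matches[i].charAt(0) !== ":") {
    kept.push(matches[i]);
  }
}

console.log(kept);
// ["hello", "   ", "this", " ", " ", "is", " ", "apart"]
```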

Answer 2

I would use 2 regexes. The first one matches the words you DON'T want, so you can replace them with an empty string. This is the simple regex:

/:\w+/g

Replace its matches with an empty string. Now you have a string that can be parsed with this regex:

/([ ]+)|([a-z]*)/g

which is a simplified version of your second regex, since the forbidden words are already gone.
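A minimal sketch of this two-pass approach follows, with two caveats. First, it uses `[a-z]+` rather than `[a-z]*`, because with the `g` flag the starred version also returns an empty string at the end of the input. Second, once ":word" is removed, the spaces on either side of it merge into a single token, which differs from the two separate " " entries in the question's expected output. The variable names are illustrative.

```javascript
var source = "hello   this :word is apart";

// Pass 1: strip the unwanted colon-prefixed words.
var stripped = source.replace(/:\w+/g, "");

// Pass 2: split what remains into space runs and lowercase words.
var result = stripped.match(/([ ]+)|([a-z]+)/g) || [];

console.log(result);
// ["hello", "   ", "this", "  ", "is", " ", "apart"]
```

If the two spaces around a removed word must stay separate, a single pass that matches and then discards the special words is probably the better fit.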

2 answers

Answer 1
1, accepted, 2018-11-17 03:44:00

Answer 2
1, 2018-11-17 03:58:39
