Regex lookbehind alternative for parser (JS)
(I saw that this topic has a LOT of answers, but I couldn't find one that fits.)
I am writing a little parser in JavaScript that would cut the text into sections like this:
var tex = "hello this :word is apart"
var parsed = [
"hello",
" ",
"this",
" ",
// ":word" should not be there, neither "word"
" ",
"is",
" ",
"apart"
]
The perfect regex for this is:
/((?!:[a-z]+)([ ]+|(?<= |^)[a-z]*(?= |$)))/g
But it uses a positive lookbehind, which, as I read, was only added to JavaScript in 2018, so I expect compatibility problems in many browsers, and I would like it to have at least a little compatibility.
To be clear: I NEED the words AND ALL the spaces, with some words excluded. I am open to other methods, such as not using regex.
I considered removing the space check and putting the alternatives of my regex in the right order, hoping that ":word" would be caught by the "special words" group before anything else.
Would that work in JavaScript, and would it be reliable?
I tried
/((:[a-z]+)|([ ]+)|([a-z]*))/g
On https://regexr.com/ it seems to work. Will it work in every case?
You said you're open to non-regex solutions, but I can give you one that combines both. Since you can't rely on lookbehind being supported, just capture everything and then filter out what you don't want: the words preceded by a colon.
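const text = 'hello this :word is apart';
// Match plain words, colon-prefixed words, and runs of whitespace.
const regex = /(\w+)|(:\w+)|(\s+)/g;
// Drop every token that contains a colon.
const parsed = text.match(regex).filter(word => !word.includes(':'));
console.log(parsed);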
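The capture-then-filter idea above can be wrapped in a small reusable function. This is only a sketch: the name `tokenize` is hypothetical, and it assumes the input consists of word characters, whitespace, and colon-prefixed words. Any other character (punctuation, for instance) is silently skipped. It also guards against `match` returning `null` when nothing matches.

```javascript
// Hypothetical helper, not part of the original answer.
function tokenize(text) {
  // Colon-prefixed words, plain words, or runs of whitespace.
  const tokens = text.match(/:\w+|\w+|\s+/g) || []; // match() returns null on no match
  // Keep everything except the colon-prefixed words.
  return tokens.filter(token => !token.startsWith(':'));
}

console.log(tokenize('hello this :word is apart'));
// [ 'hello', ' ', 'this', ' ', ' ', 'is', ' ', 'apart' ]
```

Because the spaces on either side of ":word" are matched before the filter runs, they stay as two separate tokens, which is what the question asks for.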
I would use two regexes. The first one matches the words you DON'T want, so that you can replace them with an empty string. This is the simple regex:
/:\w+/g
Once the matches have been replaced with an empty string, you have a string that can be parsed with this regex:
/([ ]+)|([a-z]*)/g
This is a simplified version of your second regex, since the forbidden words are already gone.
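The two steps could be sketched as follows. The function name `parseWithoutSpecialWords` is hypothetical, and the sketch swaps `[a-z]*` for `[a-z]+`, because a pattern that can match the empty string may add zero-length matches to the result of `match`.

```javascript
// Hypothetical helper, not part of the original answer.
function parseWithoutSpecialWords(text) {
  // Step 1: remove every colon-prefixed word.
  const cleaned = text.replace(/:\w+/g, '');
  // Step 2: split the rest into runs of spaces and runs of lowercase letters.
  // "+" is used instead of "*" so that no zero-length matches are produced.
  return cleaned.match(/[ ]+|[a-z]+/g) || []; // match() returns null on no match
}

console.log(parseWithoutSpecialWords('hello this :word is apart'));
// [ 'hello', ' ', 'this', '  ', 'is', ' ', 'apart' ]
```

Note that the two spaces around the removed word end up next to each other and are returned as one two-space token. If they must stay separate, as in the expected output in the question, filtering after matching may be the better fit.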