
JS regexp to split string based on character not preceded by backslash

I want to use the JS String split function to split this string only on the commas (,) and not on the commas preceded by a backslash (\,). How can I do this?

'this,is\\,a,\\,string'.split(/,/)

This code splits on every comma. I'm not sure how to make it split only on the commas that are not preceded by a backslash.

Since lookbehinds are not supported in JavaScript, it's hard to define a "not preceded by something" pattern for split. However, you may define a "word" as a sequence of non-commas or escaped commas:

(?:\\,|[^,])+

(demo: https://regex101.com/r/d5W21v/1)

and extract all "word" matches:

var matches = "this,is\\,a,\\,string".match(/(?:\\,|[^,])+/g);
console.log(matches);

Replace the non-splitting symbol with a temporary symbol, split, and then restore the non-splitting symbol:

// A regex with the g flag is needed: a string pattern replaces only the first occurrence
'this,is\\,a,\\,string'.replace(/\\,/g, '##NONBREAKING##').split(',')

Then loop over the resulting array, replacing '##NONBREAKING##' with '\\,'.

Obviously the temporary symbol '##NONBREAKING##' must be something that can never occur in the text you are splitting. Perhaps include some Unicode characters that are hard to type? Or include characters from several different languages (e.g. Chinese, Russian, Indian, Native American) that are unlikely to appear together in genuine text.
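To make the whole round trip concrete, here is a minimal sketch of the replace / split / restore steps put together. The names splitOnUnescaped and PLACEHOLDER are made up for illustration, and the sketch assumes the placeholder never occurs in the input.

```javascript
// Hypothetical helper; the name and the placeholder value are illustrative only.
const PLACEHOLDER = '##NONBREAKING##'; // assumed never to appear in the input

function splitOnUnescaped(text) {
  return text
    .replace(/\\,/g, PLACEHOLDER) // hide every escaped comma
    .split(',')                   // split on the commas that remain
    .map(part => part.split(PLACEHOLDER).join('\\,')); // put the escaped commas back
}

console.log(splitOnUnescaped('this,is\\,a,\\,string'));
```

The restore step uses split/join rather than a second regex so the placeholder does not have to be escaped for use in a pattern.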

I think what you're looking for is called "negative lookbehind": a regex element that looks back in the string and makes sure the pattern is not preceded by another pattern.

However, JavaScript doesn't natively support lookbehind. It does, however, support lookahead (both negative and positive).

So you could:
1. Reverse the string.
2. Split on each comma, unless it is followed by a backslash.
3. Reverse each word back.
4. Reverse the order of the words.

var temp = "this,is\\,a,\\,string"
var reversed = temp.split('').reverse().join('')
var words = reversed.split(/,(?!\\)/).map(x => x.split('').reverse().join(''))
var finalResult = words.reverse()

It's kind of cumbersome though...

Alternatively, you can create a custom method that returns an array. Whenever a comma is found that is not preceded by a backslash, take the substring up to that comma. You will need a counter that keeps track of the position just after the last splitting comma.
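One way this could look is sketched below. The function name splitByUnescapedComma is hypothetical, and the sketch assumes a backslash directly before a comma always means "escaped" (it does not handle an escaped backslash followed by a real separator).

```javascript
// Hypothetical helper sketching the manual scan; not a definitive implementation.
function splitByUnescapedComma(text) {
  const parts = [];
  let start = 0; // counter: position just after the last splitting comma
  for (let i = 0; i < text.length; i++) {
    // Split only on a comma whose previous character is not a backslash
    if (text[i] === ',' && text[i - 1] !== '\\') {
      parts.push(text.substring(start, i));
      start = i + 1;
    }
  }
  parts.push(text.substring(start)); // whatever follows the last splitting comma
  return parts;
}

console.log(splitByUnescapedComma('this,is\\,a,\\,string'));
```

Unlike the "word" matching approach, a scan like this keeps empty fields, so 'a,,b' would give three parts.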

Hope this can be helpful

At the time of writing, the lookbehind approach described here is supported only in Chrome 62 (desktop and Android), Opera 49, and Node.js 8.10.

A limited set of JavaScript engines now support lookbehinds, so the following works in supported environments:

console.log('this,is\\,a,\\,string'.split(/(?<!\\),/))

Since this doesn't currently work in Firefox, Safari, or iOS Chrome (among others), it's not particularly useful for client-side development, but it is useful for Node apps.
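If the same code has to run in engines with and without lookbehind, one option is to feature-detect it at runtime. The sketch below is only an illustration: supportsLookbehind and splitOnUnescapedCommas are made-up names, and the fallback is the "word" matching regex from the earlier answer, which drops empty fields. The pattern is built with the RegExp constructor because a regex literal containing a lookbehind would fail at parse time in an engine that doesn't support it.

```javascript
// Hypothetical helpers; names are illustrative only.
function supportsLookbehind() {
  try {
    new RegExp('(?<!a)b'); // throws a SyntaxError where lookbehind is unsupported
    return true;
  } catch (e) {
    return false;
  }
}

function splitOnUnescapedCommas(text) {
  if (supportsLookbehind()) {
    // Same pattern as /(?<!\\),/ but constructed lazily from a string
    return text.split(new RegExp('(?<!\\\\),'));
  }
  // Fallback: match runs of escaped commas or non-comma characters
  return text.match(/(?:\\,|[^,])+/g) || [];
}

console.log(splitOnUnescapedCommas('this,is\\,a,\\,string'));
```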

Mozilla has an up-to-date browser compatibility section for regex lookbehinds.
