How to collate the sequence \" using ECMAScript regular expressions?

I'm trying to construct a regular expression that treats escaped quotation marks (\") as a single character.

The following code compiles fine, but with libc++ it terminates when initialising rgx, with the error Abort trap: 6.

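#include <regex>
#include <string>

std::regex rgx("[[.\\\\\".]]");  // throws std::regex_error; uncaught, it ends in std::terminate (Abort trap: 6)
std::smatch results;
std::string test_str("\\\"");    // the two characters \"
std::regex_search(test_str, results, rgx);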

If I remove the [[. .]], it runs fine, with results[0] returning \" as intended. But, as I said, I'd like this sequence to be usable inside a character class.

Edit: OK, I now realise my understanding of collating elements was wrong. It doesn't work because \\" (written \\\\\" in the C++ string literal) is not a defined collating element. So my new question: is it possible to define my own collating elements?

So I figured out where I was going wrong and thought I'd leave this here in case anyone stumbles across it.

You can group a sequence of characters without capturing it by writing (?:sequence), a non-capturing (or "passive") group. Quantifiers then apply to the whole sequence, just as they would to a character class. It's perhaps not exactly what I originally asked, but it serves the same purpose, at least in my case.

To match a string that begins and ends with double quotation marks (including those characters in the result), while allowing escaped quotation marks inside the string, I used the following expression, written as a C++ string literal:

std::regex rgx("\"(?:[^\"\\\\]+|(?:\\\\\\\\)+|\\\\\")*\"");

The regex engine therefore sees the pattern "(?:[^"\\]+|(?:\\\\)+|\\")*". It first grabs as many characters as possible that are neither quotation marks nor backslashes. If that doesn't match, it tries an even number of backslashes (so that a backslash can itself be escaped), and failing that, an escaped quotation mark. This non-capturing group is repeated as many times as possible, stopping only at an unescaped quotation mark, which the trailing quotation mark in the pattern then matches.

I can't comment on its efficiency, but it works.
