
TR1 regex: capture groups?

I am using TR1 Regular Expressions (for VS2010), and I'm trying to search for one pattern for a group called "name" and another pattern for a group called "value". I think what I want is called a capture group, but I'm not sure if that's the right terminology. I want to assign matches of the pattern "[^:\r\n]+" (followed by ":\s") to a list of matches called "name", and matches of the pattern "[^\r\n]+" (followed by "\r\n") to a list of matches called "value".

The regex pattern I have so far is

string pattern = "((?<name>[^:\r\n]+):\s(?<value>[^\r\n]+)\r\n)+";

But the TR1 regex header keeps throwing an exception when the program runs. What's wrong with the syntax of the pattern I have? Can someone show an example pattern that would do what I'm trying to accomplish?

Also, how would it be possible to include a substring within the pattern to match, but not actually include that substring in the results? For example, I want to match all strings of the pattern

"http://[[:alpha:]]\r\n"

, but I don't want to include the substring "http://" in the returned results of matches.

The C++ TR1 and C++11 regular expression grammars don't support named capture groups. You'll have to use unnamed capture groups.
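To make that concrete, here is a minimal sketch, assuming the C++11 <regex> header and the std:: names (VS2010 also exposes the same types under std::tr1). The helper names compiles and first_name are made up for illustration. The "(?<name>...)" syntax should be rejected with a std::regex_error when the regex is constructed, while the unnamed version is accepted and its groups are read by index.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true if std::regex accepts the pattern,
// false if constructing it throws std::regex_error.
bool compiles(const std::string& pattern) {
    try {
        std::regex re(pattern);
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}

// Hypothetical helper: returns the name part of the first "name: value"
// pair found in text (unnamed group 1), or an empty string if none matches.
std::string first_name(const std::string& text) {
    static const std::regex re("([^:\\r\\n]+):\\s([^\\r\\n]+)");
    std::smatch m;
    return std::regex_search(text, m, re) ? m[1].str() : std::string();
}
```

Groups are numbered by the position of their opening parenthesis, starting at 1; index 0 is the whole match.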

Also, make sure you don't run into escaping issues. You'll have to escape some characters twice: once for being in a C++ string, and once for being in a regex. The pattern (([^:\r\n]+):\s([^\r\n]+)\r\n)+ can be written as a C++ string literal like this:

"(([^:\\r\\n]+):\\s([^\\r\\n]+)\\r\\n)+"
// or in C++11
R"xxx((([^:\r\n]+):\s([^\r\n]+)\r\n)+)xxx"
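One thing to watch for when collecting lists of names and values: a capture group inside a repeated (...)+ only keeps what it matched on the last repetition. A possible way around that, sketched below under the assumption that the input is made of "name: value\r\n" lines, is to drop the outer repetition and walk the matches with std::sregex_iterator. The function name parse_pairs is hypothetical.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: collects every (name, value) pair from text made of
// "name: value\r\n" lines. Group 1 is the name, group 2 is the value.
std::vector<std::pair<std::string, std::string>>
parse_pairs(const std::string& text) {
    // One line per match; there is no outer (...)+, so no capture gets
    // overwritten by a later repetition.
    static const std::regex line("([^:\\r\\n]+):\\s([^\\r\\n]+)\\r\\n");
    std::vector<std::pair<std::string, std::string>> pairs;
    for (std::sregex_iterator it(text.begin(), text.end(), line), end;
         it != end; ++it) {
        pairs.emplace_back((*it)[1].str(), (*it)[2].str());
    }
    return pairs;
}
```

Calling parse_pairs on a block of header-style text gives one pair per line, in input order, which can then be split into separate "name" and "value" lists if needed.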

Lookbehinds are not supported either. You'll have to work around this limitation by using capture groups: use the pattern (http://)([[:alpha:]]\r\n) and grab only the second capture group.
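A rough sketch of that workaround follows. It makes two assumptions that are not in the pattern above: it uses [[:alpha:]]+ so that more than one letter can follow the prefix, and it keeps the trailing \r\n outside the second group so the line ending is not part of the result. The function name strip_scheme is made up.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: for each "http://<letters>\r\n" found in text,
// returns only the letters, i.e. the second capture group.
std::vector<std::string> strip_scheme(const std::string& text) {
    // Group 1 is the prefix we match but discard; group 2 is what we keep.
    static const std::regex re("(http://)([[:alpha:]]+)\\r\\n");
    std::vector<std::string> out;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end;
         it != end; ++it) {
        out.push_back((*it)[2].str());
    }
    return out;
}
```

Since the prefix is never read back, its parentheses could also be dropped or written as a non-capturing group (?:http://), in which case the part to keep becomes group 1.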
