
C++ regex capture group confusion

I'm implementing the nand2tetris Assembler in C++ (I'm pretty new to C++), and I'm having a lot of trouble parsing a C-instruction using regex. Mainly, I really don't understand the return value of regex_search and how to use it.

Setting aside the various permutations of a C-instruction, the example I'm currently having trouble with is D=DM. The result should have dest = "D"; comp = "DM".

With the current code below, the regex appears to find the results correctly (confirmed by regex101.com), but either the match is not actually correct or I don't know how to get at the result. See the debugger screenshot. matches[n].second (which appears to contain the correct comp value) is not a string but an iterator.

Note that the 3rd capture group is correctly empty for this example.

auto regex_str = regex("([AMD]{1,3}=)?([01\-AMD!|+&><]{1,3})?(;[A-Z]{3})?");
regex_search(assemblyCode, matches, regex_str);
string dest = matches[1]; // this automatically casts some object (submatch) into a string?
string comp = matches[2]; 
string jump = matches[3];

[Image: debugger screenshot]

I will note, though, that D=D+M works, but not D=DM!

gcc warns about an unknown escape sequence \- (Demo).

You have to escape the backslash itself:

std::regex("([AMD]{1,3}=)?([01\\-AMD!|+&><]{1,3})?(;[A-Z]{3})?");

or use a raw string literal:

std::regex(R"(([AMD]{1,3}=)?([01\-AMD!|+&><]{1,3})?(;[A-Z]{3})?)");
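To see why the escape matters, here is a minimal sketch. It assumes the compiler drops the backslash from the unrecognized \- escape (gcc does this, with the warning mentioned above), so the regex engine receives 01-AMD inside the brackets and reads 1-A as a character range that does not contain the hyphen. The names CInstruction, parse_c_instruction, kFixed and kBroken are made up for illustration, and D=D-M is an assumed test input, not one taken from the question.

```cpp
#include <regex>
#include <string>

// Hypothetical container for the three raw capture groups.
struct CInstruction {
    std::string dest;  // group 1, includes the trailing '=' as the regex is written
    std::string comp;  // group 2
    std::string jump;  // group 3, includes the leading ';' as the regex is written
};

// Pattern from the answer: the raw string keeps "\-" intact for the regex engine.
const std::regex kFixed(R"(([AMD]{1,3}=)?([01\-AMD!|+&><]{1,3})?(;[A-Z]{3})?)");

// What the regex engine presumably receives when the backslash is lost:
// "1-A" is now a range from '1' to 'A', and '-' itself is no longer in the class.
const std::regex kBroken(R"(([AMD]{1,3}=)?([01-AMD!|+&><]{1,3})?(;[A-Z]{3})?)");

// Hypothetical helper: runs regex_search and copies the groups out as strings.
CInstruction parse_c_instruction(const std::string& line, const std::regex& pattern) {
    CInstruction out;
    std::smatch m;
    if (std::regex_search(line, m, pattern)) {
        out.dest = m[1].str();  // empty string if the group did not participate
        out.comp = m[2].str();
        out.jump = m[3].str();
    }
    return out;
}
```

With kFixed, parse_c_instruction("D=D-M", kFixed).comp is "D-M"; with kBroken the same input stops at the hyphen and comp is just "D", while D=D+M parses the same way with either pattern. Note that, with the groups as written, dest comes back as "D=" rather than "D"; moving the = and ; outside the parentheses would be one way to avoid that.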

Demo
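On the question about what regex_search hands back: each matches[n] is a std::sub_match, which is essentially a pair of iterators (first, second) into the searched string plus a matched flag. It converts implicitly to std::string, which is why string dest = matches[1]; compiles. A debugger showing .second as text is presumably just displaying the rest of the input from that iterator onward. The sketch below spells this out; GroupInfo and inspect_group are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical summary of one capture group.
struct GroupInfo {
    bool matched;        // did the group participate in the match?
    std::string text;    // characters in [first, second)
    std::size_t offset;  // where the group starts inside the input
};

// Hypothetical helper: searches `line` and describes capture group `n`.
GroupInfo inspect_group(const std::string& line, const std::regex& re, std::size_t n) {
    std::smatch m;
    if (!std::regex_search(line, m, re) || n >= m.size()) {
        return {false, "", 0};
    }
    const std::ssub_match& sub = m[n];
    if (!sub.matched) {
        return {false, "", 0};  // optional group that was skipped
    }
    // first/second are iterators into `line`; building a string from the
    // range gives the same result as sub.str() or the implicit conversion.
    std::string text(sub.first, sub.second);
    std::size_t offset = static_cast<std::size_t>(sub.first - line.begin());
    return {true, text, offset};
}
```

For the pattern in the answer and the input D=D+M, group 2 is reported as matched with text "D+M" starting at offset 2, and group 3 as not matched. Because the iterators point into the original string, that string has to stay alive for as long as the std::smatch is used.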
