
Regex in C++ for matching some patterns

I want a regex for this:

  • add x2, x1, x0 is a valid instruction;

I want to implement this, but I am a bit confused about how, as I am a newbie to regex. Can anyone share these regexes?

If this is a longer project that will get more requirements later, then a different approach would definitely be better.

The standard approach to such a problem is to define a grammar and then create a lexer and a parser. The tools lex/yacc or flex/bison can be used for that. Alternatively, a simple shift/reduce parser can also be handcrafted.

The language that you sketched may indeed be described by a Chomsky type-3 grammar, i.e. a regular grammar, so it is a regular language. And with that, it can be parsed with regular expressions.

The specification is a little unclear about what a register is and whether there are more keywords. Especially ecall is unclear.

But how do you build such a regex?

You define small tokens and concatenate them. Different paths can be implemented with the or operator |.

Let's look at some examples.

  • A register may be matched with a\d+, i.e. an "a" followed by one or more digits. If it is not only "a" but other letters as well, you could use [a-z]\d+
  • Opcodes with the same number of parameters can be listed with a simple or |, like in add|sub
  • For spaces there are many solutions. You may use \s+ or [ ]+ or whatever spaces you need.
  • To build one rule, you concatenate what you have learned so far.
  • Different alternatives for the complete line are joined with an or |
  • If you want to get back the matched groups, you must enclose the needed parts in parentheses ( )

And with that, one of many possible solutions is:

^[ ]*((add|sub)[ ]+(a\d+)[ ]*,[ ]*(a\d+)[ ]*,[ ]*(a\d+)|(ecall))[ ]*$

See the example on regex101.

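Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.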
