
Regular expression to match a C structure

I would like a regular expression to match a C structure definition. This is my target data:

typedef struct
{
}dontMatchThis;

typedef struct
{
  union //lets have a union as well
  {
    struct 
    {
     int a
     //a comment for fun

     int b;
     int c;
    };
    char byte[10];
  };
}structA;

I want to match the definition of structA only, from typedef to structA.

I have tried: typedef[\\s\\S]+?structA

But even though I'm using the non-greedy modifier, this matches both structures. Any suggestions?

In the general case, it is simply not possible. The typedef or the struct could have been generated by preprocessor macro invocations (and you could have the typedef in one file and the struct in another #include-d file, or the struct coming from one preprocessor macro and the typedef from another).

I would suggest instead extending or customizing the GCC compiler, either through a plugin or a MELT extension (MELT is a domain-specific language for extending GCC).

See also etags.

The problem is the point where the regexp begins matching. It correctly starts matching at the first typedef and continues until structA. The non-greedy modifier only shortens the end of the match; it does not move the start forward to a later typedef.
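To make that behaviour concrete, here is a small sketch in Python (my choice of language, not the asker's; the pattern is written as a raw string, so the backslashes are single). It only reproduces the question's attempt and checks where the match begins.

```python
import re

# The target data from the question, unchanged.
source = """typedef struct
{
}dontMatchThis;

typedef struct
{
  union //lets have a union as well
  {
    struct 
    {
     int a
     //a comment for fun

     int b;
     int c;
    };
    char byte[10];
  };
}structA;
"""

# The pattern from the question.
match = re.search(r"typedef[\s\S]+?structA", source)

# The engine tries start positions from left to right, so the match begins
# at the first "typedef"; the lazy quantifier only picks the nearest end.
starts_at_first_typedef = match.start() == source.index("typedef")
spans_both_structs = "dontMatchThis" in match.group(0)
print(starts_at_first_typedef, spans_both_structs)
```

Both values come out true for this input, which is exactly the unwanted result described in the question.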

What you're trying to do is really difficult (I would say impossible to do correctly). You would need to match nested braces to see where the struct stops.

See Building a Regex Based Parser.
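As a rough illustration of what matching nested braces involves, the following Python sketch counts brace depth by hand instead of relying on a regex for the body. The function name extract_typedef is hypothetical, and the code assumes that braces never appear inside comments, string literals, or character literals, and that nothing is produced by macros; it is not a C parser.

```python
import re

# Matches the type name and semicolon that follow the closing brace.
NAME_AFTER_BRACE = re.compile(r"\s*([A-Za-z_]\w*)\s*;")

def extract_typedef(source, name):
    """Return the 'typedef ... }name;' text, or None if it is not found."""
    pos = 0
    while True:
        start = source.find("typedef", pos)
        if start == -1:
            return None
        open_brace = source.find("{", start)
        if open_brace == -1:
            return None
        semicolon = source.find(";", start)
        if semicolon != -1 and semicolon < open_brace:
            # A brace-less typedef such as "typedef int foo;": skip it.
            pos = start + len("typedef")
            continue
        depth = 0
        i = open_brace
        while i < len(source):
            if source[i] == "{":
                depth += 1
            elif source[i] == "}":
                depth -= 1
                if depth == 0:
                    break  # i is the brace that closes the outer struct
            i += 1
        else:
            return None  # braces are unbalanced
        m = NAME_AFTER_BRACE.match(source, i + 1)
        if m and m.group(1) == name:
            return source[start:m.end()]
        pos = i + 1  # not the wanted struct, keep scanning

sample = """typedef struct
{
}dontMatchThis;

typedef struct
{
  union
  {
    struct
    {
     int b;
    };
    char byte[10];
  };
}structA;
"""

print(extract_typedef(sample, "structA"))
```

The depth counter is the part a plain regular expression cannot express in general, which is why the answers here either give up on regex or fall back to heuristics.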

I found that the following works for me:

([\\s\\S]*)(typedef([\\s\\S]*)?structA)

I then select the second group, which contains my structure. This uses the first [\\s\\S] group greedily to consume all the definitions before the target struct.
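A minimal Python sketch of this idea follows. It assumes the pattern is meant to carry a `*` quantifier after each `[\s\S]` class (the quantifiers appear to have been lost when the post was rendered) and writes it as a raw string with single backslashes.

```python
import re

# The target data from the question, unchanged.
source = """typedef struct
{
}dontMatchThis;

typedef struct
{
  union //lets have a union as well
  {
    struct 
    {
     int a
     //a comment for fun

     int b;
     int c;
    };
    char byte[10];
  };
}structA;
"""

# Assumed reading of the pattern above: a greedy prefix group, then the
# wanted typedef. The greedy prefix pushes the "typedef" in group 2 as far
# right as it can go while "structA" still follows it.
pattern = re.compile(r"([\s\S]*)(typedef([\s\S]*)?structA)")

m = pattern.search(source)
prefix = m.group(1)   # everything before the last matching typedef
wanted = m.group(2)   # from that typedef up to structA
print(wanted)
```

One caveat with this sketch: the inner group is greedy too, so if structA is mentioned again later in the file (for example in a variable declaration), group 2 runs on to that last occurrence.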

As stated by ctn, the problem with the non-greedy modifier in your regex is that the match starts at the first occurrence of typedef and stops at the first place where it finds structA. Everything in between is considered valid. One way to use a regex to solve your problem is to define a regex which identifies the structs, and then, in a separate stage, verify whether each match corresponds to the struct that you want.

For example, using the regex:

(typedef[\s\S]+?})\s*([a-zA-Z0-9_]+)\s*;

you define 2 groups. The first starts at a typedef and ends at a closing curly brace, with non-greedy text matching; this group contains the text that you might want. The final curly brace is followed by the struct name, ([a-zA-Z0-9_]+), and ends with ;. Considering your example, there will be 2 matches, each containing 2 groups.

Match 1:

(typedef struct
{
})(dontMatchThis);

Value of group 2: dontMatchThis

Match 2:

(typedef struct
{
  union //lets have a union as well
  {
    struct 
    {
     int a
     //a comment for fun

     int b;
     int c;
    };
    char byte[10];
  };
})(structA);

Value of group 2: structA

Thus, it becomes a matter of verifying whether the value of group 2 is structA.
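The two stages can be sketched in Python as follows (Python is just the language picked for illustration; the variable names are hypothetical). Stage one collects every match of the regex above, stage two keeps the one whose group 2 is structA.

```python
import re

# The target data from the question, unchanged.
source = """typedef struct
{
}dontMatchThis;

typedef struct
{
  union //lets have a union as well
  {
    struct 
    {
     int a
     //a comment for fun

     int b;
     int c;
    };
    char byte[10];
  };
}structA;
"""

# Stage 1: find every typedef ... } name ; candidate.
pattern = re.compile(r"(typedef[\s\S]+?})\s*([a-zA-Z0-9_]+)\s*;")
candidates = list(pattern.finditer(source))
names = [m.group(2) for m in candidates]

# Stage 2: keep only the candidate whose name (group 2) is the wanted one.
target = next((m.group(0) for m in candidates if m.group(2) == "structA"), None)

print(names)
print(target)
```

The inner closing braces are skipped because each is followed directly by a semicolon rather than a name. This is a heuristic that fits the sample data: a nested member declared as `} name;` would end a match early.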
