
Regular expression lookaround

I don't think this is possible with just regular expressions, but I'm not an expert, so I thought it was worth asking.

I'm trying to do a massive search and replace of C# code using .NET regex. What I want to do is find a line of code where a specific function is called on a variable of type DateTime, e.g.:

axRecord.set_Field("CreatedDate", m_createdDate);

and I would know that it's a DateTime variable because earlier in that code file there would be the line:

DateTime m_createdDate;

but it seems that I can't use a named group in a lookbehind like this:

(?<=DateTime \k<1>.+?)axRecord.set_[^ ]+ (?<1>[^ )]+)

and if I try to match all the text between the variable declaration and the function call like this:

DateTime (?<1>[^;]+).+?axRecord.set.+?\k<1>

it will find the first match (first based on the first variable declared), but then it can't find any other matches, because the code is laid out like this:

DateTime m_First;
DateTime m_Second;
...
axRecord.set_Field("something", m_First);
axRecord.set_Field("somethingElse", m_Second);

and the first match encompasses the second variable declaration.
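The effect can be reproduced in isolation. What follows is only a minimal sketch: it assumes the pattern is run with RegexOptions.Singleline so that `.` can cross line breaks (the question does not say which options were used), and the class and method names are made up for the illustration.

```csharp
using System.Text.RegularExpressions;

public static class SpanMatchDemo
{
    // Sample input laid out like the code in the question.
    public const string Source =
        "DateTime m_First;\n" +
        "DateTime m_Second;\n" +
        "...\n" +
        "axRecord.set_Field(\"something\", m_First);\n" +
        "axRecord.set_Field(\"somethingElse\", m_Second);\n";

    // Runs the declaration-to-call pattern from the question over the input.
    // Singleline is an assumption: without it '.' cannot cross line breaks.
    public static MatchCollection FindSpans(string source)
    {
        var pattern = @"DateTime (?<1>[^;]+).+?axRecord.set.+?\k<1>";
        return Regex.Matches(source, pattern, RegexOptions.Singleline);
    }
}
```

Calling SpanMatchDemo.FindSpans(SpanMatchDemo.Source) should return a single match that starts at the first declaration and ends at m_First in the first call. The scan then resumes after that point, where no DateTime declaration remains, so the m_Second call is never paired with its declaration.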

Is there a good way to do this with just regular expressions, or do I have to resort to scripting in my logic?

Have a look at my answer to this question: Get a methods contents from a C# file

It gives links to pages that show how to use the built-in .NET language parser to do this simply and reliably (i.e. not by asking "what looks like the usage I'm searching for", but by properly parsing the code with the VS code parsing tools).

I know it's not a RegEx answer, but I don't think RegEx is the answer.

This will be difficult to do with a single regular expression. However, it is possible if you consider processing the lines with a bit of state.

Note: I can't tell exactly what you're trying to match on the axRecord line, so you'll likely need to adjust that regex appropriately.

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

void Process(List<string> lines) {
  var comp = StringComparer.Ordinal;
  // Names of the m_ fields currently known to be declared as DateTime.
  var map = new HashSet<string>(comp);
  var declRegex = new Regex(@"^\s*(?<type>\w+)\s+(?<name>m_\w+)\s*;");
  var toReplaceRegex = new Regex(@"^\s*axRecord\.set_(?<toReplace>.*(?<name>m_\w+).*)");

  for (var i = 0; i < lines.Count; i++) {
    var line = lines[i];
    var match = declRegex.Match(line);
    if (match.Success) {
      // Track the name if it is a DateTime; forget it if it is declared as another type.
      if (comp.Equals(match.Groups["type"].Value, "DateTime")) {
        map.Add(match.Groups["name"].Value);
      } else {
        map.Remove(match.Groups["name"].Value);
      }
      continue;
    }

    match = toReplaceRegex.Match(line);
    if (match.Success && map.Contains(match.Groups["name"].Value)) {
      // Add your replace logic here
    }
  }
}
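A variation on the same "bit of state" idea is a two-pass approach over the whole file: first collect the names declared as DateTime, then rewrite only the calls whose argument is one of those names. This is a sketch under several assumptions, not the answer's own method: the names TwoPassSketch, DateTimeNames and Rewrite are hypothetical, the call is assumed to look like axRecord.set_Xxx("...", name), and scope is ignored (a name declared as DateTime anywhere in the text counts).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TwoPassSketch
{
    // Pass 1: collect every identifier declared as "DateTime <name>;".
    public static HashSet<string> DateTimeNames(string source)
    {
        var names = new HashSet<string>(StringComparer.Ordinal);
        foreach (Match m in Regex.Matches(source, @"\bDateTime\s+(?<name>\w+)\s*;"))
        {
            names.Add(m.Groups["name"].Value);
        }
        return names;
    }

    // Pass 2: apply the caller's rewrite to each matching call whose
    // argument is a known DateTime name; leave every other call untouched.
    public static string Rewrite(string source, Func<Match, string> rewrite)
    {
        var names = DateTimeNames(source);
        // Assumed call shape: axRecord.set_Xxx("literal", identifier)
        var call = new Regex(@"axRecord\.set_\w+\(""[^""]*"",\s*(?<name>\w+)\)");
        return call.Replace(
            source,
            m => names.Contains(m.Groups["name"].Value) ? rewrite(m) : m.Value);
    }
}
```

For example, TwoPassSketch.Rewrite(text, m => m.Value + " /* DateTime */") would tag only the calls that pass a DateTime field. Because the names are gathered up front, the order of declarations and calls no longer matters, at the cost of ignoring scope entirely.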

This cannot be done using regular expressions. For one thing, C#'s grammar is not regular; but more importantly, you're talking about analyzing expressions that are lexically unrelated. For this sort of thing, you're going to need full semantic analysis. That means lexer, parser, name binding and finally type checker. Once you have the annotated AST, you can look for the field you want and just read off the type.

I'm guessing this is a lot more work than you want to do, though, seeing as it's about half of a full-blown C# compiler.

This is weird. I managed to build a regex that does find it, but it only matches the first one.

(?<=private datetime (?<1>\b\w+\b).+?)set_field[^;]+?\k<1>

So it seems that even if I can't use a named group in a lookbehind, I can at least establish a named group in the lookbehind and then use it in the match. But then it looks like when it matches just the function call (which is what I wanted), the caret position is moved to that line, so it can't find any new matches because it has passed their declarations. Or maybe I don't understand how the engine really works.

I guess what I'm looking for is a regex option that tells it to look inside matches for more matches. Come to think of it, that seems like it would be needed for basic HTML regex parsing too: you find a tag and then its closing tag, and the whole page is enclosed in that match, so you won't find any other tags unless you recursively apply the pattern to each match.

Does anyone know anything about this, or am I talking crazy?
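One common workaround for finding overlapping matches is to put the whole pattern inside a lookahead, so that each match consumes no text and the scan can start again at the very next character. The sketch below applies that idea to the declaration-to-call pattern; it is an illustration under assumptions rather than a fix endorsed anywhere in this thread. The names OverlapDemo and DateTimeNamesUsedInCalls are hypothetical, and the call is assumed to look like axRecord.set_Xxx(...) ending in a semicolon.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class OverlapDemo
{
    // Returns the DateTime variable names that are later passed to an
    // axRecord.set_Xxx(...) call. The entire pattern sits inside (?=...),
    // so each match is zero-width and later declarations stay reachable.
    public static List<string> DateTimeNamesUsedInCalls(string source)
    {
        var pattern =
            @"(?s)(?=DateTime (?<name>\w+);.+?axRecord\.set_\w+\([^;]*?\b\k<name>\b)";
        var names = new List<string>();
        foreach (Match m in Regex.Matches(source, pattern))
        {
            // Groups captured inside the lookahead remain available on the match.
            names.Add(m.Groups["name"].Value);
        }
        return names;
    }
}
```

Note that each match sits at the declaration, not at the call, so this reports which variables qualify rather than where the calls are; locating the calls themselves would still need a second step.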

Try this:

@"(?s)set_Field\(""[^""]*"",\s*(?<vname>\w+)(?<=\bDateTime\s+\k<vname>\b.+)"

By doing the lookbehind first, you're forcing the regex to search for the method calls in a particular order: the order in which the variables are declared. What you want to do is match a likely-looking method call first, then use the lookbehind to verify the type of the variable.

I just made a rough guess at the part that matches the method call. As the others have said, whatever regex you use is going to have to be tailored to your code; there's no general solution.
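As a rough check of the call-first, lookbehind-second pattern, here is a minimal sketch that runs it over input shaped like the question's sample. The wrapper names LookbehindDemo and DateTimeArguments and the extra int m_Count lines are assumptions added for the illustration; only the pattern itself comes from the answer.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // Returns the variable passed to each set_Field call, keeping only
    // those that a preceding "DateTime <name>" declaration vouches for.
    public static List<string> DateTimeArguments(string source)
    {
        // Match the call first, then look back for the declaration.
        // This relies on .NET allowing variable-length lookbehind.
        var pattern =
            @"(?s)set_Field\(""[^""]*"",\s*(?<vname>\w+)(?<=\bDateTime\s+\k<vname>\b.+)";
        var found = new List<string>();
        foreach (Match m in Regex.Matches(source, pattern))
        {
            found.Add(m.Groups["vname"].Value);
        }
        return found;
    }
}
```

Each match now covers only the call, so the declarations are never consumed and every call is checked independently. On the sample, both DateTime calls should be reported and the m_Count call skipped.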
