regular expression lookaround

I don't think this is possible with just regular expressions, but I'm not an expert, so I thought it was worth asking.

I'm trying to do a massive search and replace of C# code, using .NET regex. What I want to do is find a line of code where a specific function is called with a variable of type DateTime as an argument, e.g.:

axRecord.set_Field("CreatedDate", m_createdDate);

and I would know that it's a DateTime variable because earlier in that code file there would be the line:

DateTime m_createdDate;

but it seems that I can't use a backreference to a named group inside a lookbehind, like this:

(?<=DateTime \k<1>.+?)axRecord.set_[^ ]+ (?<1>[^ )]+)
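A minimal sketch of why that pattern can never match, assuming .NET's `System.Text.RegularExpressions` (the class and method names are hypothetical). The lookbehind is tested at the position just before `axRecord`, before `(?<1>...)` further to the right has captured anything, and a backreference to a group that hasn't captured yet fails to match, so the whole lookbehind fails. Dropping the lookbehind shows that the rest of the pattern is fine.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo: a lookbehind cannot rely on a group that is captured later in the pattern.
static class LookbehindOrderDemo
{
    const string Code = "DateTime m_createdDate; axRecord.set_Field(\"CreatedDate\", m_createdDate);";

    // The pattern from the question: \k<1> inside the lookbehind refers to group 1,
    // but group 1 has not captured anything yet when the lookbehind is evaluated.
    public static bool WithLookbehind() =>
        Regex.IsMatch(Code, @"(?<=DateTime \k<1>.+?)axRecord.set_[^ ]+ (?<1>[^ )]+)");

    // The same pattern without the lookbehind matches the call as expected.
    public static bool WithoutLookbehind() =>
        Regex.IsMatch(Code, @"axRecord.set_[^ ]+ (?<1>[^ )]+)");
}
```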

and if I try to match all the text between the variable declaration and the function call like this:

DateTime (?<1>[^;]+).+?axRecord.set.+?\k<1>

it finds the first match (the one for the first variable declared), but then it can't find any other matches, because the code is laid out like this:

DateTime m_First;
DateTime m_Second;
...
axRecord.set_Field("something", m_First);
axRecord.set_Field("somethingElse", m_Second);

and the first match encompasses the second variable declaration.
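A minimal sketch reproducing this, assuming the pattern is run with `RegexOptions.Singleline` so that `.` can cross line breaks (without it, nothing matches at all). `Regex.Matches` resumes scanning where the previous match ended, so once the first match has swallowed the `m_Second` declaration, that declaration can't start another match. `NonOverlappingDemo` is a hypothetical name.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

static class NonOverlappingDemo
{
    // The sample layout from the question, with the "..." line omitted.
    public const string Code =
        "DateTime m_First;\n" +
        "DateTime m_Second;\n" +
        "axRecord.set_Field(\"something\", m_First);\n" +
        "axRecord.set_Field(\"somethingElse\", m_Second);\n";

    // Returns the variable name captured by each (non-overlapping) match of the question's pattern.
    public static string[] CapturedNames() =>
        Regex.Matches(Code, @"DateTime (?<1>[^;]+).+?axRecord.set.+?\k<1>", RegexOptions.Singleline)
             .Cast<Match>()
             .Select(m => m.Groups[1].Value)
             .ToArray();
}
```

Only `m_First` comes back, even though both calls pass a `DateTime`.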

Is there a good way to do this with just regular expressions, or do I have to resort to scripting in my logic?

Have a look at my answer to the question "Get a method's contents from a C# file".

It gives links to pages that show how to use the built-in .NET language parser to do this simply and reliably (i.e., not by asking "what looks like the usage I'm searching for?", but by properly parsing the code with Visual Studio's code parsing tools).

I know it's not a RegEx answer, but I don't think RegEx is the answer.

This will be difficult to do with a single regex. However, it is possible if you process the lines one at a time and keep a bit of state.

Note: I can't tell exactly what you're trying to match on the axRecord line, so you'll likely need to adjust that regex appropriately.

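using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

void Process(List<string> lines) {
  var comp = StringComparer.Ordinal;
  // Field names whose most recent declaration was DateTime
  var map = new HashSet<string>(comp);
  // A declaration such as "DateTime m_createdDate;"
  var declRegex = new Regex(@"^\s*(?<type>\w+)\s+(?<name>m_\w+)");
  // A call such as axRecord.set_Field("CreatedDate", m_createdDate);
  var toReplaceRegex = new Regex(@"^\s*axRecord\.set_(?<toReplace>.*\b(?<name>m_\w+).*)");

  for (var i = 0; i < lines.Count; i++) {
    var line = lines[i];
    var match = declRegex.Match(line);
    if (match.Success) {
      if (comp.Equals(match.Groups["type"].Value, "DateTime")) {
        map.Add(match.Groups["name"].Value);
      } else {
        map.Remove(match.Groups["name"].Value);
      }
      continue;
    }

    match = toReplaceRegex.Match(line);
    if (match.Success && map.Contains(match.Groups["name"].Value)) {
      // Add your replace logic here (e.g. assign a rewritten string to lines[i])
    }
  }
}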
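To try the state-tracking loop without writing the replacement step, it can be packaged so that it just reports which lines would be rewritten. This is a sketch under the same assumptions as the answer (fields prefixed with `m_`, one statement per line); `DateTimeCallScanner` and `FindDateTimeCalls` are hypothetical names, and the call pattern is tightened slightly to require `set_Xxx(`.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class DateTimeCallScanner
{
    // Declarations such as "DateTime m_createdDate;" (a type, then an m_-prefixed name).
    static readonly Regex DeclRegex = new Regex(@"^\s*(?<type>\w+)\s+(?<name>m_\w+)\s*;");
    // Calls such as axRecord.set_Field("CreatedDate", m_createdDate);
    static readonly Regex CallRegex = new Regex(@"^\s*axRecord\.set_\w+\(.*\b(?<name>m_\w+)");

    // Returns the zero-based indexes of call lines whose m_ argument was last declared as DateTime.
    public static List<int> FindDateTimeCalls(IList<string> lines)
    {
        var dateTimeFields = new HashSet<string>(StringComparer.Ordinal);
        var hits = new List<int>();
        for (var i = 0; i < lines.Count; i++)
        {
            var decl = DeclRegex.Match(lines[i]);
            if (decl.Success)
            {
                // Only the most recent declaration of each name counts.
                if (decl.Groups["type"].Value == "DateTime")
                    dateTimeFields.Add(decl.Groups["name"].Value);
                else
                    dateTimeFields.Remove(decl.Groups["name"].Value);
                continue;
            }
            var call = CallRegex.Match(lines[i]);
            if (call.Success && dateTimeFields.Contains(call.Groups["name"].Value))
                hits.Add(i);
        }
        return hits;
    }
}
```

For example, given `DateTime m_First;`, `string m_Name;` and one `set_Field` call for each, only the call passing `m_First` is reported. A redeclaration with another type removes the name again, which is a crude stand-in for real scoping.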

This cannot be done using regular expressions. For one thing, C#'s grammar is not regular; more importantly, you're talking about analyzing expressions that are lexically unrelated. For this sort of thing you need full semantic analysis: a lexer, a parser, name binding, and finally a type checker. Once you have the annotated AST, you can look for the field you want and just read off the type.

I'm guessing this is a lot more work than you want to do though, seeing as it's about half of a full-blown C# compiler.
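For reference, here is roughly what the semantic-analysis route can look like with the Roslyn compiler API (the `Microsoft.CodeAnalysis.CSharp` NuGet package, which postdates this thread; it's a swapped-in technique, not what either answer linked to). A minimal sketch: it parses a source string, builds a compilation that references only the core library, and reports `set_Field` calls that pass an argument whose static type is `System.DateTime`. `RoslynDateTimeFinder` is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

static class RoslynDateTimeFinder
{
    // Returns the zero-based line numbers of set_Field(...) calls that pass a DateTime argument.
    public static List<int> Find(string source)
    {
        var tree = CSharpSyntaxTree.ParseText(source);
        // Referencing only the core library is enough to resolve System.DateTime.
        var compilation = CSharpCompilation.Create(
            "Analysis",
            new[] { tree },
            new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) });
        var model = compilation.GetSemanticModel(tree);

        var lines = new List<int>();
        foreach (var call in tree.GetRoot().DescendantNodes().OfType<InvocationExpressionSyntax>())
        {
            // Only look at calls of the form something.set_Field(...).
            if (call.Expression is MemberAccessExpressionSyntax access &&
                access.Name.Identifier.ValueText == "set_Field")
            {
                // The semantic model gives each argument's own type (before any conversion to object).
                bool passesDateTime = call.ArgumentList.Arguments.Any(arg =>
                    model.GetTypeInfo(arg.Expression).Type?.SpecialType == SpecialType.System_DateTime);

                if (passesDateTime)
                    lines.Add(call.GetLocation().GetLineSpan().StartLinePosition.Line);
            }
        }
        return lines;
    }
}
```

With a sample class that declares `DateTime m_createdDate;` and `string m_name;` and passes each one to `axRecord.set_Field`, only the `m_createdDate` call is reported. Because the compiler resolves the names, the same check also covers locals, parameters, and fields with modifiers such as `private`, which a line-based regex would have to special-case.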

This is weird. I managed to build a regex that does find it, but it only matches the first one.

(?<=private datetime (?<1>\b\w+\b).+?)set_field[^;]+?\k<1>

So it seems that even if I can't use a backreference to a named group inside a lookbehind, I can at least define a named group in the lookbehind and then use it in the main match. But when it matches just the function call (which is what I wanted), the search position moves to that line, so it can't find any new matches because it's already past their declarations. Or maybe I don't understand how the engine really works.

I guess what I'm looking for is a regex option that tells it to look inside matches for more matches. Come to think of it, that seems like it would be needed for basic HTML regex parsing too: you find a tag and then its closing tag, the whole page is enclosed in that match, and you won't find any other tags unless you recursively apply the pattern to each match.

Does anyone know anything about this, or am I talking crazy?
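As far as I know there is no such option in .NET, but the effect can be had by driving `Regex.Match(input, startat)` by hand and restarting one character after the previous match *began*, rather than where it ended (which is where `Matches` resumes for non-empty matches). A minimal sketch, assuming the pattern from the question is compiled with `RegexOptions.Singleline`; `OverlapDemo.OverlappingMatches` is a hypothetical helper.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class OverlapDemo
{
    // Collects matches that may overlap by restarting one character after each match's start.
    public static List<Match> OverlappingMatches(Regex regex, string input)
    {
        var results = new List<Match>();
        var start = 0;
        while (start <= input.Length)
        {
            var m = regex.Match(input, start);
            if (!m.Success) break;
            results.Add(m);
            // Matches() would continue from m.Index + m.Length instead.
            start = m.Index + 1;
        }
        return results;
    }
}
```

On the two-declaration layout from the question this returns one match per declaration (`m_First`, then `m_Second`). Each match still runs from a declaration to a call, so the call to rewrite sits at the end of the match.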

Try this:

@"(?s)set_Field\(""[^""]*"",\s*(?<vname>\w+)(?<=\bDateTime\s+\k<vname>\b.+)"

By doing the lookbehind first, you're forcing the regex to search for the method calls in a particular order: the order in which the variables are declared. What you want to do is match a likely-looking method call first, then use the lookbehind to verify the type of the variable.
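A minimal sketch of that approach in .NET, run against the layout from the question plus one hypothetical `string` field to show that non-DateTime calls are skipped. The pattern is the answer's own; `CallFirstDemo` is a hypothetical wrapper. Because the lookbehind runs only after `vname` has been captured, each candidate call checks the text before it for a matching `DateTime` declaration, so the order of the declarations no longer matters.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

static class CallFirstDemo
{
    // The question's layout, plus a string field that should not be reported.
    public const string Code =
        "DateTime m_First;\n" +
        "DateTime m_Second;\n" +
        "string m_Name;\n" +
        "axRecord.set_Field(\"something\", m_First);\n" +
        "axRecord.set_Field(\"somethingElse\", m_Second);\n" +
        "axRecord.set_Field(\"name\", m_Name);\n";

    // Match the call first, capture the argument name, then look back for its DateTime declaration.
    static readonly Regex CallWithDateTimeArg = new Regex(
        @"(?s)set_Field\(""[^""]*"",\s*(?<vname>\w+)(?<=\bDateTime\s+\k<vname>\b.+)");

    public static string[] DateTimeArguments() =>
        CallWithDateTimeArg.Matches(Code)
            .Cast<Match>()
            .Select(m => m.Groups["vname"].Value)
            .ToArray();
}
```

The `m_Name` call is skipped because no `DateTime m_Name` declaration precedes it.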

I just made a rough guess at the part that matches the method call. Like the others have said, whatever regex you use is going to have to be tailored to your code; there's no general solution.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. Any questions, please contact: yoyou2525@163.com.

 