Replace regular expression with regular expression

Consider two regular expressions:

var regex_A = @"Main\.(.+)\.Value";
var regex_B = @"M_(.+)_Sp";

I want to be able to replace a string using regex_A as the input pattern and regex_B as the replacement pattern, and also the other way around, without supplying additional information such as a format string per regex.

Specifically, I want to create a replaced_B string from an input_A string:

var input_A = "Main.Rotating.Value";
var replaced_B = input_A.RegEx_Awesome_Replace(regex_A, regex_B);
Assert.AreEqual("M_Rotating_Sp", replaced_B);

This should also work in reverse, which is why I can't use a simple string.Format for regex_B. I also don't want to supply a format string for every regular expression (I'm lazy).

var input_B = "M_Skew_Sp";
var replaced_A = input_B.RegEx_Awesome_Replace(regex_B, regex_A);
Assert.AreEqual("Main.Skew.Value", replaced_A);

I have no clue whether this exists or what it is called. A Google search finds all kinds of other regex replacements, but not this one.

Update:

So basically I need a way to convert a regular expression to a format string.

var regex_A_format = Regex2Format(regex_A);
Assert.AreEqual("Main.$1.Value", regex_A_format);

and

var regex_B_format = Regex2Format(regex_B);
Assert.AreEqual("M_$1_Sp", regex_B_format);

So what should the RegEx_Awesome_Replace and/or Regex2Format function look like?

Update 2:

I guess RegEx_Awesome_Replace should look something like this (using some code from the answers below):

public static class StringExtenstions
{
    public static string RegExAwesomeReplace(this string inputString,string searchPattern,string replacePattern)
    {
        return Regex.Replace(inputString, searchPattern, Regex2Format(replacePattern));
    }
}

That leaves Regex2Format as the open question.

There is no defined way for one regex to refer to a match found by another regex. Regexes are not format strings.

What you can do is pair each regex with its format string in a Tuple, e.g.:

var a = new Tuple<Regex,string>(new Regex(@"(?<=Main\.).+(?=\.Value)"), @"Main.{0}.Value");
var b = new Tuple<Regex,string>(new Regex(@"(?<=M_).+(?=_Sp)"), @"M_{0}_Sp");

Then you can pass these objects to a common replacement method in any order, like this:

private string RegEx_Awesome_Replace(string input, Tuple<Regex,string> toFind, Tuple<Regex,string> replaceWith)
{
    return string.Format(replaceWith.Item2, toFind.Item1.Match(input).Value);
}

You will notice that I have used zero-width positive lookahead and lookbehind assertions in my regexes, to ensure that Value contains exactly the text that I want to replace.

You may also want to add error handling for cases where no match is found; see the documentation for Regex.Match.
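As a minimal sketch of the error handling mentioned above, the tuple approach could check Match.Success before formatting. The class name TupleReplaceSketch and the method name Convert are made up for this illustration, and throwing an ArgumentException is only one possible policy (returning the input unchanged would be another).

```csharp
using System;
using System.Text.RegularExpressions;

public static class TupleReplaceSketch
{
    // Each pair couples a regex that isolates the variable part
    // with the format string that rebuilds the full name.
    public static readonly Tuple<Regex, string> A =
        Tuple.Create(new Regex(@"(?<=Main\.).+(?=\.Value)"), "Main.{0}.Value");

    public static readonly Tuple<Regex, string> B =
        Tuple.Create(new Regex(@"(?<=M_).+(?=_Sp)"), "M_{0}_Sp");

    // Hypothetical helper: extracts the variable part with toFind
    // and inserts it into the format string of replaceWith.
    public static string Convert(string input, Tuple<Regex, string> toFind, Tuple<Regex, string> replaceWith)
    {
        Match match = toFind.Item1.Match(input);
        if (!match.Success)
        {
            // Assumed policy: fail loudly instead of formatting an empty value.
            throw new ArgumentException("Input does not match the source pattern: " + input, nameof(input));
        }
        return string.Format(replaceWith.Item2, match.Value);
    }
}
```

With these definitions, TupleReplaceSketch.Convert("Main.Rotating.Value", TupleReplaceSketch.A, TupleReplaceSketch.B) yields "M_Rotating_Sp", and swapping the two tuples converts "M_Skew_Sp" back to "Main.Skew.Value".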

Since you have already reduced your problem to converting a regex into a format string (implementing Regex2Format), I will focus my answer on just that part. Note that my answer is incomplete because it doesn't address the full breadth of parsing regex capturing groups; however, it works for simple cases.

The first thing needed is a regex that matches regex capture groups. It uses a negative lookbehind so that escaped parentheses are not matched. There are other cases that break this regex, e.g. non-capturing groups, wildcard symbols, and anything between square brackets.

private static readonly Regex CaptureGroupMatcher = new Regex(@"(?<!\\)\([^\)]+\)");
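To make the limitation concrete, here is a small sketch that compares what this matcher detects with what the .NET engine itself reports through Regex.GetGroupNumbers. The class and method names (CaptureGroupMatcherLimits, CountDetected, CountActual) are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureGroupMatcherLimits
{
    // Same pattern as the CaptureGroupMatcher declared above.
    private static readonly Regex CaptureGroupMatcher = new Regex(@"(?<!\\)\([^\)]+\)");

    // Number of groups the simple matcher believes the pattern contains.
    public static int CountDetected(string pattern)
    {
        return CaptureGroupMatcher.Matches(pattern).Count;
    }

    // Number of real capture groups according to the regex engine.
    // Group 0 is the whole match, so it is subtracted.
    public static int CountActual(string pattern)
    {
        return new Regex(pattern).GetGroupNumbers().Length - 1;
    }
}
```

For the simple pattern @"Main\.(.+)\.Value" both counts are 1. For @"M_(?:a|b)_Sp" the matcher detects 1 group although the pattern has no capture group, and for the nested pattern @"a(b(c))d" it detects 1 group where the engine reports 2.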

The implementation of Regex2Format here basically writes everything outside of capture groups into the output string, and replaces each capture group with {x}.

static string Regex2Format(string pattern)
{
    var targetBuilder = new StringBuilder();
    int previousEndIndex = 0;
    int formatIndex = 0;
    foreach (Match match in CaptureGroupMatcher.Matches(pattern))
    {
        var group = match.Groups[0];
        int endIndex = group.Index;
        AppendPart(pattern, previousEndIndex, endIndex, targetBuilder);
        targetBuilder.Append('{');
        targetBuilder.Append(formatIndex++);
        targetBuilder.Append('}');
        previousEndIndex = group.Index + group.Length;
    }
    AppendPart(pattern, previousEndIndex, pattern.Length, targetBuilder);
    return targetBuilder.ToString();
}

This helper function writes the literal parts of the pattern into the output. It currently writes everything except backslash characters used to escape something.

static void AppendPart(string pattern, int previousEndIndex, int endIndex, StringBuilder targetBuilder)
{
    for (int i = previousEndIndex; i < endIndex; i++)
    {
        char c = pattern[i];
        if (c == '\\' && i < pattern.Length - 1 && pattern[i + 1] != '\\')
        {
            //backslash not followed by another backslash - it's an escape char
        }
        else
        {
            targetBuilder.Append(c);
        }
    }
}

Test cases:

static void Test()
{
    var cases = new Dictionary<string, string>
    {
        { @"Main\.(.+)\.Value", @"Main.{0}.Value" },
        { @"M_(.+)_Sp(.*)", "M_{0}_Sp{1}" },
        { @"M_\(.+\)_Sp", @"M_(.+)_Sp" },
    };

    foreach (var kvp in cases)
    {
        if (Regex2Format(kvp.Key) != kvp.Value)
        {
            Console.WriteLine("Test failed for {0} - expected {1} but got {2}", kvp.Key, kvp.Value, Regex2Format(kvp.Key));
        }
    }

}

To wrap up, here is the usage:

private static string AwesomeRegexReplace(string input, string sourcePattern, string targetPattern)
{
    var targetFormat = Regex2Format(targetPattern);
    return Regex.Replace(input, sourcePattern, match =>
    {
        var args = match.Groups.OfType<Group>().Skip(1).Select(g => g.Value).ToArray<object>();
        return string.Format(targetFormat, args);
    });
}
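The question's updates ask for a format such as "Main.$1.Value" that can be handed straight to Regex.Replace. As a sketch under the same simplifying assumptions as above (only plain, non-nested capture groups, and backslashes used only to escape literal characters), a variant could emit .NET substitution tokens instead of {0} placeholders. The names SubstitutionFormatSketch, Regex2Substitution, AppendLiteral and AwesomeReplace are invented for this example, and ${1} is used rather than $1 so that a digit following the group cannot be misread as part of the group number.

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class SubstitutionFormatSketch
{
    // Same simple capture group matcher as in the answer above.
    private static readonly Regex CaptureGroupMatcher = new Regex(@"(?<!\\)\([^\)]+\)");

    // Converts a simple pattern into a Regex.Replace replacement string,
    // e.g. Main\.(.+)\.Value becomes Main.${1}.Value
    public static string Regex2Substitution(string pattern)
    {
        var result = new StringBuilder();
        int previousEnd = 0;
        int groupNumber = 1; // substitution groups are numbered from 1
        foreach (Match match in CaptureGroupMatcher.Matches(pattern))
        {
            AppendLiteral(pattern, previousEnd, match.Index, result);
            result.Append("${").Append(groupNumber++).Append('}');
            previousEnd = match.Index + match.Length;
        }
        AppendLiteral(pattern, previousEnd, pattern.Length, result);
        return result.ToString();
    }

    // Copies literal text, dropping the escaping backslash and doubling '$'
    // so that it stays a literal dollar sign in the replacement string.
    private static void AppendLiteral(string pattern, int start, int end, StringBuilder target)
    {
        for (int i = start; i < end; i++)
        {
            char c = pattern[i];
            if (c == '\\' && i + 1 < end)
            {
                i++;            // skip the backslash, keep the escaped character
                c = pattern[i];
            }
            if (c == '$')
            {
                target.Append('$');
            }
            target.Append(c);
        }
    }

    public static string AwesomeReplace(string input, string sourcePattern, string targetPattern)
    {
        return Regex.Replace(input, sourcePattern, Regex2Substitution(targetPattern));
    }
}
```

Calling SubstitutionFormatSketch.AwesomeReplace("Main.Rotating.Value", @"Main\.(.+)\.Value", @"M_(.+)_Sp") gives "M_Rotating_Sp", and swapping the two patterns turns "M_Skew_Sp" into "Main.Skew.Value". Character classes such as \d in the target pattern are not handled; the sketch would emit a literal d.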

Something like this might work:

 var replaced_B = Regex.Replace(input_A, @"Main\.(.+)\.Value", @"M_$1_Sp");
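For completeness, a small sketch of both directions using hand-written replacement strings. This does need one replacement string per direction, which the question wanted to avoid, so it only illustrates the built-in substitution syntax. The class and method names are hypothetical, and ${1} is equivalent to $1 here.

```csharp
using System.Text.RegularExpressions;

public static class SubstitutionSketch
{
    // "Main.<name>.Value" to "M_<name>_Sp"
    public static string AToB(string input)
    {
        return Regex.Replace(input, @"Main\.(.+)\.Value", "M_${1}_Sp");
    }

    // "M_<name>_Sp" to "Main.<name>.Value"
    public static string BToA(string input)
    {
        return Regex.Replace(input, @"M_(.+)_Sp", "Main.${1}.Value");
    }
}
```

Input that does not match the pattern is returned unchanged, because Regex.Replace only rewrites the matched portions.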

Are you looking for something like this?

public static class StringExtenstions
{
    public static string RegExAwesomeReplace(this string inputString,string searchPattern,string replacePattern)
    {
        Match searchMatch = Regex.Match(inputString,searchPattern);
        Match replaceMatch = Regex.Match(inputString, replacePattern);

        if (!searchMatch.Success || !replaceMatch.Success)
        {
            return inputString;
        }

        return inputString.Replace(searchMatch.Value, replaceMatch.Value);
    }
}

The string extension method returns the input string with the text matched by the search pattern replaced by the text matched by the replace pattern. If either pattern does not match the input, the input is returned unchanged.

This is how you call it:

input_A.RegExAwesomeReplace(regex_A, regex_B);
