
Using Regex to replace part of the entire string/expression

Regex can be simple, yet complex at times. I'm stuck trying to replace part of an expression that contains variables. Assume a variable has the following pattern:

\w+(\.\w+)*

I want to replace every dot (.) inside each occurrence of a variable, because I eventually have to tokenize the expression and the tokenizer does not recognize variables that contain dots. So I thought I would replace the dots with underscores before parsing. After tokenizing, however, I want to get each variable token back with its original value.

Expression:

(x1.y2.z3 + 9.99) + y2_z1 - x1.y2.z3

Three Variables:

x1.y2.z3

y2_z1

x1.y2.z3

Desired Output:

(x1_y2_z3 + 9.99) + y2_z1 - x1_y2_z3
  • Question 1: How do I use Regex.Replace in this case?

  • Question 2: Is there a better way to address the problem above? A variable can already contain underscores, so replacing dots with underscores does not let me recover the original variable from the tokens.

This regex pattern seems to work: [a-zA-Z]+\d+\S+

To replace dots only inside a match, use a MatchEvaluator:

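    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        private static char charToReplaceWith = '_';

        static void Main(string[] args)
        {
            string s = "(x1.y2.z3 + 9.99) + y2_z1 - x1.y2.z3";
            // The evaluator runs once per match, so only dots inside a match are replaced.
            Console.WriteLine(Regex.Replace(s, @"[a-zA-Z]+\d+\S+", new MatchEvaluator(ReplaceDotWithCharInMatch)));
            Console.Read();
        }

        // Replaces every dot in the matched text with the chosen character.
        private static string ReplaceDotWithCharInMatch(Match m)
        {
            return m.Value.Replace('.', charToReplaceWith);
        }
    }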

Which gives this output: (x1_y2_z3 + 9.99) + y2_z1 - x1_y2_z3
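One caveat with that pattern: the trailing \S+ consumes everything up to the next whitespace, so an input without spaces such as x1.y2+9.99 would also have the dot in 9.99 replaced. Below is a minimal sketch of a tighter variant. It assumes that every segment of a variable starts with a letter or underscore, which the question does not state; the class and method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DottedNames
{
    // Assumed variable shape: identifier segments joined by dots, each segment
    // starting with a letter or underscore, so a number like 9.99 never matches.
    private static readonly Regex Dotted =
        new Regex(@"[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)+");

    // Replaces the dots inside each matched variable and leaves everything else alone.
    public static string ReplaceDots(string expression, char replacement = '_')
    {
        return Dotted.Replace(expression, m => m.Value.Replace('.', replacement));
    }

    public static void Main()
    {
        Console.WriteLine(ReplaceDots("(x1.y2.z3 + 9.99) + y2_z1 - x1.y2.z3"));
        // (x1_y2_z3 + 9.99) + y2_z1 - x1_y2_z3
        Console.WriteLine(ReplaceDots("x1.y2+9.99"));
        // x1_y2+9.99
    }
}
```

Because the pattern requires at least one dot, plain names such as y2_z1 are never handed to the evaluator.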

I don't fully understand your second question, or how to tokenize variables that already contain underscores, but you should be able to choose the replacement character. For example, if the string already contains an underscore (s.Contains('_') is true), pick a different character. You would probably have to maintain a dictionary that records each substitution, e.g. "all dots were replaced with underscores, all underscores with ^", and so on.
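The dictionary idea can be taken one step further: instead of swapping characters, swap each dotted variable for a generated alias and remember which alias stands for which original. The sketch below is only one possible way to do this. It assumes the tokenizer accepts identifiers made of letters, digits and underscores, and that variable segments start with a letter or underscore; the names VariableAliases, Encode, Decode and the v__N alias scheme are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class VariableAliases
{
    // Assumed variable shape: dot-separated segments, each starting with a letter or underscore.
    private static readonly Regex Dotted =
        new Regex(@"[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)+");

    // Rewrites every dotted variable to a generated alias and records alias -> original.
    public static string Encode(string expression, IDictionary<string, string> aliasToOriginal)
    {
        var originalToAlias = new Dictionary<string, string>();
        int next = 0; // shared counter, so every alias is handed out only once
        return Dotted.Replace(expression, m =>
        {
            if (!originalToAlias.TryGetValue(m.Value, out string alias))
            {
                // Skip any candidate alias that already occurs in the input text.
                do { alias = "v__" + next++; } while (expression.Contains(alias));
                originalToAlias[m.Value] = alias;
                aliasToOriginal[alias] = m.Value;
            }
            return alias;
        });
    }

    // Maps a token back to the original variable; other tokens pass through unchanged.
    public static string Decode(string token, IDictionary<string, string> aliasToOriginal)
    {
        return aliasToOriginal.TryGetValue(token, out string original) ? original : token;
    }

    public static void Main()
    {
        var map = new Dictionary<string, string>();
        string encoded = Encode("(x1.y2.z3 + 9.99) + y2_z1 - x1.y2.z3", map);
        Console.WriteLine(encoded);              // (v__0 + 9.99) + y2_z1 - v__0
        Console.WriteLine(Decode("v__0", map));  // x1.y2.z3
        Console.WriteLine(Decode("y2_z1", map)); // y2_z1
    }
}
```

Since the original name is looked up rather than reconstructed, it does not matter whether a variable already contains underscores.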

Try this:

        string input = "(x1.y2.z3 + 9.99) + y2_z1 - x1.y2.z3";
        string output = Regex.Replace(input, "\\.(?=[a-z])", "_");

This replaces only periods that are followed by a lowercase letter (a-z), using the positive lookahead (?=[a-z]), so the dot in 9.99 is left alone.

Use a regex negative lookahead, which is a group that starts with (?!

A dot followed by something non-numeric would be as simple as this:

    // matches any dot NOT followed by a character in the range 0-9
    String output = Regex.Replace(input, "\\.(?![0-9])", "_");

This has the advantage that, although [0-9] is part of the expression, it is only tested against the character that follows the dot and is not consumed, so it never becomes part of the match.
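To make the difference concrete, here is a small comparison between the lookahead and a plain negated character class, which does consume the following character. The class and method names are invented for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Lookahead: the character after the dot is inspected but stays in the text.
    public static string WithLookahead(string input)
    {
        return Regex.Replace(input, @"\.(?![0-9])", "_");
    }

    // Negated class: the character after the dot is part of the match and is replaced too.
    public static string WithConsumingClass(string input)
    {
        return Regex.Replace(input, @"\.[^0-9]", "_");
    }

    public static void Main()
    {
        string input = "(x1.y2.z3 + 9.99) + y2_z1 - x1.y2.z3";
        Console.WriteLine(WithLookahead(input));
        // (x1_y2_z3 + 9.99) + y2_z1 - x1_y2_z3
        Console.WriteLine(WithConsumingClass(input));
        // (x1_2_3 + 9.99) + y2_z1 - x1_2_3
    }
}
```

The second call loses the letters y and z because they were matched along with the dot.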
