
Regex replace all occurrences with something that is "derived" from the part to be replaced

I have the following line from an RTF document

10 \u8314?\u8805? 0

(which reads 10 ⁺≥ 0 in plain text). You can see that the special characters are escaped with \u followed by the decimal Unicode value and by a question mark (the replacement character that should be printed if the special character cannot be displayed). I want to have the text in a C# string variable that is equivalent to the following variable:

string expected = "10 \u207A\u2265 0";

In the debugger I want to see that the variable has the value 10 ⁺≥ 0. I therefore have to replace every occurrence with the corresponding hexadecimal Unicode escape (0x207A = 8314 and 0x2265 = 8805). What is the simplest way to accomplish this with regular expressions?
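The equivalence the question relies on can be checked directly: the decimal number in the RTF escape and the four hex digits in a C# \uXXXX literal denote the same UTF-16 code unit. A minimal sketch, written as C# 9 top-level statements and using only the two values from the question:

```csharp
using System;

// Decimal code units taken from the RTF escapes \u8314? and \u8805?
int plus = 8314, gte = 8805;

// "X4" formats the number as (at least) four uppercase hex digits
Console.WriteLine(plus.ToString("X4")); // 207A
Console.WriteLine(gte.ToString("X4"));  // 2265

// Casting the number to char yields the same character as the \u literal,
// so the final text can be built without producing any hex string
Console.WriteLine((char)plus == '\u207A'); // True
Console.WriteLine((char)gte == '\u2265');  // True
```

Because the cast already yields the right char, formatting as hex is only needed if you want escape text rather than the characters themselves.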

The code is:

using System.Text.RegularExpressions;

string str = @"10 \u8314?\u8805? 0";
string replaced = Regex.Replace(str, @"\\u([0-9]+)\?", match => {
    string value = match.Groups[1].Value;
    string hex = @"\u" + int.Parse(value).ToString("X4");
    return hex;
});

This will return

string line = @"10 \u207A\u2265 0";

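so the \u207A and \u2265 sequences remain literal text (a backslash followed by u and four hex digits); they are not unescaped into the actual characters.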

Note that the value is first converted to a number (int.Parse(value)) and then formatted as a fixed-width, 4-digit hexadecimal number (ToString("X4")).
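If you do go through the hex form, one possible way to finish the job (not part of the original answer) is Regex.Unescape, which understands \uXXXX escapes. A sketch as C# 9 top-level statements, assuming the text contains no other backslashes, since Regex.Unescape processes every escape sequence it finds:

```csharp
using System;
using System.Text.RegularExpressions;

string str = @"10 \u8314?\u8805? 0";

// Step 1 (as in the answer above): rewrite each decimal \uN? escape as \uXXXX text
string hexEscaped = Regex.Replace(str, @"\\u([0-9]+)\?",
    m => @"\u" + int.Parse(m.Groups[1].Value).ToString("X4"));

// Step 2: Regex.Unescape turns the \uXXXX sequences into the actual characters.
// Caveat: it also processes any other backslash sequence in the text,
// so this is only safe when the rest of the string has no backslashes.
string unescaped = Regex.Unescape(hexEscaped);

Console.WriteLine(hexEscaped);                       // 10 \u207A\u2265 0
Console.WriteLine(unescaped == "10 \u207A\u2265 0"); // True
```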

Or

string replaced = Regex.Replace(str, @"\\u([0-9]+)\?", match => {
    string value = match.Groups[1].Value;
    char ch = (char)int.Parse(value);
    return ch.ToString();
});

This will return

string line = @"10 ⁺≥ 0";
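To confirm that this second approach yields exactly the expected string from the question (one char per escape, no backslashes left), a quick check written as C# 9 top-level statements:

```csharp
using System;
using System.Text.RegularExpressions;

string str = @"10 \u8314?\u8805? 0";
string expected = "10 \u207A\u2265 0"; // the target value from the question

// Cast each decimal code straight to a char
string replaced = Regex.Replace(str, @"\\u([0-9]+)\?",
    m => ((char)int.Parse(m.Groups[1].Value)).ToString());

Console.WriteLine(replaced == expected); // True
Console.WriteLine(replaced.Length);      // 7: each escape collapsed to one char
```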

You have to use a MatchEvaluator:

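string input = @"10 \u8314?\u8805? 0";  // verbatim string, so \u8314 stays literal text
Regex reg = new Regex(@"\\u([0-9]+)\?"); // RTF writes the code as a decimal number
string result = reg.Replace(input, delegate(Match m) {
    // Put your own conversion here; this one turns the decimal code into its character
    return ((char)int.Parse(m.Groups[1].Value)).ToString();
});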
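The input has to be a verbatim string (@"...") for the pattern to match: in a regular C# literal the compiler itself converts \u8314 into the single character U+8314 (a CJK ideograph, not ⁺), so the regex never sees a backslash. A small demonstration as C# 9 top-level statements:

```csharp
using System;

// Regular literal: the compiler turns \u8314 and \u8805 into one char each
string regular = "10 \u8314?\u8805? 0";

// Verbatim literal: backslash, 'u' and digits stay as typed, like in the RTF file
string verbatim = @"10 \u8314?\u8805? 0";

Console.WriteLine(regular.Length);          // 9
Console.WriteLine(verbatim.Length);         // 19
Console.WriteLine(regular.Contains('\\'));  // False
Console.WriteLine(verbatim.Contains('\\')); // True
```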

If I understood your question correctly, you want to parse the Unicode escapes of the RTF into a C# string.

So the one-liner solution looks like this:

string result = Regex.Replace(line, @"\\u(\d+?)\?", new MatchEvaluator(m => ((char)Convert.ToInt32(m.Groups[1].Value)).ToString()));

But I suggest using cleaner code:

using System;
using System.Text.RegularExpressions;

public static class Program
{
    // Converts one "\uN?" match into the character with code N
    private static string ReplaceRtfUnicodeChar(Match match) {
        int number = Convert.ToInt32(match.Groups[1].Value);
        char chr = (char)number;
        return chr.ToString();
    }

    public static void Main(string[] args)
    {
        string line = @"10 \u8314?\u8805? 0";

        var r = new Regex(@"\\u(\d+?)\?");
        string result = r.Replace(line, new MatchEvaluator(ReplaceRtfUnicodeChar));

        Console.WriteLine(result); // Displays 10 ⁺≥ 0
    }
}
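One caveat applies to every pattern on this page: in RTF the N in \uN is a signed 16-bit number, so code units above 32767 are written as negative values (for example \u-3913?), which [0-9]+ and \d+? do not match. A minimal sketch as C# 9 top-level statements that also accepts a leading minus sign, assuming as in the question that each escape is followed by exactly one ? fallback character (a \ucN control word that changes the fallback count is not handled). DecodeRtfUnicode is a hypothetical helper name:

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper: decodes RTF \uN? escapes, where N is a signed 16-bit value
static string DecodeRtfUnicode(string rtf) =>
    Regex.Replace(rtf, @"\\u(-?[0-9]+)\?", m =>
    {
        int n = int.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture);
        // Negative values stand for code units above 32767: add 65536 to recover them
        if (n < 0) n += 65536;
        return ((char)n).ToString();
    });

// Each escape becomes one UTF-16 code unit, so a character outside the BMP,
// written as two escapes (a surrogate pair), comes out as a valid pair
string emoji = DecodeRtfUnicode(@"\u-10179?\u-8704?");

Console.WriteLine(DecodeRtfUnicode(@"10 \u8314?\u8805? 0") == "10 \u207A\u2265 0"); // True
Console.WriteLine(char.ConvertToUtf32(emoji, 0).ToString("X"));                     // 1F600
```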
