I have the following line from an RTF document:
10 \u8314?\u8805? 0
(which in clear text reads 10 ⁺≥ 0). You can see that the special characters are escaped with \u followed by the decimal Unicode value and a question mark (the replacement character, which is printed when the special character cannot be displayed).
I want to have the text in a string variable in C# which is equivalent to the following variable:
string expected = "10 \u207A\u2265 0";
In the debugger, I want the variable to show the value 10 ⁺≥ 0. I therefore need to replace every occurrence with the corresponding hexadecimal Unicode escape (0x207A = 8314 and 0x2265 = 8805). What is the simplest way to accomplish this with regular expressions?
The code is:
string str = @"10 \u8314?\u8805? 0";
string replaced = Regex.Replace(str, @"\\u([0-9]+)\?", match => {
    string value = match.Groups[1].Value;
    string hex = @"\u" + int.Parse(value).ToString("X4");
    return hex;
});
This will return the equivalent of
string line = @"10 \u207A\u2265 0";
so the \u207A and \u2265 sequences stay as literal text and won't be unescaped into characters.
Note that the value is first converted to a number (int.Parse(value)) and then formatted as a four-digit hexadecimal number (ToString("X4")).
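To see why this first variant does not yet equal the expected string, it can help to compare a verbatim literal with a regular one. The following is only a minimal sketch; the names `EscapeDemo`, `LiteralLength`, `EscapedLength` and `ToHex4` are made up for illustration and are not part of the answer.

```csharp
using System;

public static class EscapeDemo
{
    // A verbatim literal keeps the backslash: six characters \ u 2 0 7 A.
    public static int LiteralLength() => @"\u207A".Length;

    // A regular literal is unescaped by the compiler: one character, U+207A.
    public static int EscapedLength() => "\u207A".Length;

    // Formats a decimal value as four uppercase hex digits, as ToString("X4") does above.
    public static string ToHex4(int codeUnit) => codeUnit.ToString("X4");
}
```

The regex replacement runs at run time, so it can only produce the six-character text form; turning that text into a single character needs a conversion such as a cast to char.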
Or, to get the actual characters:
string replaced = Regex.Replace(str, @"\\u([0-9]+)\?", match => {
    string value = match.Groups[1].Value;
    char ch = (char)int.Parse(value);
    return ch.ToString();
});
This will return
string line = @"10 ⁺≥ 0";
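One caveat with the cast: in RTF the number after \u is a signed 16-bit value, so code units above 32767 are written as negative numbers (for example \u-3913? for U+F0B7), and a pattern that only accepts digits would skip them. Below is a minimal sketch that also accepts a minus sign. `RtfUnicode` and `Decode` are hypothetical names, and the sketch still assumes the fallback is always a single `?`, which real documents do not guarantee.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class RtfUnicode
{
    // Matches \uN? where N is a decimal number with an optional minus sign.
    private static readonly Regex Escape = new Regex(@"\\u(-?[0-9]+)\?");

    public static string Decode(string rtfText)
    {
        return Escape.Replace(rtfText, match =>
        {
            int value = int.Parse(match.Groups[1].Value, CultureInfo.InvariantCulture);
            // Negative numbers stand for code units 32768..65535; map them back.
            if (value < 0) value += 65536;
            return ((char)value).ToString();
        });
    }
}
```

Called as RtfUnicode.Decode(@"10 \u8314?\u8805? 0"), this should give the same string as the expected variable in the question. Characters outside the Basic Multilingual Plane are normally written as two consecutive \u escapes (a surrogate pair), which this approach converts one code unit at a time.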
You have to use a MatchEvaluator:
string input = @"10 \u8314?\u8805? 0"; // verbatim string, so \u is not read as a C# escape
Regex reg = new Regex(@"\\u([0-9]+)\?"); // RTF writes the value in decimal
string result = reg.Replace(input, delegate(Match m) {
    return ConvertToWhatYouWant(m.Value);
});
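ConvertToWhatYouWant is left undefined in this answer. One possible body is sketched below, assuming it receives the whole matched text (for example \u8314?) rather than just the digits, and that the digits are decimal. The containing class name `RtfHelpers` is hypothetical.

```csharp
using System.Globalization;

public static class RtfHelpers
{
    // Expects the full match, e.g. the seven characters \u8314?, and returns the decoded character.
    public static string ConvertToWhatYouWant(string matched)
    {
        // Drop the leading \u (two characters) and the trailing ? (one character).
        string digits = matched.Substring(2, matched.Length - 3);
        int codeUnit = int.Parse(digits, CultureInfo.InvariantCulture);
        return ((char)codeUnit).ToString();
    }
}
```

Passing m.Groups[1].Value instead of m.Value would make the Substring step unnecessary, since the capture group already holds only the digits.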
If I understood your question correctly, you want to parse the Unicode representation in the RTF into a C# string.
The one-liner solution looks like this:
string result = Regex.Replace(line, @"\\u(\d+?)\?", new MatchEvaluator(m => ((char)Convert.ToInt32(m.Groups[1].Value)).ToString()));
But I suggest using cleaner code:
private static string ReplaceRtfUnicodeChar(Match match) {
    int number = Convert.ToInt32(match.Groups[1].Value);
    char chr = (char)number;
    return chr.ToString();
}
public static void Main(string[] args)
{
    string line = @"10 \u8314?\u8805? 0";
    var r = new Regex(@"\\u(\d+?)\?");
    string result = r.Replace(line, new MatchEvaluator(ReplaceRtfUnicodeChar));
    Console.WriteLine(result); // Displays 10 ⁺≥ 0
}