I'm parsing HTML using HTML Agility Pack, and from time to time I get weird-looking strings like "–". What is the simplest way to remove them? By the way, I'm using C#.
You should first look into why you are getting those characters at all. The most likely cause is an encoding mismatch: the page is being decoded with a different encoding than the one it was written in.
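To illustrate how such strings can arise, here is a minimal sketch that decodes UTF-8 bytes with a single-byte encoding and then reverses the mistake. The class and method names (MojibakeDemo, DecodeWrongly, Repair) are made up for this example, and ISO-8859-1 is only an assumption about the wrong encoding; in practice it is often Windows-1252, which behaves similarly but maps some bytes to different characters.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Assumption for this sketch: the reader wrongly used ISO-8859-1.
    private static readonly Encoding WrongEncoding = Encoding.GetEncoding("iso-8859-1");

    // Decodes UTF-8 bytes with the wrong single-byte encoding,
    // the way a misconfigured reader would.
    public static string DecodeWrongly(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return WrongEncoding.GetString(utf8Bytes);
    }

    // Reverses the mistake: recover the original bytes, then decode them as UTF-8.
    public static string Repair(string garbled)
    {
        byte[] originalBytes = WrongEncoding.GetBytes(garbled);
        return Encoding.UTF8.GetString(originalBytes);
    }

    public static void Main()
    {
        // U+2013 (en dash) is three bytes in UTF-8, so it turns into three characters.
        string garbled = DecodeWrongly("\u2013");
        Console.WriteLine(garbled.Length);              // 3
        Console.WriteLine(Repair(garbled) == "\u2013"); // True
    }
}
```

If this is the cause, the better fix is to read the document with the right encoding from the start. HtmlDocument.Load in HTML Agility Pack has overloads that accept an Encoding, so passing Encoding.UTF8 there may be all that is needed.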
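But if you do need to remove all the non-ASCII characters from a string, the regex [^ -~] does the trick. It matches every character outside the range from space to tilde, so it also removes control characters such as tabs and newlines: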
// requires: using System.Text.RegularExpressions;
var stripped = Regex.Replace("străipped of baâ€d charâ€acters", "[^ -~]", "");
Console.WriteLine(stripped); // outputs "stripped of bad characters"
See http://www.catonmat.net/blog/my-favorite-regex/ for an explanation of why that regex works.
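As a rough sketch of what the pattern keeps and drops, the helper below wraps the same regex in a reusable method. The names AsciiFilter and Strip are hypothetical and only serve this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AsciiFilter
{
    // Matches any character outside the printable ASCII range U+0020 (space) to U+007E (~).
    private static readonly Regex NonPrintableAscii = new Regex("[^ -~]");

    // Returns the input with every non-printable-ASCII character removed.
    public static string Strip(string input)
    {
        return NonPrintableAscii.Replace(input, "");
    }

    public static void Main()
    {
        // Legitimate accented letters and the en dash are dropped too.
        Console.WriteLine(Strip("caf\u00e9 \u2013 na\u00efve")); // caf  nave
        // Tabs and line breaks fall outside the range as well.
        Console.WriteLine(Strip("a\tb\r\nc"));                   // abc
    }
}
```

Note that this is lossy: correctly decoded text such as "café" loses its accented letter, which is another reason to fix the encoding first and strip only as a last resort.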