Remove unrecognized ASCII characters from a string

Question

I'm parsing HTML using HTML Agility Pack, and from time to time I get weird-looking strings like "â€“". What is the simplest way to remove them? By the way, I'm using C#.

Answer 1

You should probably first look into why you are getting those characters at all; most likely something is wrong with the encoding.
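
As a rough illustration of the encoding point: sequences such as "â€" typically appear when UTF-8 bytes are decoded as Windows-1252. The sketch below reproduces that mix-up and reverses it. `MojibakeDemo`, `Garble`, and `Repair` are hypothetical names, and the code assumes this particular mix-up is the cause, which may not hold for your pages.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Returns the Windows-1252 encoding (code page 1252).
    // Assumption: on .NET Core / .NET 5+ the code-page encodings must be
    // registered first; on .NET Framework this line may be unnecessary
    // (and CodePagesEncodingProvider needs a NuGet package there).
    private static Encoding Windows1252()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1252);
    }

    // Simulates the bug: UTF-8 bytes interpreted as Windows-1252 text.
    public static string Garble(string text) =>
        Windows1252().GetString(Encoding.UTF8.GetBytes(text));

    // Reverses it: re-encode as Windows-1252, then decode as UTF-8.
    public static string Repair(string garbled) =>
        Encoding.UTF8.GetString(Windows1252().GetBytes(garbled));
}
```

If `Repair` restores readable text, it is usually better to fix the problem at the source by loading the page with the correct encoding (for example, the `HtmlDocument.Load` overloads that take an `Encoding`) than to strip characters afterwards.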

But if you do need to remove all non-ASCII characters from a string, the regex [^ -~] does the trick. It matches every character outside the printable ASCII range (space through tilde), so it also strips control characters such as tabs and newlines:

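        // Requires: using System; using System.Text.RegularExpressions;
        // [^ -~] matches anything outside printable ASCII (space, 0x20, through tilde, 0x7E).
        var stripped = Regex.Replace("străipped of baâ€d charaâ€cters", "[^ -~]", "");
        Console.WriteLine(stripped); // outputs "stripped of bad characters"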

See http://www.catonmat.net/blog/my-favorite-regex/ for an explanation of why that regex works.
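
To make the range in that character class concrete, here is a minimal sketch that pairs the regex with an explicit range check that should behave the same way. `AsciiFilter`, `StripWithRegex`, and `StripWithRange` are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class AsciiFilter
{
    // Removes every character that is not in the range ' ' (0x20) to '~' (0x7E).
    public static string StripWithRegex(string input) =>
        Regex.Replace(input, "[^ -~]", "");

    // Same filter written as a plain range check on each UTF-16 code unit.
    public static string StripWithRange(string input) =>
        new string(input.Where(c => c >= ' ' && c <= '~').ToArray());
}
```

Both versions drop tabs, newlines, and accented letters alike, so this is a blunt tool: legitimate non-ASCII text is lost along with the garbage.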

Answer 1
9 Accepted 2012-11-22 09:29:03
