Remove all non-ASCII characters from string
I have a C# routine that imports data from a CSV file, matches it against a database, and then rewrites it to a file. The source file seems to have a few non-ASCII characters that are fouling up the processing routine.
I already have a static method that I run each input field through, but it only performs basic checks like removing commas and quotes. Does anybody know how I could add functionality that removes non-ASCII characters too?
Here is a simple solution:
public static bool IsASCII(this string value)
{
    // ASCII encoding replaces non-ascii with question marks, so we use UTF8 to see if multi-byte sequences are there
    return Encoding.UTF8.GetByteCount(value) == value.Length;
}
Source: http://snipplr.com/view/35806/
string sOut = Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(s));
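For what it's worth, here is a small sketch of what that round trip actually does. With its default settings, Encoding.ASCII substitutes a question mark for anything it cannot encode, so non-ASCII characters are replaced rather than removed. The class and method names below are made up for this example.

```csharp
using System;
using System.Text;

public static class AsciiRoundTripDemo
{
    // Hypothetical helper: round-trips a string through the ASCII encoding.
    public static string ToAsciiWithQuestionMarks(string s)
    {
        // Encoding.ASCII replaces characters it cannot encode with '?'.
        byte[] asciiBytes = Encoding.ASCII.GetBytes(s);
        return Encoding.ASCII.GetString(asciiBytes);
    }
}
```

Calling AsciiRoundTripDemo.ToAsciiWithQuestionMarks("caf\u00E9") gives "caf?", so this is only suitable if a placeholder in the output is acceptable.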
Do it all at once:
public string ReturnCleanASCII(string s)
{
    StringBuilder sb = new StringBuilder(s.Length);
    foreach (char c in s)
    {
        if ((int)c > 127) // you probably don't want 127 either
            continue;
        if ((int)c < 32) // I bet you don't want control characters
            continue;
        if (c == ',')
            continue;
        if (c == '"')
            continue;
        sb.Append(c);
    }
    return sb.ToString();
}
If you wanted to test a specific character, you could use:
if ((int)myChar <= 127)
Just getting the ASCII encoding of the string will not tell you that a specific character was non-ASCII to begin with (if you care about that). See MSDN.
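If you do care which characters were non-ASCII, one possible approach is to apply the per-character test above across the whole string and record the positions. This is only a sketch; NonAsciiFinder and FindNonAsciiIndexes are hypothetical names, and the indexes refer to UTF-16 code units, so a character outside the Basic Multilingual Plane is reported twice.

```csharp
using System;
using System.Collections.Generic;

public static class NonAsciiFinder
{
    // Hypothetical helper: returns the index of every UTF-16 code unit above 127.
    public static List<int> FindNonAsciiIndexes(string s)
    {
        var indexes = new List<int>();
        for (int i = 0; i < s.Length; i++)
        {
            if (s[i] > 127)
                indexes.Add(i);
        }
        return indexes;
    }
}
```

For "na\u00EFve" this returns a single index, 2, which you could log before cleaning the field.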
Here's an improvement upon the accepted answer:
string fallbackStr = "";
Encoding enc = Encoding.GetEncoding(Encoding.ASCII.CodePage,
    new EncoderReplacementFallback(fallbackStr),
    new DecoderReplacementFallback(fallbackStr));
string cleanStr = enc.GetString(enc.GetBytes(inputStr));
This method will replace unknown characters with the value of fallbackStr, or if fallbackStr is empty, leave them out entirely. (Note that enc can be defined outside the scope of a function.)
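To illustrate the note about defining the encoding outside a function, here is one way it might look with the encoding built once in a static field and reused for every field. AsciiCleaner, StripNonAscii and Clean are names invented for this sketch, and it assumes you want unknown characters dropped rather than replaced.

```csharp
using System;
using System.Text;

public static class AsciiCleaner
{
    // Built once and reused. An empty replacement string means
    // characters that cannot be encoded are dropped entirely.
    private static readonly Encoding StripNonAscii = Encoding.GetEncoding(
        Encoding.ASCII.CodePage,
        new EncoderReplacementFallback(string.Empty),
        new DecoderReplacementFallback(string.Empty));

    // Hypothetical helper: returns the input with non-ASCII characters removed.
    public static string Clean(string input)
    {
        return StripNonAscii.GetString(StripNonAscii.GetBytes(input));
    }
}
```

With this in place, AsciiCleaner.Clean("caf\u00E9") returns "caf". Passing "?" instead of string.Empty to both fallbacks would keep a visible placeholder.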
It sounds kind of strange that it's acceptable to just drop the non-ASCII characters.
Also, I always recommend the excellent FileHelpers library for parsing CSV files.
strText = Regex.Replace(strText, @"[^\u0020-\u007E]", string.Empty);
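A note on that pattern: the range \u0020-\u007E covers space through tilde, so besides non-ASCII characters it also strips tabs, line breaks, other control characters and DEL. If it runs on every field of a large CSV file, caching the Regex may help. A rough sketch follows; PrintableAsciiFilter and Strip are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PrintableAsciiFilter
{
    // Matches anything that is NOT printable ASCII (space through tilde).
    private static readonly Regex NonPrintableAscii =
        new Regex(@"[^\u0020-\u007E]", RegexOptions.Compiled);

    // Hypothetical helper: removes every character the pattern matches.
    public static string Strip(string text)
    {
        return NonPrintableAscii.Replace(text, string.Empty);
    }
}
```

For example, PrintableAsciiFilter.Strip("a\tb\u00E9c") returns "abc", since both the tab and the accented character are removed.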
public string RunCharacterCheckASCII(string s)
{
    // Nothing to check; hand back null or empty input unchanged.
    if (string.IsNullOrEmpty(s))
        return s;

    char[] chars = s.ToCharArray();
    bool found = false;
    for (int i = 0; i < chars.Length; i++)
    {
        if (chars[i] > 127) // outside 7-bit ASCII (this includes "extended ASCII")
        {
            found = true;
            chars[i] = '?';
        }
    }
    // Only build a new string if something was actually replaced.
    return found ? new string(chars) : s;
}