Regex to remove special/invisible characters

The problem is to remove some strange characters from a domain name while keeping special Unicode characters such as accented letters (German, Danish or Polish). For example, in radisson-blu.es you can't see it, but there is an additional character between the two s's (try copying it into Notepad to see it).

I've seen many posts about similar problems, but each solution either doesn't remove that special character, or removes it together with other special characters I need to keep.
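Instead of pasting into Notepad, you can make the hidden character visible by printing the code point of every character outside printable ASCII. This is a minimal C# sketch; the class and method names (HiddenCharReport, Describe) are hypothetical, and it works on UTF-16 code units, so a character outside the Basic Multilingual Plane shows up as two surrogate entries:

```csharp
using System.Globalization;
using System.Text;

public static class HiddenCharReport
{
    // Lists every UTF-16 code unit outside the printable ASCII range
    // 0x20..0x7E, with its index, code point and the Unicode category
    // that the current runtime assigns to it.
    public static string Describe(string input)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (c < 0x20 || c > 0x7E)
            {
                sb.Append($"index {i}: U+{(int)c:X4} {CharUnicodeInfo.GetUnicodeCategory(c)}\n");
            }
        }
        return sb.ToString();
    }
}
```

For example, HiddenCharReport.Describe("radis\u00adson-blu.es") returns a single entry starting with "index 5: U+00AD", which pins down the invisible character.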

Replace matches of the regex [^\w\s.,!@#$%^&*()=+~`-] with an empty string.
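A minimal sketch of that whitelist approach with .NET's Regex (DomainCleaner and RemoveUnlisted are hypothetical names). It keeps accented letters because \w in .NET is Unicode-aware, and it drops U+00AD because the soft hyphen is neither a word character nor whitespace:

```csharp
using System.Text.RegularExpressions;

public static class DomainCleaner
{
    // Matches every character that is NOT a word character, whitespace,
    // or one of the listed ASCII punctuation characters. In .NET, \w
    // covers letters, nonspacing marks, decimal digits and connector
    // punctuation, so letters such as ø, ł or ź are kept.
    private static readonly Regex NotAllowed =
        new Regex(@"[^\w\s.,!@#$%^&*()=+~`-]");

    // Removes every character the whitelist does not allow.
    public static string RemoveUnlisted(string input) =>
        NotAllowed.Replace(input, "");
}
```

Anything outside the whitelist is dropped as well, so any other character you need to keep has to be added to the class explicitly.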

The character you're (not) seeing there is U+00AD SOFT HYPHEN. You can reference it in a regular expression using \u00ad, e.g.:

str = Regex.Replace(str, @"\u00ad", "");

For a single-character replacement, though, you could also use string.Replace.
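Both options side by side, as a sketch (the SoftHyphen class and its method names are hypothetical). string.Replace needs the string overload here, because the char overload can only substitute one character for another, not delete it:

```csharp
using System.Text.RegularExpressions;

public static class SoftHyphen
{
    // Regex version: the pattern escape \u00AD matches U+00AD SOFT HYPHEN.
    public static string RemoveWithRegex(string input) =>
        Regex.Replace(input, @"\u00AD", "");

    // Plain string version: string.Replace(string, string) performs an
    // ordinal search, so no regex engine is needed for one known character.
    public static string RemoveWithReplace(string input) =>
        input.Replace("\u00AD", "");
}
```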

'\xAD' is a soft hyphen (the code point's name is "SOFT HYPHEN").

According to the Unicode Character Database, its general category is "Cf" ("Format"), so it can be matched with the regex @"\p{Cf}".

Strangely, Microsoft Visual C# 2010 Express reports that it doesn't match @"\p{Cf}" but does match @"\p{Pd}" ("Dash Punctuation"), the same category as the normal hyphen.
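Since the category reported for U+00AD appears to depend on the Unicode data the runtime ships with, one hedge is to match \p{Cf} and also list U+00AD explicitly. Using \p{Pd} instead is not an option, because it would also remove the ordinary hyphen in radisson-blu. A sketch under those assumptions (FormatChars and its members are hypothetical names):

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class FormatChars
{
    // \p{Cf} matches the Unicode "Format" category (e.g. U+2060 WORD
    // JOINER, U+FEFF). U+00AD is listed explicitly in case the runtime
    // classifies it as Pd instead of Cf.
    private static readonly Regex Invisible = new Regex(@"[\p{Cf}\u00AD]");

    // Removes all Format characters plus the soft hyphen.
    public static string Strip(string input) => Invisible.Replace(input, "");

    // Shows which category the current runtime assigns to U+00AD.
    public static UnicodeCategory SoftHyphenCategory() =>
        CharUnicodeInfo.GetUnicodeCategory('\u00AD');
}
```

Printing FormatChars.SoftHyphenCategory() tells you whether your runtime agrees with the Unicode database (Format) or with the behavior observed above (DashPunctuation).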

This works for me:

[\x00-\x1f]|[\x81\x8d\x8f\x90\x9d\xa0\u2060\uFEFF]
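The same pattern wrapped in a small C# helper, as a sketch (ControlChars is a hypothetical name). Note that \xa0 in this list is U+00A0 NO-BREAK SPACE, not U+00AD, so the soft hyphen from the question is not matched by it; you would have to add \xad to the second class for that:

```csharp
using System.Text.RegularExpressions;

public static class ControlChars
{
    // The pattern above (with \x8d listed once): C0 controls
    // U+0000..U+001F, a few C1 controls, U+00A0 NO-BREAK SPACE,
    // U+2060 WORD JOINER and U+FEFF ZERO WIDTH NO-BREAK SPACE (BOM).
    private static readonly Regex Unwanted =
        new Regex(@"[\x00-\x1f]|[\x81\x8d\x8f\x90\x9d\xa0\u2060\uFEFF]");

    // Removes every character matched by the pattern.
    public static string Strip(string input) => Unwanted.Replace(input, "");
}
```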
