
Remove all special characters with RegExp

I would like a RegExp that will remove all special characters from a string. I am trying something like this, but it doesn't work in IE7, though it works in Firefox:

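var specialChars = "!@#$^&%*()+=-[]\/{}|:<>?,.";

for (var i = 0; i < specialChars.length; i++) {
  // Note: IE7 does not support string indexing (specialChars[i] is undefined
  // there); specialChars.charAt(i) works in all browsers.
  stringToReplace = stringToReplace.replace(new RegExp("\\" + specialChars[i], "gi"), "");
}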

A detailed description of the RegExp would be helpful as well.

var desired = stringToReplace.replace(/[^\w\s]/gi, '')

As was mentioned in the comments, it's easier to do this as a whitelist: replace the characters which aren't in your safelist.

The caret (^) character negates the set [...], and gi means global and case-insensitive (the latter is a bit redundant, but I wanted to mention it). The safelist in this example is word characters, i.e. letters, digits and underscores (\w, which in JavaScript covers only ASCII [A-Za-z0-9_]), and whitespace (\s).
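A minimal sketch of the whitelist approach wrapped in a helper (the name stripSpecials is hypothetical). It shows that [^\w\s] keeps ASCII letters, digits, underscores and whitespace and drops everything else, including accented letters:

```javascript
// Hypothetical helper: keep only characters in the safelist, drop the rest.
// [^\w\s] matches any character that is NOT a word character ([A-Za-z0-9_])
// and NOT whitespace; the g flag replaces every match, not just the first.
function stripSpecials(input) {
  return input.replace(/[^\w\s]/g, "");
}

console.log(stripSpecials("Hello, World! 42_ok?")); // "Hello World 42_ok"
console.log(stripSpecials("café"));                 // "caf" (é is not in \w)
```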

Note that if you would rather remove a specific set of characters (a blacklist), including things like slashes and other special characters, you can do the following:

var outString = sourceString.replace(/[`~!@#$%^&*()_|+\-=?;:'",.<>\{\}\[\]\\\/]/gi, '');

Take special note that to include the minus character, you need to escape it with a backslash (\-), as in the regex above. If you don't, +-= is interpreted as a range from + to = (U+002B to U+003D), which also matches the digits 0-9, which is probably undesired.
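To see why the escape matters, here is a small check comparing an unescaped and an escaped hyphen inside a character class (the sample string is made up for illustration):

```javascript
// Unescaped: "+-=" inside [...] is a RANGE from "+" (U+002B) to "=" (U+003D),
// which also covers , - . / 0-9 : ; <
const unescaped = /[+-=]/g;
// Escaped: "\-" is a literal hyphen, so only "+", "-" and "=" match.
const escaped = /[+\-=]/g;

const sample = "a+b-c=d 2024";
console.log(sample.replace(unescaped, "")); // "abcd " (the digits are gone too)
console.log(sample.replace(escaped, ""));   // "abcd 2024"
```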

Plain JavaScript regex classes such as \w do not handle Unicode letters.

Do not use [^\w\s]: it will remove letters with accents (like àèéìòù), and letters from scripts such as Cyrillic or Chinese will be removed completely.

You really don't want to remove these letters along with the special characters. You have two options:

  • Add to your character class every special character you don't want removed,
    for example: [^èéòàùì\w\s].
  • Have a look at xregexp.com. XRegExp adds base support for Unicode matching via the \p{...} syntax.

var str = "Їжак::: résd,$%& adùf";

// XRegExp adds \pL (any Unicode letter): remove every run of characters
// that are neither letters nor spaces.
var search = XRegExp('[^\\pL ]+');
var res = XRegExp.replace(str, search, '', 'all');

console.log(res); // "Їжак résd adùf"
console.log(str.replace(/[^\w\s]/gi, '')); // " rsd adf"
console.log(str.replace(/[^\wèéòàùì\s]/gi, '')); // " résd adùf"
<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/3.1.1/xregexp-all.js"></script>
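As an alternative to XRegExp (not part of the original answer): engines that support ES2018 Unicode property escapes, such as current Node.js and evergreen browsers but not IE, can express the same idea natively with \p{L} and the u flag. A minimal sketch; the name keepLetters is hypothetical, and unlike the XRegExp version above it also keeps digits and combining marks:

```javascript
// Keep letters from any script (\p{L}), combining marks (\p{M}, so decomposed
// accents survive), digits (\p{N}) and whitespace; remove everything else.
// Requires the "u" flag and an ES2018+ engine.
function keepLetters(input) {
  return input.replace(/[^\p{L}\p{M}\p{N}\s]/gu, "");
}

console.log(keepLetters("Їжак::: résd,$%& adùf")); // "Їжак résd adùf"
console.log(keepLetters("你好, 世界!"));            // "你好 世界"
```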

The first solution does not work for non-Latin alphabets (it will cut text such as Їжак). I have managed to create a function which does not use RegExp and instead relies on the JavaScript engine's Unicode support. The idea is simple: if a character is the same in uppercase and lowercase, it is a special character. The only exception is made for whitespace.

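function removeSpecials(str) {
    // Case-map the whole string once in each direction.
    var lower = str.toLowerCase();
    var upper = str.toUpperCase();

    var res = "";
    // Assumes case mapping does not change the string's length
    // (not true for e.g. "ß", whose uppercase is "SS").
    for (var i = 0; i < lower.length; ++i) {
        // Keep characters with distinct cases (letters) and whitespace;
        // drop everything whose lowercase and uppercase forms are identical.
        if (lower[i] != upper[i] || lower[i].trim() === '')
            res += str[i];
    }
    return res;
}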

Update: please note that this solution works only for languages that have lowercase and uppercase letters. In languages like Chinese, which have no letter case, it won't work: those characters are removed as well.
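The index-based comparison above can misalign when case conversion changes a string's length (for example, "İ".toLowerCase() is two UTF-16 code units and "ß".toUpperCase() is "SS"). A sketch that applies the same case heuristic to each character separately, iterating by code point with for...of; the name removeSpecialsPerChar is hypothetical:

```javascript
// Same heuristic as removeSpecials: a character whose lowercase and uppercase
// forms are identical has no case, so it is treated as "special".
// Comparing each character on its own avoids index drift when case mapping
// changes length (e.g. "ß" -> "SS", "İ" -> "i̇").
function removeSpecialsPerChar(str) {
  let res = "";
  for (const ch of str) { // for...of walks code points, so emoji stay whole
    if (ch.toLowerCase() !== ch.toUpperCase() || ch.trim() === "") {
      res += ch;
    }
  }
  return res;
}

console.log(removeSpecialsPerChar("bənövşəyi 😟пурпурный İdÖĞ")); // "bənövşəyi пурпурный İdÖĞ"
console.log(removeSpecialsPerChar("Straße, 123!")); // "Straße " (digits have no case either)
```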

Update 2: I came up with the original solution while working on a fuzzy search. If you are also trying to remove special characters to implement search functionality, there is a better approach: use any transliteration library that produces a string of Latin characters only, and then a simple RegExp will do all the work of removing special characters. (This also works for Chinese, and as a side benefit you get Tromsø == Tromso.)
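For search keys, a rough approximation that needs no library (it is not a transliteration library and not part of the original answer) is Unicode normalization: decompose with String.prototype.normalize("NFD"), drop the combining marks, lowercase, then apply the simple RegExp. This handles accents such as é → e, but not letters like ø that have no decomposition, and not Chinese; those still need real transliteration. The name searchKey is hypothetical:

```javascript
// Build a loose search key: "Résumé!" and "resume" map to the same key.
function searchKey(input) {
  return input
    .normalize("NFD")                 // split é into e + U+0301
    .replace(/[\u0300-\u036f]/g, "")  // drop combining diacritical marks
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, "");     // the "simple RegExp" step
}

console.log(searchKey("Résumé!") === searchKey("resume")); // true
console.log(searchKey("Tromsø")); // "troms" (ø has no decomposition)
```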

Using \W or [a-z0-9] won't work for non-English languages such as Chinese.

It's better to list all the special characters in the regex and remove them from the given string:

str.replace(/[~`!@#$%^&*()+={}\[\];:\'\"<>.,\/\\\?\-_]/g, ''); // "-" is escaped; unescaped, "?-_" would be a range covering @, A-Z, [, \, ] and ^

I use RegexBuddy for debugging my regexes; it supports almost all languages, which is very useful. Then copy/paste for the target language. Terrific tool and not very expensive.

So I copy/pasted your regex, and your issue is that [ and ] are special characters in regex, so you need to escape them (here as \x5B and \x5D). Written as a single character class, with the hyphen escaped as well, the regex should be: /[!@#$^&%*()+=\-\x5B\x5D\/{}|:<>?,.]/g

I did something like this: str.replace(/\s|[0-9_]|\W|[#$%^&*()]/g, ""). But some people did it much more simply, like str.replace(/[\W_]/g, ""); (note the brackets: without them, /\W_/g only matches a non-word character immediately followed by an underscore; also, this shorter version keeps digits).
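A quick comparison of the two regexes on a made-up sample string, showing what each one keeps:

```javascript
const mixed = "Hello, World_ 42!";

// Long form: removes whitespace, digits, "_" and every non-word character.
const longForm = mixed.replace(/\s|[0-9_]|\W|[#$%^&*()]/g, "");
// Short form: [\W_] removes non-word characters and "_" but keeps digits.
const shortForm = mixed.replace(/[\W_]/g, "");

console.log(longForm);  // "HelloWorld"
console.log(shortForm); // "HelloWorld42"
// Without brackets, /\W_/g needs a non-word character directly before "_",
// which never occurs here, so nothing is removed.
console.log(mixed.replace(/\W_/g, "") === mixed); // true
```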

@Seagull's answer (https://stackoverflow.com/a/26482552/4556619) looks good, but you get the string "undefined" in the result when the input contains some special (Turkish) characters. See the example below:

let str="bənövşəyi 😟пурпурный İdÖĞ";

I slightly improved it and patched it with an undefined check:

function removeSpecials(str) {
    let lower = str.toLowerCase();
    let upper = str.toUpperCase();

    let res = "",i=0,n=lower.length,t;
    for(i; i<n; ++i) {
        if(lower[i] !== upper[i] || lower[i].trim() === ''){
            t=str[i];
            if(t!==undefined){
                res +=t;
            }
        }
    }
    return res;
}

text.replace(/[`~!@#$%^*()_|+\-=?;:'",.<>\{\}\[\]\\\/]/gi, '');

Why don't you do something like:

var re = /^[a-z0-9 ]*$/i;
var isValid = re.test(yourInput);

to check whether your input contains any special characters? (isValid is false if it does.)
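A small sketch of this validation approach (the name safeInput is hypothetical). The quantifier matters: anchored with ^ and $ but without * or +, the class only matches a one-character string:

```javascript
// Anchored at both ends with "*", so the WHOLE input must consist of
// ASCII letters, digits and spaces (the empty string also passes).
const safeInput = /^[a-z0-9 ]*$/i;

console.log(safeInput.test("Hello 123"));  // true
console.log(safeInput.test("Hello 123!")); // false
// Without a quantifier, the pattern only accepts a single character.
console.log(/^[a-z0-9 ]$/i.test("Hello")); // false
```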

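Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you need to republish, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.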

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM