
Regex for removing special characters on a multilingual string in JavaScript

I am working from this answer: Regex for removing special characters on a multilingual string:

/\P{Xan}+/u

But this appears to be PHP syntax, and I am not very good at regex. What would the JavaScript equivalent be?

When I use the regex from that answer, I get an "invalid regular expression" error saying there is an invalid escape.

search(event) {
    const length = (string) => {
        if (string.length > 1) {
            return true;
        }
        return false;
    };
    const trim = (string) => {
        if (string.trim() !== '') {
            return true;
        }
        return false;
    };
    const keyType = (string) => {
        const regex = /\P{Xan}+/u;
        if (!regex.exec(string)) {
            return true;
        }
        return false;
    };
    const text = this.searchListParams.searchText;
    if (length(text) && trim(text) && keyType(text)) {
        this.searchSubject.next(this.searchListParams);
    } else {
        this.mediaListParams.startRow = 0;
        this.listSubject.next(this.mediaListParams);
    }
}

The /\P{Xan}+/u pattern in PHP matches one or more characters that are not Unicode letters or numbers.
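A minimal sketch of why the original literal fails, assuming an ES2018-capable engine such as a current browser or Node.js: JavaScript only accepts standard Unicode property names, and Xan is a PCRE-specific shorthand, so with the u flag \P{Xan} is a syntax error. Because the question uses a regex literal, the error is reported as soon as the script is parsed. The compiles helper is hypothetical and exists only for this check.

```javascript
// Hypothetical helper: reports whether a pattern compiles with the given flags
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    // Unknown Unicode property names make RegExp throw a SyntaxError
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// PCRE's Xan ("alphanumeric") is not a JavaScript property name
console.log(compiles("\\P{Xan}+", "u"));        // false
// Xan is the union of letters (L) and numbers (N), so \P{Xan} becomes
// a negated class containing both categories
console.log(compiles("[^\\p{L}\\p{N}]+", "u")); // true
```

The backslashes are doubled only because the patterns are passed as strings to the RegExp constructor.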

If you need to support any browser or JS implementation, use XRegExp and the [^\pL\pN]+ pattern, which matches one or more characters other than Unicode letters (\pL) and numbers (\pN):

 var rx = XRegExp("[^\\\\pL\\\\pN]+", "g"); var s = "8੦৪----Łąka!!!!Вася, *** ,Café"; var res = XRegExp.replace(s, rx, ' ') console.log("'"+s+"'", "=>", "'"+res+"'"); 
 <script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/3.2.0/xregexp-all.min.js"></script> 

If you only need to support ECMAScript 2018-compliant implementations, you can use this native regex:

 const rx = /[^\\p{L}\\p{N}]+/gu; const s = "8੦৪----Łąka!!!!Вася, *** ,Café"; let res = s.replace(rx, " "); console.log(`'${s}' => '${res}'`) 

The u flag is required: it is what enables Unicode property escapes such as \p{L} and \p{N} in ES2018 regexes. Without it, \p is not treated as a property escape.
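As a sketch (not part of either answer), the question's keyType check could use the ES2018 class directly. It assumes the original intent, namely returning true only when the text contains nothing but Unicode letters and numbers; nonAlphanumeric is a hypothetical name. The last three lines show what changes without the u flag.

```javascript
// Matches one character that is neither a Unicode letter nor a number.
// No g flag: test() on a global regex is stateful (it advances lastIndex).
const nonAlphanumeric = /[^\p{L}\p{N}]/u;

// Hypothetical rewrite of keyType: true if the string is only letters/numbers
const keyType = (string) => !nonAlphanumeric.test(string);

console.log(keyType("Вася42"));   // true
console.log(keyType("東京"));     // true
console.log(keyType("Вася, 42")); // false: "," and " " are neither L nor N

// Without the u flag, \p is just an escaped "p", so /\p{L}/ matches the
// literal text "p{L}" rather than a letter
console.log(/\p{L}/.test("я"));    // false
console.log(/\p{L}/u.test("я"));   // true
console.log(/\p{L}/.test("p{L}")); // true
```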

I'm not familiar with PHP syntax, but in JavaScript the curly brackets {} are used as quantifiers. This is probably causing your error.

That said, the PHP regex does not mean the same thing in JavaScript as it does in PHP. Unfortunately, as far as I know, JavaScript has no predefined character class equivalent to the PHP regex you provided, so I can't offer a regular expression that solves your problem directly.


However, this answer suggests a creative solution that does not rely on a regular expression to detect letters: a character counts as a letter if its lower-case and upper-case forms differ. It only works for scripts that have capitalization (such as Latin, Greek, and Cyrillic), and the case test alone does not detect digits. Here is a basic implementation, modified from the linked answer to also keep ASCII digits with a \d test:

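function removeSpecials(str) {
    let res = "";

    // Compare the case forms of each character on its own: indexing into
    // whole-string conversions breaks when a conversion changes the length
    // (e.g. "ß".toUpperCase() is "SS"). for...of iterates by code point.
    for (const ch of str) {
        // keep letters (upper and lower case differ) and ASCII digits
        if (ch.toLowerCase() !== ch.toUpperCase() || /\d/.test(ch)) {
            res += ch;
        }
    }
    return res;
}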
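To make the limitation above concrete, here is a sketch comparing the capitalization test with the ES2018 regex on a mixed string. It repeats the per-character removeSpecials so it runs on its own; removeSpecialsRegex is a hypothetical name. Characters from scripts without case, such as Chinese, are dropped by the capitalization test but kept by \p{L}.

```javascript
// Capitalization test: keeps cased letters and ASCII digits only
function removeSpecials(str) {
  let res = "";
  for (const ch of str) {
    if (ch.toLowerCase() !== ch.toUpperCase() || /\d/.test(ch)) {
      res += ch;
    }
  }
  return res;
}

// ES2018 alternative: delete every run of non-letter, non-number characters
const removeSpecialsRegex = (str) => str.replace(/[^\p{L}\p{N}]+/gu, "");

const sample = "Вася!!東京 42";
console.log(removeSpecials(sample));      // "Вася42" (東京 has no case)
console.log(removeSpecialsRegex(sample)); // "Вася東京42"
```

Note also that \d in JavaScript only matches the ASCII digits 0-9, so non-ASCII digits such as ੦ are dropped by removeSpecials, while \p{N} keeps them.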
