
Replace all non-word characters like ?*+#

I need some help replacing all non-word characters in a string.

For example, (stadtbezirkspräsident' should become stadtbezirkspräsident.

The regex should work for all languages, which makes it tricky, because I have no idea how to match characters like ñ or œ. I tried solving it with

// The "-" is escaped: unescaped, ">-_" would be a range that also matches A-Z.
string = string.replace(/[&\/\\#,+()$~%.'":*?<>\-_{}]/g, ' ');

but there are still too many special characters, like Ø, left over.

Is there a general character class for this, or has anybody solved this problem before?

Try this trick:

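// Remove non-\w characters in U+0000..U+00BF (ASCII and Latin-1 symbols); À (U+00C0) and later are kept.
str = str.replace(/(?!\w)[\x00-\xBF]/g, '');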
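A quick way to see what this trick does and does not catch. The sketch below wraps it in a hypothetical helper, stripLatin1Symbols, and uses the range [\x00-\xBF], which ends just before À (U+00C0), so that accented capital letter is not stripped. The sample strings are made up for illustration. Because \w in a JavaScript regex without the u flag only covers [A-Za-z0-9_], the lookahead protects ASCII letters, digits and the underscore; everything above U+00BF, symbols included, is simply left alone.

```javascript
// Hypothetical helper: remove characters in U+0000..U+00BF that are not \w.
// \w is ASCII-only here, so non-ASCII letters survive only because they
// lie outside the range, and so do non-Latin-1 symbols such as the euro sign.
function stripLatin1Symbols(str) {
  return str.replace(/(?!\w)[\x00-\xBF]/g, '');
}

console.log(stripLatin1Symbols("(stadtbezirkspräsident'"));
// stadtbezirkspräsident
console.log(stripLatin1Symbols('Øre + 5€'));
// Øre5€  (spaces and "+" removed, "€" kept because it is outside the range)
```

This is cheap and needs no library, but it only handles punctuation in the ASCII/Latin-1 block; symbols from other blocks pass through unchanged.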

If you have to define all the Unicode ranges yourself, it's going to be a lot of work.

It might make more sense to use Steven Levithan's XRegExp package with its Unicode add-ons and use its Unicode property shortcuts:

// \P{L} matches any character that is not a Unicode letter
var regex = new XRegExp("\\P{L}+", "g");
string = XRegExp.replace(string, regex, "");
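The XRegExp add-on was needed because, at the time, JavaScript regular expressions had no Unicode property escapes. Engines that implement ES2018 (current browsers and recent Node.js versions) accept the same \P{L} escape natively when the u flag is set. A sketch of the equivalent without any library, assuming such an engine; the name lettersOnly is only illustrative:

```javascript
// \P{L} = any code point that is NOT in the Unicode "Letter" category.
// The u flag is required: without it, \P is not a property escape.
const nonLetters = /\P{L}+/gu;

function lettersOnly(str) {
  // replace() resets lastIndex for global regexes, so reusing one is safe.
  return str.replace(nonLetters, '');
}

console.log(lettersOnly('Bundespräsident / ß+ð/ə¿α!'));
// Bundespräsidentßðəα
console.log(lettersOnly("(stadtbezirkspräsident'"));
// stadtbezirkspräsident
```

Like the XRegExp version, this removes digits and spaces as well, since they are not letters.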

This is more of a comment on Tim Pietzcker's answer, but presenting code in comments is awkward. Here's a simple example of using the XRegExp package:

<p id=orig>Bundespräsident / ß+ð/ə¿α!</p>
<p id=new></p>
<script src="http://cdnjs.cloudflare.com/ajax/libs/xregexp/2.0.0/xregexp-min.js">
</script>
<script src="http://xregexp.com/addons/unicode/unicode-base.js">
</script>
<script>
var regex = new XRegExp("\\P{L}+", "g");
var string = document.getElementById('orig').innerHTML;
string = XRegExp.replace(string, regex, "");
document.getElementById('new').innerHTML = string;
</script>

For production use, you would probably want to download specific versions of the base package and the Unicode plug-in and serve them from your own server.

Note: The code checks for characters that are not classified as letters (alphabetic) in Unicode. I suppose this corresponds to what you mean by “word character”, though words in a natural language may contain hyphens, apostrophes, and other non-letters.
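If hyphens and apostrophes inside words should be kept, it may be simpler to match the words instead of deleting everything else. A sketch, assuming an engine with native Unicode property escapes (ES2018, u flag) rather than XRegExp, and assuming that a word is a run of letters (\p{L}) and combining marks (\p{M}) optionally joined by a single hyphen or apostrophe (straight ' or typographic ’). Both the helper name extractWords and this definition of a word are illustrative, not something either package decides for you:

```javascript
// Assumed definition of a "word": letters/combining marks, optionally
// joined by a single ' ’ or - between two such runs.
const wordPattern = /[\p{L}\p{M}]+(?:['’-][\p{L}\p{M}]+)*/gu;

function extractWords(str) {
  // match() returns null when nothing matches, so fall back to [].
  return str.match(wordPattern) || [];
}

console.log(extractWords("(stadtbezirkspräsident' l'œuvre, Jean-Luc!").join(' | '));
// stadtbezirkspräsident | l'œuvre | Jean-Luc
```

Combining marks are included so that decomposed text, such as e followed by U+0301, keeps its accent; a plain \P{L} filter would strip the U+0301.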

Beware that new characters are added to Unicode over time, and a character's category might (rarely) change. The package has been well maintained, though; it corresponds to Unicode 6.1 (version 6.2 is out, but it adds no new letters).
