Remove everything not matched by regex

I am trying to create a visualisation that removes all non-punctuation characters from a string but still keeps track of the chapter markers in it.

So far I have managed to write the following regex, which matches what I want to preserve:

(\\CAPÍTULO ([0-9]+))|([\\\\?.,:;!¡¿。、·*\\(\\)\\[\\]\\-–\\_«»\\'\\'\\/@#$&\\%\\^=+\\|<>\\"])

How can I remove the rest of the text from the string?

You can use the replace function. For example:

var str = "whatever, string, you like!";
// \w matches ASCII letters, digits and underscore; each one is replaced with a space
var newStr = str.replace(/\w/g, ' ');
console.log(newStr);

Try this:

var r = /(CAPÍTULO [0-9]+|[\\?.,:;!¡¿。、·*\(\)\[\]\-–\_«»\'\'\/@#$&\%\^=+\|<>\"])|(.)/g
var s = "ABC!@#123^&*XYZ;";
var p = s.replace(r, "$1");

// Result: "!@#^&*;"

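The first group matches everything you want to keep (chapter markers and punctuation); any other character falls through to the second group. Since a group reference such as $1 expands to an empty string when that group did not take part in the match, replacing with "$1" clears everything except the group 1 matches. Note that . does not match line breaks, so those are left in place.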
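To see the same idea with a chapter marker in the input, here is a minimal sketch. The pattern is the one from this answer; the helper name keepPunctuationAndChapters and the sample text are only illustrative assumptions.

```javascript
// Pattern from the answer: group 1 = things to keep, group 2 = any other character.
const KEEP_OR_DROP = /(CAPÍTULO [0-9]+|[\\?.,:;!¡¿。、·*\(\)\[\]\-–\_«»\'\'\/@#$&\%\^=+\|<>\"])|(.)/g;

// Hypothetical helper: returns only chapter markers and punctuation.
function keepPunctuationAndChapters(text) {
  // When group 2 matched, group 1 is unset, so "$1" expands to "".
  return text.replace(KEEP_OR_DROP, "$1");
}

console.log(keepPunctuationAndChapters("CAPÍTULO 1 Hola, mundo!"));
// prints "CAPÍTULO 1,!"
```

Spaces are dropped too, because they are not in the punctuation class and therefore fall into the second group.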

If you need to preserve the placement of the other characters you could do this:

var r = /(CAPÍTULO [0-9]+|[\\?.,:;!¡¿。、·*\(\)\[\]\-–\_«»\'\'\/@#$&\%\^=+\|<>\"])|(.)/g
var s = "ABC!@#123^&*XYZ;";
s.replace(r, "$1,").split(",");

// Result: ["", "", "", "!", "@", "#", "", "", "", "^", "&", "*", "", "", "", ";", ""]

You just need to handle the extra "" at the end. ;) This works because, instead of replacing non-matches with nothing ("$1" becomes ""), every match is followed by a comma, which keeps track of the positions. Any delimiter works as long as it cannot appear in the kept text. The comma is itself in the punctuation class, so for real input a character outside the class is safer, such as s.replace(r, "$1~").split("~");.
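If you would rather not pick a delimiter at all, one possible variant is to collect the matches directly. This is a sketch, assuming a runtime that supports String.prototype.matchAll (ES2020); the function name keptByPosition is made up for illustration.

```javascript
// Same pattern as above: group 1 = keep, group 2 = any other character.
const r = /(CAPÍTULO [0-9]+|[\\?.,:;!¡¿。、·*\(\)\[\]\-–\_«»\'\'\/@#$&\%\^=+\|<>\"])|(.)/g;

// Hypothetical helper: one array slot per match, "" where a character was dropped.
function keptByPosition(text) {
  // m[1] is undefined when group 1 did not participate in the match.
  return Array.from(text.matchAll(r), (m) => m[1] ?? "");
}

console.log(keptByPosition("A,B;"));
// prints [ '', ',', '', ';' ]
```

There is no trailing "" to clean up, and a comma in the input is no longer a problem. A chapter marker occupies a single slot, and line breaks get no slot because . does not match them.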

removes non punctuation characters, but still keeps track of chapters in the string.

The classic way to do this is to match either the thing you do want to keep (CAPITULO [0-9]+) or (|) the things you don't want to keep, in this case non-punctuation characters (\w), and to capture only the latter. Then replace the captured characters with an empty string and leave every other match unchanged.

const regexp = /CAPITULO [0-9]+|(\w)/g;
const input = "CAPITULO 22 .#( nonpunctuation characters $%&*'";
// match1 is set only when the \w alternative matched, i.e. for a character to drop
const processed = input.replace(regexp, (match, match1) => match1 ? '' : match);
console.log(processed);

If you really want to list out all the punctuation characters to preserve, then replace \w in the above with

[^\\?.,:;!¡¿。、·*()[\]\-–_«»'\/@#$&\%\^=+\|<>\"]
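Put together, the substitution could look like the sketch below. The regex is assembled from the two pieces in this answer; the function name stripNonPunctuation is a hypothetical label.

```javascript
// Chapter marker, OR (captured) any character that is NOT in the punctuation list.
const regexp = /CAPITULO [0-9]+|([^\\?.,:;!¡¿。、·*()[\]\-–_«»'\/@#$&\%\^=+\|<>\"])/g;

// Hypothetical helper: drops captured characters, keeps everything else as matched.
function stripNonPunctuation(input) {
  return input.replace(regexp, (match, match1) => (match1 ? "" : match));
}

console.log(stripNonPunctuation("CAPITULO 22 .#( nonpunctuation characters $%&*'"));
// prints "CAPITULO 22.#($%&*'"
```

Unlike the \w version, the negated class also matches spaces and line breaks, so they are removed as well.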

If you want to preserve the placement of the remaining characters, then change the replacement function to

(match, match1) => match1 ? ' ' : match
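As a quick check that placement really is preserved, here is a small sketch using the \w version of the pattern. The name blankOutWordChars and the sample input are assumptions made for the example.

```javascript
const regexp = /CAPITULO [0-9]+|(\w)/g;

// Hypothetical helper: each dropped character becomes one space,
// so every kept character stays at its original index.
function blankOutWordChars(input) {
  return input.replace(regexp, (match, match1) => (match1 ? " " : match));
}

const input = "CAPITULO 22 abc, def!";
const output = blankOutWordChars(input);
console.log(output.length === input.length); // prints true
console.log(output.indexOf(",") === input.indexOf(",")); // prints true
```

Two things to keep in mind with \w: it also matches the underscore, which the question lists as punctuation to keep, and without further flags it only covers ASCII letters and digits, so accented letters such as é are left in place.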
