I am trying to create a visualisation that removes non-punctuation characters but still keeps track of the chapters in the string.
So far I have managed to write the following regex to match what I want to preserve:
(CAPÍTULO ([0-9]+))|([\?.,:;!¡¿。、·*\(\)\[\]\-–\_«»\'\'\/@#$&\%\^=+\|<>\"])
How can I remove the rest of the text from the string?
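You can easily use the replace function. It replaces every word character (letters, digits and underscore) with a space and leaves punctuation and whitespace in place. Try this, for example:
var str = "whatever, string, you like!";
var newStr = str.replace(/\w/g, ' ');
console.log(newStr);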
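One caveat worth checking: in JavaScript, \w only covers ASCII [A-Za-z0-9_], so accented letters such as the Í in CAPÍTULO are not replaced. The sketch below, assuming a runtime with Unicode property escapes (ES2018+, e.g. a current Node.js), swaps in \p{L}, \p{M} and \p{N} with the u flag so letters, combining marks and digits from any script are blanked. The helper name blankOutLetters is hypothetical. Unlike \w it leaves _ alone, which the question lists as a character to keep.

```javascript
// \w in JavaScript regexes is ASCII-only: [A-Za-z0-9_].
const ascii = "whatever, string, you like!".replace(/\w/g, " ");
const caveat = "CAPÍTULO".replace(/\w/g, " "); // the accented Í is NOT blanked

// Hypothetical helper: blank out letters, combining marks and digits in any
// script. Unicode property escapes (ES2018+) require the "u" flag.
function blankOutLetters(str) {
  return str.replace(/[\p{L}\p{M}\p{N}]/gu, " ");
}

const unicode = blankOutLetters("¡CAPÍTULO 3, ¿sí?");

console.log(JSON.stringify(ascii));
console.log(JSON.stringify(caveat));
console.log(JSON.stringify(unicode)); // only ¡ , ¿ ? and spaces remain
```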
Try this:
var r = /(CAPÍTULO [0-9]+|[\\?.,:;!¡¿。、·*\(\)\[\]\-–\_«»\'\'\/@#$&\%\^=+\|<>\"])|(.)/g
var s = "ABC!@#123^&*XYZ;";
var p = s.replace(r, "$1");
// Result: "!@#^&*;"
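Group 1 matches everything you want to keep, and any other single character falls into group 2. Because $1 expands to an empty string when group 1 did not take part in the match, every character caught by (.) is deleted and only the group 1 matches survive. Note that . does not match line terminators, so newlines are left in the output.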
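The substitution rule can be checked directly. In a replacer function, a capture group that did not participate in the match is passed as undefined, while in a replacement string $1 for such a group expands to "". A minimal sketch, assuming the same two-group pattern as above; the punctuation class is shortened here purely for readability, so it is not the full list from the question.

```javascript
// Group 1 = text to keep, group 2 = any other single character.
// The punctuation class is shortened for readability (an assumption).
const r = /(CAPÍTULO [0-9]+|[?.,:;!¡¿*()\[\]\-@#$&%^=+|<>"])|(.)/g;

const s = "CAPÍTULO 7: ¡Hola, mundo!";

// String form: "$1" expands to "" whenever group 1 did not participate.
const viaString = s.replace(r, "$1");

// Function form: the unused group arrives as undefined, so map it to "".
const viaFunction = s.replace(r, (match, keep) => keep ?? "");

console.log(viaString); // "CAPÍTULO 7:¡,!"
console.log(viaString === viaFunction); // true
```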
If you need to preserve the positions of the other characters, you could do this:
var r = /(CAPÍTULO [0-9]+|[\\?.,:;!¡¿。、·*\(\)\[\]\-–\_«»\'\'\/@#$&\%\^=+\|<>\"])|(.)/g
var s = "ABC!@#123^&*XYZ;";
s.replace(r, "$1,").split(",");
// Result: ["", "", "", "!", "@", "#", "", "", "", "^", "&", "*", "", "", "", ";", ""]
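You just need to handle the extra "" at the end. ;) This works because, instead of replacing each match with only "$1" (which becomes "" for non-matches), the appended separator leaves one slot per matched character or chapter heading, which keeps track of the positions. The separator must not be one of the characters the regex keeps: the comma is itself in the preserved class, so an input containing "," would be split at that comma, losing it and shifting every later position. A character outside the class is safer, for example s.replace(r, "$1~").split("~");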
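As an alternative to encoding positions with a separator, you could record where each kept token starts. A minimal sketch using String.prototype.matchAll (ES2020); the helper name keptWithPositions is hypothetical, and only the "keep" alternative is needed because nothing is being replaced. The punctuation class is shortened for readability.

```javascript
// Hypothetical helper: return each kept token with its index in the original
// string, instead of encoding positions with a separator character.
function keptWithPositions(s, keepPattern) {
  // matchAll requires a global regex; each match object carries .index.
  return [...s.matchAll(keepPattern)].map((m) => ({ text: m[0], index: m.index }));
}

// Only the "keep" alternative; no catch-all (.) group is needed here.
const keep = /CAPÍTULO [0-9]+|[?.,:;!¡¿*()\[\]\-@#$&%^=+|<>"]/g;

const result = keptWithPositions("ABC!@#123^&*XYZ;", keep);
console.log(result.map(({ text, index }) => `${text}:${index}`).join(" "));
// "!:3 @:4 #:5 ^:9 &:10 *:11 ;:15"

// Chapter headings come back as a single token with their start index.
const chapters = keptWithPositions("Fin. CAPÍTULO 2", keep);
console.log(chapters);
```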
Quoting the question: "removes non-punctuation characters, but still keeps track of chapters in the string."
The classic way to do this is to match the thing you do want to keep (CAPÍTULO [0-9]+) OR (|) the things you don't want to keep, in this case non-punctuation characters (\w), capture only the latter, and then replace the captured matches with an empty string.
const regexp = /CAPÍTULO [0-9]+|(\w)/g;
const input = "CAPÍTULO 22 .#( nonpunctuation characters $%&*'";
// match1 is only set when a \w character (not the chapter heading) matched.
const processed = input.replace(regexp, (match, match1) => match1 ? '' : match);
console.log(processed);
If you really want to list out all the punctuation characters to preserve, then replace \w in the above with the negated class
[^\?.,:;!¡¿。、·*()[\]\-–_«»'\/@#$&\%\^=+\|<>\"]
Note that the space is not in this list, so spaces are removed as well.
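Putting that negated class into the previous snippet gives something like the following sketch. It is the same technique with only the captured group changed; the class is the list from the question with the escaping cleaned up.

```javascript
// Keep "CAPÍTULO <n>" plus the listed punctuation; any other single character
// is captured by group 1 (the negated class) and removed.
const regexp = /CAPÍTULO [0-9]+|([^?.,:;!¡¿。、·*()[\]\-–_«»'\/@#$&%^=+|<>"])/g;

const input = "CAPÍTULO 22 .#( nonpunctuation characters $%&*'";
// match1 is only defined when the negated class (not the heading) matched.
const processed = input.replace(regexp, (match, match1) => (match1 ? "" : match));

console.log(processed); // "CAPÍTULO 22.#($%&*'"
```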
If you want to keep the remaining characters at their original positions, change the replacement function to
(match, match1) => match1 ? ' ' : match
so that each removed character is replaced by a space instead of being deleted.
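Applied to the list-based regex, that change might look like the sketch below (the sample input is made up for illustration). Without the u flag the negated class consumes one UTF-16 code unit at a time and each is replaced by exactly one space, so every kept token, including the chapter heading, stays at its original index.

```javascript
// Same pattern, but each removed character becomes a single space, so every
// kept token stays at its original index.
const regexp = /CAPÍTULO [0-9]+|([^?.,:;!¡¿。、·*()[\]\-–_«»'\/@#$&%^=+|<>"])/g;

const input = "Texto. CAPÍTULO 2: ¡Inicio!";
const processed = input.replace(regexp, (match, match1) => (match1 ? " " : match));

console.log(JSON.stringify(processed));
console.log(processed.length === input.length); // true
console.log(processed.indexOf("CAPÍTULO") === input.indexOf("CAPÍTULO")); // true
```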