I have a small Node script that scrapes a web page. From that page I extract an array of strings.
I am trying to clean up those strings (currently with regex and String.prototype.replace).
One example string looks like this:
2 Glücklich sind die,*die seine Erinnerungen* beachten,+die mit ganzem Herzen nach ihm suchen.+\n
My cleaning code looks like this.
string.replace(/\+/g, '').replace(/\*/g, '').replace('\n', '').replace(/(^\d+)/g, '').trim()
The first call removes all "+", the second removes all "*", the third removes the newline, and the last one removes the leading number.
Most things work fine, but I have some edge cases. This is my result:
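The third call behaves differently from the others: when the first argument of String.prototype.replace is a string rather than a regex, only the first occurrence is replaced. A small sketch with a hypothetical sample string (not taken from the scraped data) shows the difference, plus String.prototype.replaceAll, which is available in Node 15 and later:

```javascript
// Hypothetical sample containing two newlines.
const sample = "line one\nline two\n";

// String pattern: only the first "\n" is removed.
const firstOnly = sample.replace("\n", "");
// Regex with the g flag: every "\n" is removed.
const everyOne = sample.replace(/\n/g, "");
// replaceAll (Node 15+) removes every occurrence of a string pattern.
const viaReplaceAll = sample.replaceAll("\n", "");

console.log(JSON.stringify(firstOnly));     // "line oneline two\n"
console.log(JSON.stringify(everyOne));      // "line oneline two"
console.log(JSON.stringify(viaReplaceAll)); // "line oneline two"
```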
2 Glücklich sind die,die seine Erinnerungen beachten,die mit ganzem Herzen nach ihm suchen.
Problems: the leading number is still there, and the spaces after the commas are missing.
My goal is to clean every string correctly. I have thousands of strings with different combinations, but the only special characters are "+", "*", "\n" and the leading number.
The string should look like this:
Glücklich sind die, die seine Erinnerungen beachten, die mit ganzem Herzen nach ihm suchen.
Hopefully someone has an idea of how to accomplish that.
You could use an alternation | with a character class [+*\n] to match either one of those characters, or 1+ digits ^\d+ at the start of the string.
[+*\n]|^\d+
In the replacement, use a space. Afterwards, replace every run of 2 or more spaces with a single space.
let pattern = /[+*\n]|^\d+/g;
let string = "2 Glücklich sind die,*die seine Erinnerungen* beachten,+die mit ganzem Herzen nach ihm suchen.+\n";
string = string
  .replace(pattern, " ")
  .replace(/[ ]{2,}/g, " ")
  .trim();
console.log(string);
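Since the scraper produces an array of strings, the same two-step replacement can be wrapped in a function and mapped over the array. This is a minimal sketch; the function name cleanVerse and the second sample string are hypothetical and only stand in for the real scraped data:

```javascript
// Hypothetical helper applying the two-step cleanup described above.
function cleanVerse(s) {
  return s
    .replace(/[+*\n]|^\d+/g, " ") // "+", "*", newlines and the leading number become spaces
    .replace(/[ ]{2,}/g, " ")     // collapse runs of two or more spaces into one
    .trim();                      // drop any space left at either end
}

// Hypothetical sample data standing in for the scraped array.
const scraped = [
  "2 Glücklich sind die,*die seine Erinnerungen* beachten,+die mit ganzem Herzen nach ihm suchen.+\n",
  "10 Sample text,+with plus*and star*\n",
];

const cleaned = scraped.map(cleanVerse);
console.log(cleaned);
```

Replacing with a space first and collapsing afterwards is what restores the missing space after "die," and "beachten,", which deleting the characters outright cannot do.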
If the digits at the start of the string can be preceded by optional whitespace characters, you can match those as well by matching 0 or more whitespace characters other than a newline or carriage return: ^[^\S\r\n]*\d+
let pattern = /[+*\n]|^[^\S\r\n]*\d+/g;
let string = " 2 Glücklich sind die,*die seine Erinnerungen* beachten,+die mit ganzem Herzen nach ihm suchen.+\n";
string = string
  .replace(pattern, " ")
  .replace(/[ ]{2,}/g, " ")
  .trim();
console.log(string);
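The class [^\S\r\n] reads as "not a non-whitespace character, not \r and not \n", which leaves every whitespace character except carriage return and line feed. A quick sketch to check that reading; the helper name clean and the tab-prefixed input are illustrative only:

```javascript
// Matches exactly one whitespace character that is not \r or \n.
const horizontalWs = /^[^\S\r\n]$/;

console.log(horizontalWs.test(" "));  // true  (space)
console.log(horizontalWs.test("\t")); // true  (tab)
console.log(horizontalWs.test("\n")); // false (line feed is excluded)
console.log(horizontalWs.test("\r")); // false (carriage return is excluded)
console.log(horizontalWs.test("a"));  // false (not whitespace)

// The whitespace-tolerant pattern also strips a tab-indented leading number.
const pattern = /[+*\n]|^[^\S\r\n]*\d+/g;
const clean = s => s.replace(pattern, " ").replace(/[ ]{2,}/g, " ").trim();
console.log(clean("\t2 Glücklich sind die,+die mit ganzem Herzen nach ihm suchen.+\n"));
```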
Perhaps this?
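let str = `2 Glücklich sind die,*die seine Erinnerungen* beachten,+die mit ganzem Herzen nach ihm suchen.+\n`;
str = str.replace(/[\*\+]/g, " ")   // turn every "*" and "+" into a space
  .replace(/^\d+(\s+)?/, "")         // drop the leading number and the whitespace after it
  .replace(/\n/g, "")                // remove all newlines
  .replace(/\s{2,}/g, " ")           // collapse runs of whitespace
  .trim();                           // remove the space left where the trailing "+" was
console.log(str);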
You can achieve all your goals with a fairly short regex and a single call to String.prototype.replace:
let cleanStr = str => str.replace(/^[0-9\s]*|[+*\r\n]/g, '');
console.log(cleanStr('2 Glücklich sind die,die seine Erinnerungen beachten,+die mit ganzem Herzen nach ihm suchen.+\n'));
This regex matches either ^[0-9\s]* or [+*\r\n], and every such match is replaced with the empty string.
^[0-9\s]* removes any run of consecutive digits or whitespace characters at the beginning of the string.
[+*\r\n] removes any "+", "*", or newline character (including \r, which can be significant in Windows environments) wherever it occurs in the string.
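To see exactly which pieces the two alternatives remove, String.prototype.matchAll can list every match of the same pattern. This sketch runs it on the question's example string (an assumption; the answer itself used a slightly different input). It also makes visible that, because matches are deleted rather than replaced with a space, "die,*die" becomes "die,die":

```javascript
// Same pattern as cleanStr; the g flag is required by matchAll.
const pattern = /^[0-9\s]*|[+*\r\n]/g;
const input = "2 Glücklich sind die,*die seine Erinnerungen* beachten,+die mit ganzem Herzen nach ihm suchen.+\n";

// Collect the text of every match: first the leading "2 ", then each "+", "*" and newline.
const removed = [...input.matchAll(pattern)].map(m => m[0]);
console.log(removed);

// The cleaned result, with the matched pieces deleted.
const result = input.replace(pattern, "");
console.log(result);
```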