How to remove words from a string based on an array in JavaScript when the word in the string has fewer characters than the one in the array?
I want to remove some words from a string based on an array, but the words in the string are shorter than the words in the array. Is it possible to match them using a regex and then replace them with an empty string? If not, what are the alternatives?
I tried using a regex to match the words, but I couldn't get it to work. I don't know how to make the regex match a minimum of 3 characters from each word in the array.
array = ['reading', 'books'];
string = 'If you want to read the book, just read it.';
desiredOutput = 'If you want to the , just it.';
// Desired match
'reading' -> match for 'rea', 'read', 'readi', 'readin', 'reading'
'books' -> match for 'boo', 'book', 'books'
One option is to match 3 or more word characters starting at a word boundary, then use a replacer function that returns the empty string if any of the array words startsWith the word in question:
const array = ['reading', 'books'];
const string = 'If you want to read the book, just read it.';
const output = string.replace(
  /\b\w{3,}/g,
  word => array.some(item => item.startsWith(word)) ? '' : word
);
console.log(output);
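The same idea can be wrapped in a reusable function. This is only a minimal sketch: the name `removePrefixMatches` and the `minLength` parameter are assumptions made for illustration, not part of the original answer.

```javascript
// Hypothetical helper: the name and the minLength default are assumptions.
function removePrefixMatches(text, words, minLength = 3) {
  // Match runs of at least minLength word characters starting at a word boundary.
  const candidate = new RegExp(`\\b\\w{${minLength},}`, "g");
  // Drop the matched word only if some array entry starts with it.
  return text.replace(candidate, (found) =>
    words.some((word) => word.startsWith(found)) ? "" : found
  );
}

const cleaned = removePrefixMatches(
  "If you want to read the book, just read it.",
  ["reading", "books"]
);
console.log(cleaned);
```

Note that only the words themselves are removed; the spaces around them stay, so a removed word leaves two adjacent spaces behind.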
The answer from CertainPerformance is better: it is easier to implement and to maintain. Still, it's worth noting that you can also generate a regex from the array.
The idea is simple enough: if you want to match r, re, rea, read, readi, readin, reading, the regex for that is reading|readin|readi|read|rea|re|r. You want the longest variation first because otherwise the regex engine will stop at the first match it finds:
let regex = /r|re|rea|read/g
// the first alternative, r, matches first, so only the leading "r" of "read" is consumed
console.log(
  "read".replace(regex, "")
)
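To see the effect of the ordering side by side, here is a small comparison sketch. The variable names are made up for illustration.

```javascript
// Same alternatives, different order.
const shortestFirst = /r|re|rea|read/g;
const longestFirst = /read|rea|re|r/g;

// With the shortest alternative first, only the "r" is matched and removed.
const withShortestFirst = "read".replace(shortestFirst, "");
// With the longest alternative first, the whole word is matched and removed.
const withLongestFirst = "read".replace(longestFirst, "");

console.log(JSON.stringify(withShortestFirst)); // "ead"
console.log(JSON.stringify(withLongestFirst));  // ""
```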
So you can take a word and break it out in this pattern to generate a regex from it:
function allSubstrings(word) {
  let substrings = [];
  for (let i = word.length; i > 0; i--) {
    let sub = word.slice(0, i);
    substrings.push(sub)
  }
  return substrings;
}
console.log(allSubstrings("reading"))
With that you can simply generate the regex you need:
function allSubstrings(word) {
  let substrings = [];
  for (let i = word.length; i > 0; i--) {
    let sub = word.slice(0, i);
    substrings.push(sub)
  }
  return substrings;
}
function toPattern(word) {
  let substrings = allSubstrings(word);
  let pattern = substrings.join("|");
  return pattern;
}
console.log(toPattern("reading"))
The final thing is to take an array and convert it to a regex. This requires treating each word and then combining the individual patterns into one that matches any of the words:
const array = ['reading', 'books'];
const string = 'If you want to read the book, just read it.';
//generate the pattern
let pattern = array
  .map(toPattern) //first, for each word
  .join("|");     //join patterns for all words
//convert the pattern to a regex
let regex = new RegExp(pattern, "g");
let result = string.replace(regex, "");
//desiredOutput: 'If you want to the , just it.';
console.log(result);
function allSubstrings(word) {
  let substrings = [];
  for (let i = word.length; i > 0; i--) {
    let sub = word.slice(0, i);
    substrings.push(sub)
  }
  return substrings;
}
function toPattern(word) {
  let substrings = allSubstrings(word);
  let pattern = substrings.join("|");
  return pattern;
}
So, this is how you can generate a regular expression from that array. In this case it works, but that's not guaranteed, because there is a danger it could match something you don't want. For example, r will match any r character; it doesn't need to be in a word that matches:
const array = ['reading'];
// the "r" in "brown" and the "r" in "over" will both be removed
const string = 'The quick brown fox jumps over the lazy dog';
let pattern = array
  .map(word => allSubstrings(word).join("|"))
  .join("|");
let regex = new RegExp(pattern, "g");
let result = string.replace(regex, "");
console.log(result);
function allSubstrings(word) {
  let substrings = [];
  for (let i = word.length; i > 0; i--) {
    let sub = word.slice(0, i);
    substrings.push(sub)
  }
  return substrings;
}
This is where it becomes more complicated, as you need to generate a more complicated pattern for each word. You generally want to match whole words, so you can use the word boundary assertion \b, which means that the pattern for "reading" can now look like this, with \b on both sides of every alternative:
\breading\b|\breadin\b|\breadi\b|\bread\b|\brea\b|\bre\b|\br\b
In the interest of keeping the output at least somewhat readable, the alternatives can instead be put in a group and the whole group made to match a single word:
\b(?:reading|readin|readi|read|rea|re|r)\b
(the ?: right after the opening parenthesis marks a non-capturing group)
So, you have to generate this pattern:
function toPattern(word) {
  let substrings = allSubstrings(word);
  //escape backslashes, because this is a string literal and we need \b as content
  let pattern = "\\b(?:" + substrings.join("|") + ")\\b";
  return pattern;
}
Which leads us to this:
const array = ['reading', 'books'];
const string = 'The quick brown fox jumps over the lazy dog. If you want to read the book, just read it.';
let pattern = array
  .map(toPattern)
  .join("|");
let regex = new RegExp(pattern, "g");
let result = string.replace(regex, "");
console.log(result);
function allSubstrings(word) {
  let substrings = [];
  for (let i = word.length; i > 0; i--) {
    let sub = word.slice(0, i);
    substrings.push(sub)
  }
  return substrings;
}
function toPattern(word) {
  let substrings = allSubstrings(word);
  let pattern = "\\b(?:" + substrings.join("|") + ")\\b";
  return pattern;
}
This will suffice to solve your task, so it is possible to generate a regex. The final one looks like this:
/\b(?:reading|readin|readi|read|rea|re|r)\b|\b(?:books|book|boo|bo|b)\b/g
But most of the effort goes into generating something that works. It's not necessarily a complex solution but, as mentioned, the one suggested by CertainPerformance is better: it's simpler, which means less chance of it failing, and it would be easier to maintain in the future.
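The generated regex above matches prefixes down to a single character, while the question asks for a minimum of 3. A possible variation is sketched below; it is not from the original answer. The helper names `escapeRegExp`, `prefixesOf`, and `buildPrefixRegex` and the `minLength` parameter are assumptions, and the escaping step is added in case a word contains regex metacharacters.

```javascript
// Hypothetical helpers; names and the minLength parameter are assumptions.

// Escape characters that have a special meaning inside a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// All prefixes of word with at least minLength characters, longest first.
function prefixesOf(word, minLength) {
  const prefixes = [];
  for (let end = word.length; end >= minLength; end--) {
    prefixes.push(escapeRegExp(word.slice(0, end)));
  }
  return prefixes;
}

// One regex for all words: \b(?:prefix|prefix|...)\b
function buildPrefixRegex(words, minLength = 3) {
  const alternatives = words.flatMap((word) => prefixesOf(word, minLength));
  return new RegExp("\\b(?:" + alternatives.join("|") + ")\\b", "g");
}

const prefixRegex = buildPrefixRegex(["reading", "books"]);
console.log(prefixRegex.source);
console.log("If you want to read the book, just read it.".replace(prefixRegex, ""));
```

Each prefix is escaped after slicing, so an escape sequence is never cut in half. Words that start or end with a non-word character would not work well with \b and are outside the scope of this sketch.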
I don't know of a straightforward way to do it, but you can create your own regexp pattern, like so:
// This function creates a regex pattern string for one word in the array.
// str is the string value (the word),
// min is the minimum number of required letters in each word
function getRegexWithMinChars(str, min) {
  var charArr = str.split("");
  var length = charArr.length;
  var regexpStr = "";
  for (var i = 0; i < length; i++) {
    regexpStr += "[" + charArr[i] + "]" + (i < min ? "" : "?");
  }
  return regexpStr;
}
// This function returns a regexp object with the patterns of the words in the array
function getStrArrayRegExWithMinChars(strArr, min) {
  var length = strArr.length;
  var regexpStr = "";
  for (var i = 0; i < length; i++) {
    regexpStr += "(" + getRegexWithMinChars(strArr[i], min) + ")?";
  }
  return new RegExp(regexpStr, "gm");
}
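var searchArr = ['reading', 'books'];
var str = 'If you want to read the book, just read it.';
var regexp = getStrArrayRegExWithMinChars(searchArr, 3);
// With the given regexp I was able to use string replace to
// find and replace all the words in the string
console.log(str.replace(regexp, ""));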
// The same can be done with one ES6 function (use it in place of the two functions above)
const getStrArrayRegExWithMinChars = (searchArr, min) => {
  const pattern = searchArr.reduce((wordsPatt, word) => {
    const patt = word.split("").reduce((wordPatt, letter, index) => {
      return wordPatt + "[" + letter + "]" + (index < min ? "" : "?");
    }, "");
    return wordsPatt + "(" + patt + ")?";
  }, "");
  // wrap the pattern string in a RegExp; a plain string passed to replace() is matched literally
  return new RegExp(pattern, "gm");
};
var regexp = getStrArrayRegExWithMinChars(searchArr, 3);
// With the given regexp I was able to use string replace to
// find and replace all the words in the string
str.replace(regexp, "");
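One caveat with the per-letter pattern: because every letter after the minimum is optional on its own, [r][e][a][d]?[i]?[n]?[g]? also matches strings such as "reag", and without word boundaries it removes the "rea" inside "really". A possible refinement, not part of the original answer, is to nest the optional letters so that each one can only appear if the previous one did. The names `nestedPrefixPattern` and `buildNestedRegex` are assumptions, and the sketch assumes the words contain only plain letters (no regex metacharacters).

```javascript
// Hypothetical helpers; names are assumptions, words are assumed to be plain letters.

// "reading", 3 -> "rea(?:d(?:i(?:n(?:g)?)?)?)?"
function nestedPrefixPattern(word, minLength) {
  const required = word.slice(0, minLength);
  const optional = word.slice(minLength);
  // Build from the last letter backwards so each letter wraps the rest.
  let tail = "";
  for (let i = optional.length - 1; i >= 0; i--) {
    tail = "(?:" + optional[i] + tail + ")?";
  }
  return required + tail;
}

function buildNestedRegex(words, minLength = 3) {
  const patterns = words
    .filter((word) => word.length >= minLength) // skip words shorter than the minimum
    .map((word) => nestedPrefixPattern(word, minLength));
  return new RegExp("\\b(?:" + patterns.join("|") + ")\\b", "g");
}

const nestedRegex = buildNestedRegex(["reading", "books"]);
console.log(nestedRegex.source);
console.log("If you want to read the book, just read it.".replace(nestedRegex, ""));
```

The resulting pattern stays compact even for long words, since it does not list every prefix separately.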