
How to remove words in a string based on an array in JavaScript when the word in the string is shorter than the one in the array?

I want to remove some words from a string based on an array. But the words in the string are shorter than the ones in the array. Is it possible to match them using a regex and then replace them with an empty string? If not, what are the alternatives?

I tried using a regex to match the words, but I couldn't get it to work. I don't know how to make the regex match a minimum of 3 characters of each word in the array.

array = ['reading', 'books'];

string = 'If you want to read the book, just read it.';

desiredOutput = 'If you want to  the , just  it.';


// Desired match

'reading' -> match for 'rea', 'read', 'readi', 'readin', 'reading'

'books' -> match for 'boo', 'book', 'books'

One option is to match 3 or more word characters starting at a word boundary, then use a replacer function that returns the empty string if any of the words in the array startsWith the matched word:

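const array = ['reading', 'books'];
const string = 'If you want to read the book, just read it.';

// \b\w{3,} matches every run of 3 or more word characters that starts at a word boundary
const output = string.replace(
  /\b\w{3,}/g,
  word => array.some(item => item.startsWith(word)) ? '' : word
);
console.log(output); // If you want to  the , just  it.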
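The replacer approach generalizes easily. A minimal sketch, assuming you may want a different minimum length or case-insensitive matching (so that a capitalized "Read" is also removed); removeWordPrefixes and its parameters are hypothetical names, not part of the answer above:

```javascript
// Hypothetical helper generalizing the replacer-function approach.
// minLength: shortest prefix that counts as a match (3 in the question).
// ignoreCase: if true, compare lowercase forms so "Read" also matches "reading".
function removeWordPrefixes(text, words, minLength = 3, ignoreCase = false) {
  const norm = s => (ignoreCase ? s.toLowerCase() : s);
  const targets = words.map(norm);
  // Builds /\b\w{minLength,}/g: runs of at least minLength word characters at a word boundary
  const candidate = new RegExp(`\\b\\w{${minLength},}`, "g");
  // Remove a candidate only if some target word starts with it
  return text.replace(candidate, word =>
    targets.some(target => target.startsWith(norm(word))) ? "" : word
  );
}

console.log(removeWordPrefixes("If you want to read the book, just read it.", ["reading", "books"]));
console.log(removeWordPrefixes("Read the Book.", ["reading", "books"], 3, true));
```

Keeping the comparison in a callback rather than in the regex means the array can change at runtime without rebuilding any pattern.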

The answer from CertainPerformance is better, since it is easier to implement and to maintain, but it's worth noting that you can also generate a regex from the array.

The idea is simple enough: if you want to match r, re, rea, read, readi, readin or reading, the regex for that is reading|readin|readi|read|rea|re|r. You want the longest variation first because the alternatives are tried from left to right, and otherwise the regex engine stops at the first alternative that matches:

let regex = /r|re|rea|read/g;
// The first alternative, "r", already matches, so the engine never tries "read"
console.log("read".replace(regex, "")); // ead
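For comparison, a small sketch that runs the same alternatives in both orders; only the order differs:

```javascript
// Same alternatives, shortest first and longest first
const shortestFirst = /r|re|rea|read/g;
const longestFirst = /read|rea|re|r/g;

// "r" is tried first and succeeds, so only the first letter is removed
console.log("read".replace(shortestFirst, "")); // ead
// "read" is tried first and succeeds, so the whole word is removed (logs an empty line)
console.log("read".replace(longestFirst, ""));
```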

So you can take a word and break it down into its prefixes, longest first, to generate a regex from it:

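function allSubstrings(word) {
  let substrings = [];
  // Walk from the full word down to its first letter, collecting prefixes
  for (let i = word.length; i > 0; i--) {
    let sub = word.slice(0, i);
    substrings.push(sub);
  }
  return substrings;
}

// Logs the array ["reading", "readin", "readi", "read", "rea", "re", "r"]
console.log(allSubstrings("reading"));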
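Since the question only wants prefixes of at least 3 characters, the collection can stop early. A sketch of that variant, keeping the same longest-first order; prefixesDownTo and minLength are hypothetical names:

```javascript
// Hypothetical variant of allSubstrings: prefixes of `word` from longest
// to shortest, stopping at `minLength` characters.
function prefixesDownTo(word, minLength = 1) {
  // Number of prefixes with length >= minLength (0 if the word is too short)
  const count = Math.max(word.length - minLength + 1, 0);
  // i = 0 gives the full word, i = count - 1 gives the minLength-character prefix
  return Array.from({ length: count }, (_, i) => word.slice(0, word.length - i));
}

console.log(prefixesDownTo("reading", 3)); // value: ["reading", "readin", "readi", "read", "rea"]
console.log(prefixesDownTo("books", 3));   // value: ["books", "book", "boo"]
```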

With that, you can simply generate the pattern you need:

function allSubstrings(word) {
  let substrings = [];
  for (let i = word.length; i > 0; i--) {
    let sub = word.slice(0, i);
    substrings.push(sub);
  }
  return substrings;
}

// Join every prefix (longest first) into one alternation
function toPattern(word) {
  let substrings = allSubstrings(word);
  let pattern = substrings.join("|");
  return pattern;
}

console.log(toPattern("reading")); // reading|readin|readi|read|rea|re|r

The final step is to take the array and convert it into a regex. This requires handling each word and then combining the individual patterns into one that matches any of the words:

const array = ['reading', 'books'];
const string = 'If you want to read the book, just read it.';

// Generate the pattern
let pattern = array
  .map(toPattern) // first, build the alternation for each word
  .join("|");     // then join the patterns for all words

// Convert the pattern to a regex
let regex = new RegExp(pattern, "g");
let result = string.replace(regex, "");

// desiredOutput: 'If you want to  the , just  it.'
console.log(result);

function allSubstrings(word) {
  let substrings = [];
  for (let i = word.length; i > 0; i--) {
    let sub = word.slice(0, i);
    substrings.push(sub);
  }
  return substrings;
}

function toPattern(word) {
  let substrings = allSubstrings(word);
  let pattern = substrings.join("|");
  return pattern;
}
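One thing the generated pattern does not handle: if an array item contains characters that are special in a regex (such as + or .), joining it into a pattern changes its meaning or can make new RegExp throw. A minimal sketch, assuming you want every item matched literally; escapeRegExp and toEscapedPattern are hypothetical helper names, and the character class is the widely used escaping idiom:

```javascript
// Hypothetical helper: escapes characters that have special meaning in a RegExp,
// so array items such as "c++" or "e.g." are matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function allSubstrings(word) {
  const substrings = [];
  for (let i = word.length; i > 0; i--) {
    substrings.push(word.slice(0, i));
  }
  return substrings;
}

// Same idea as toPattern, but each prefix is escaped before joining
function toEscapedPattern(word) {
  return allSubstrings(word).map(escapeRegExp).join("|");
}

const pattern = ["c++"].map(toEscapedPattern).join("|");
console.log(pattern); // c\+\+|c\+|c
console.log("I like c++".replace(new RegExp(pattern, "g"), "")); // "I like "
```

Without escaping, new RegExp("c++|c+|c") throws a SyntaxError, because the second + has nothing to repeat.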

So this is how you can generate a regular expression from that array. In this case it works, but it is not guaranteed to, because the pattern can match things you don't want. For example, r will match any letter r anywhere in the string, even inside a word that has nothing to do with "reading":

const array = ['reading'];
// The "r" in "brown" and the "r" in "over" will both be removed
const string = 'The quick brown fox jumps over the lazy dog';

let pattern = array
  .map(word => allSubstrings(word).join("|"))
  .join("|");
let regex = new RegExp(pattern, "g");
let result = string.replace(regex, "");
console.log(result); // The quick bown fox jumps ove the lazy dog

function allSubstrings(word) {
  let substrings = [];
  for (let i = word.length; i > 0; i--) {
    let sub = word.slice(0, i);
    substrings.push(sub);
  }
  return substrings;
}

This is where it gets more complicated, because you need a more elaborate pattern for each word. You generally want to match whole words, so you can use the word boundary \b, which means the pattern for "reading" now looks like this:

\breading\b|\breadin\b|\breadi\b|\bread\b|\brea\b|\bre\b|\br\b
↑↑       ↑↑ ↑↑      ↑↑ ↑↑     ↑↑ ↑↑    ↑↑ ↑↑   ↑↑ ↑↑  ↑↑ ↑↑ ↑↑

To keep the output at least somewhat readable, the alternatives can instead be put in a group, with the whole group required to match a single word:

\b(?:reading|readin|readi|read|rea|re|r)\b
   ↑↑
   ||____ non-capturing group
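To check that the grouped form behaves the same as writing \b around every alternative, a small sketch that runs both on one sample string (the sample is made up for illustration):

```javascript
const sample = "r re rea read reading bread";

// \b around every alternative
const perAlternative = /\breading\b|\breadin\b|\breadi\b|\bread\b|\brea\b|\bre\b|\br\b/g;
// One non-capturing group wrapped in a single pair of \b
const grouped = /\b(?:reading|readin|readi|read|rea|re|r)\b/g;

// Both remove whole words only; "bread" survives because there is no
// word boundary between its "b" and "r"
console.log(sample.replace(perAlternative, ""));
console.log(sample.replace(grouped, ""));
```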

So you have to generate this pattern:

function toPattern(word) {
  let substrings = allSubstrings(word);
  //escape backslashes, because this is a string literal and we need \b as content
  let pattern = "\\b(?:" + substrings.join("|") + ")\\b"; 

  return pattern;
}
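This pattern still includes the 1- and 2-letter prefixes, so a standalone "re" or "bo" would also be removed, while the question asks for at least 3 characters. A sketch that filters the prefixes by length, assuming every array word has at least minLength characters; toPatternMin is a hypothetical name:

```javascript
function allSubstrings(word) {
  const substrings = [];
  for (let i = word.length; i > 0; i--) {
    substrings.push(word.slice(0, i));
  }
  return substrings;
}

// Hypothetical variant of toPattern: keeps only prefixes with at least
// minLength characters (assumes word.length >= minLength)
function toPatternMin(word, minLength) {
  const substrings = allSubstrings(word).filter(sub => sub.length >= minLength);
  return "\\b(?:" + substrings.join("|") + ")\\b";
}

const regex = new RegExp(["reading", "books"].map(w => toPatternMin(w, 3)).join("|"), "g");
// "re" and "bo" are shorter than 3 characters, so they are kept
console.log("I re-read the book, bo.".replace(regex, "")); // I re- the , bo.
```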

Which leads us to this:

const array = ['reading', 'books'];
const string = 'The quick brown fox jumps over the lazy dog. If you want to read the book, just read it.';

let pattern = array
  .map(toPattern)
  .join("|");
let regex = new RegExp(pattern, "g");
let result = string.replace(regex, "");
// The quick brown fox jumps over the lazy dog. If you want to  the , just  it.
console.log(result);

function allSubstrings(word) {
  let substrings = [];
  for (let i = word.length; i > 0; i--) {
    let sub = word.slice(0, i);
    substrings.push(sub);
  }
  return substrings;
}

function toPattern(word) {
  let substrings = allSubstrings(word);
  // "\\b" in a string literal becomes \b in the pattern
  let pattern = "\\b(?:" + substrings.join("|") + ")\\b";
  return pattern;
}

This is enough to solve your task, so it is possible to generate a regex. The final one looks like this:

/\b(?:reading|readin|readi|read|rea|re|r)\b|\b(?:books|book|boo|bo|b)\b/g
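Listing every prefix makes the pattern grow with the word length. An alternative, sketched here as an assumption rather than what this answer uses, nests optional groups so each extra letter is only allowed if all the letters before it matched, and requires the first 3 letters as the question asks; nestedPrefixPattern is a hypothetical name and it assumes the words contain no regex metacharacters:

```javascript
// Hypothetical helper: matches any prefix of `word` that is at least
// `minLength` characters long, using nested optional groups.
function nestedPrefixPattern(word, minLength) {
  let tail = "";
  // Build from the end: "g" -> "(?:g)?" -> "(?:n(?:g)?)?" -> ...
  for (let i = word.length - 1; i >= minLength; i--) {
    tail = "(?:" + word[i] + tail + ")?";
  }
  // The first minLength letters are required
  return word.slice(0, minLength) + tail;
}

const words = ["reading", "books"];
const pattern = "\\b(?:" + words.map(w => nestedPrefixPattern(w, 3)).join("|") + ")\\b";
console.log(pattern); // \b(?:rea(?:d(?:i(?:n(?:g)?)?)?)?|boo(?:k(?:s)?)?)\b

const regex = new RegExp(pattern, "g");
// If you want to  the , just  it.
console.log("If you want to read the book, just read it.".replace(regex, ""));
```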

But most of the effort goes into making the generated pattern actually work. It's not a particularly complex solution, but as mentioned, the one suggested by CertainPerformance is better because it's simpler, which means less chance of it failing, and it will be easier to maintain in the future.

I don't know of a straightforward way to do it, but you can build your own regexp pattern, like so:

// The data from the question
var searchArr = ['reading', 'books'];
var str = 'If you want to read the book, just read it.';

// This function creates a regex pattern string for a single word.
// str is the string value (the word),
// min is the minimum number of letters required for each word
function getRegexWithMinChars(str, min) {
    var charArr = str.split("");
    var length = charArr.length;
    var regexpStr = "";
    for (var i = 0; i < length; i++) {
        // the first `min` letters are required, the rest are optional
        regexpStr += "[" + charArr[i] + "]" + (i < min ? "" : "?");
    }
    return regexpStr;
}

// This function returns a RegExp object built from the patterns of the words in the array
function getStrArrayRegExWithMinChars(strArr, min) {
    var length = strArr.length;
    var regexpStr = "";
    for (var i = 0; i < length; i++) {
        regexpStr += "(" + getRegexWithMinChars(strArr[i], min) + ")?";
    }
    return new RegExp(regexpStr, "gm");
}

var regexp = getStrArrayRegExWithMinChars(searchArr, 3);

// With the generated regexp, string replace can find and remove
// all the words in the string (replace returns a new string)
console.log(str.replace(regexp, ""));

// The same can be done with one ES6 function
// (renamed so it does not clash with the function declaration above)
const getStrArrayRegExWithMinCharsES6 = (searchArr, min) => {
    const pattern = searchArr.reduce((wordsPatt, word) => {
        const patt = word.split("").reduce((wordPatt, letter, index) => {
            return wordPatt + "[" + letter + "]" + (index < min ? "" : "?");
        }, "");
        return wordsPatt + "(" + patt + ")?";
    }, "");
    // reduce produces a string, so convert it to a RegExp before using it
    return new RegExp(pattern, "gm");
};

var regexpES6 = getStrArrayRegExWithMinCharsES6(searchArr, 3);

// Same result as above
console.log(str.replace(regexpES6, ""));
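The generated regex in this answer has no word boundaries and makes every word group optional, so it can also remove letters from the middle of unrelated words. A sketch comparing that concatenated form with a variant that joins the per-letter patterns with | inside \b...\b; letterPattern, concatenatedRegex and wholeWordRegex are hypothetical names, and the sample sentence is made up:

```javascript
// Per-letter pattern: the first `min` letters are required, the rest optional
function letterPattern(word, min) {
  return word.split("").map((letter, index) => "[" + letter + "]" + (index < min ? "" : "?")).join("");
}

// Concatenated form: every word group optional, no word boundaries
function concatenatedRegex(words, min) {
  return new RegExp(words.map(w => "(" + letterPattern(w, min) + ")?").join(""), "gm");
}

// Hypothetical variant: alternation inside \b...\b, so only whole words are removed
function wholeWordRegex(words, min) {
  return new RegExp("\\b(?:" + words.map(w => letterPattern(w, min)).join("|") + ")\\b", "g");
}

const text = "The bread area, then read the book.";
const words = ["reading", "books"];
console.log(text.replace(concatenatedRegex(words, 3), "")); // The b a, then  the .
console.log(text.replace(wholeWordRegex(words, 3), ""));    // The bread area, then  the .
```

In both forms each trailing letter is optional on its own, so a non-prefix such as "reag" is also accepted by [r][e][a][d]?[i]?[n]?[g]?; nesting the optional parts instead avoids that.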


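Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.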

Related questions:
How to find a word(string) in Javascript array?
How to split the String based on array elements into array retaining the array the split word in javascript
How to remove a word that starts with a certain character from a string in Javascript
Check if a word in a string matches a word in an array and if so remove it from the string
How to get position of a word characters in a string based on an array of words in this string
How to find the length of the largest word in a string with javascript?
How to find length of shortest word in an array in Javascript?
Remove word from string if included in array?
Storing the word length in javascript array
Javascript Array String Word Wrap Problem --- Permutation of Strings in Order and A Given Array Length
 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM