Javascript: find all occurrences of word in text document

I'm trying to write a Javascript function to find the indices of all occurrences of a word in a text document. Currently this is what I have:

// Finds all occurrences of string 'needle' in string 'haystack' (case-insensitive)
function getMatches(haystack, needle) {
  if (needle && haystack) {
    var matches = [], ind = 0, l = needle.length;
    var t = haystack.toLowerCase();
    var n = needle.toLowerCase();
    while (true) {
      ind = t.indexOf(n, ind);
      if (ind === -1) break;
      matches.push(ind);
      ind += l; // continue searching after the current match
    }
    return matches;
  }
  return []; // nothing to search for, or nothing to search in
}

However, this gives me a problem, since it matches occurrences of the word even when it is part of a longer string. For example, if the needle is "book" and the haystack is "Tom wrote a book. The book's name is Facebook for dummies", the result contains the indices of 'book', 'book's' and 'Facebook', when I want only the index of 'book'. How can I accomplish this? Any help is appreciated.
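To make the problem concrete, here is a small self-contained sketch of a plain substring search run against the example sentence. The name `findSubstringIndices` is made up for this illustration; it is only a condensed form of the function above, not a fix.

```javascript
// Hypothetical condensed version of the question's function: plain substring search.
function findSubstringIndices(haystack, needle) {
  var matches = [];
  if (!haystack || !needle) return matches;
  var t = haystack.toLowerCase();
  var n = needle.toLowerCase();
  var ind = t.indexOf(n);
  while (ind !== -1) {
    matches.push(ind);
    ind = t.indexOf(n, ind + n.length); // resume after the current match
  }
  return matches;
}

var sentence = "Tom wrote a book. The book's name is Facebook for dummies";
console.log(findSubstringIndices(sentence, "book")); // [12, 22, 41]
```

Index 12 is the standalone word, 22 is the start of "book's", and 41 falls inside "Facebook". The desired result is just [12].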

Here's the regex I propose:

/\bbook\b((?!\W(?=\w))|(?=\s))/gi

To fix your problem, try it with the exec() method. The regexp I provided also accounts for longer words that contain the needle, such as "booklet" (or "Facebook" in the example sentence you provided):

function getMatches(needle, haystack) {
    var myRe = new RegExp("\\b" + needle + "\\b((?!\\W(?=\\w))|(?=\\s))", "gi"),
        myArray, myResult = [];
    while ((myArray = myRe.exec(haystack)) !== null) {
        myResult.push(myArray.index);
    }
    return myResult;
}

Edit

I've edited the regexp to account for words like "booklet" as well. I've also reformatted my answer to be similar to your function.

You can do some testing here.
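One thing the function above does not handle is a needle that contains regex metacharacters (for example "a.b"), because the needle is concatenated straight into the pattern. A minimal sketch of one way to guard against that, assuming a helper named `escapeRegExp` (hypothetical, not a built-in) and a renamed wrapper `getWholeWordMatches`:

```javascript
// Hypothetical helper: escape characters that have special meaning in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Same pattern as the answer above, but built from the escaped needle.
function getWholeWordMatches(needle, haystack) {
  var re = new RegExp(
    "\\b" + escapeRegExp(needle) + "\\b((?!\\W(?=\\w))|(?=\\s))",
    "gi"
  );
  var match, result = [];
  while ((match = re.exec(haystack)) !== null) {
    result.push(match.index);
  }
  return result;
}

var sentence = "Tom wrote a book. The book's name is Facebook for dummies";
console.log(getWholeWordMatches("book", sentence)); // [12]
```

Note that \b only helps when the needle starts and ends with a word character, so needles such as "c++" would likely need a different boundary check.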

Try this:

function getMatches(searchStr, str) {
    var ind = 0, searchStrL = searchStr.length;
    var index, matches = [];

    str = str.toLowerCase();
    searchStr = searchStr.toLowerCase();

    while ((index = str.indexOf(searchStr, ind)) > -1) {
         matches.push(index);
         ind = index + searchStrL;
    }
    return matches;
}

indexOf returns the position of the first occurrence of "book".

var str = "Tom wrote a book. The book's name is Facebook for dummies";
var n = str.indexOf("book");
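As a short illustration of how indexOf behaves on this sentence, the sketch below also uses its optional second argument (the position to start searching from) to reach a later occurrence. The variable names are just for this example.

```javascript
var str = "Tom wrote a book. The book's name is Facebook for dummies";

var first = str.indexOf("book");             // 12
var second = str.indexOf("book", first + 1); // 22, search resumes after the first hit
var missing = str.indexOf("magazine");       // -1, no occurrence

console.log(first, second, missing);
```

Calling indexOf repeatedly with an advancing start position is what the loop-based functions in this thread do.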

I don't know what is going on there, but I can offer a better solution using a regex.

function getMatches(haystack, needle) {
    var regex = new RegExp(needle.toLowerCase(), 'g'),
        result = [],
        match; // declared here so it does not leak as an implicit global

    haystack = haystack.toLowerCase();

    while ((match = regex.exec(haystack)) != null) {
        result.push(match.index);
    }
    return result;
}

Usage:

getMatches('hello hi hello hi hi hi hello hi hello john hi hi', 'hi');

Result => [6, 15, 18, 21, 30, 44, 47]

Concerning your book vs. books problem, you just need to provide "book " with a trailing space.

Or, inside the function, you could do:

needle = ' ' + needle + ' ';
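It may be worth checking how space padding behaves at punctuation and at the end of the text. The sketch below is an assumption-laden illustration, not the answer's code: it uses indexOf instead of RegExp, and `getPaddedMatches` is a hypothetical name.

```javascript
// Sketch: search for the needle surrounded by single spaces.
function getPaddedMatches(haystack, needle) {
  var padded = " " + needle.toLowerCase() + " ";
  var text = haystack.toLowerCase();
  var result = [];
  var index = text.indexOf(padded);
  while (index !== -1) {
    result.push(index + 1); // +1 skips the leading space so the index points at the word
    index = text.indexOf(padded, index + 1);
  }
  return result;
}

console.log(getPaddedMatches("hello hi hello hi hi hi", "hi")); // [6, 15, 18]
console.log(getPaddedMatches(
  "Tom wrote a book. The book's name is Facebook for dummies", "book"
)); // []
```

The final "hi" is missed because nothing follows it, and "book." is missed because a period, not a space, follows the word. Padding with spaces therefore only works when every occurrence is surrounded by spaces.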

The easiest way might be to use the text.match(RegX) function. For example, you can write something like this for a case-insensitive search:

"This is a test. This is a Test.".match(/test/gi)

Result:

(2) ['test', 'Test']

Or this one for case-sensitive scenarios:

"This is a test. This is a Test.".match(/test/g)

Result:

['test']

let myControlValue = document.getElementById('myControl').innerText;
document.getElementById('searchResult').innerText = myControlValue.match(/test/gi);

<p id='myControl'>This is a test. Just a Test </p>
<span><b>Search Result:</b></span>
<div id='searchResult'></div>
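Note that match() with the g flag returns the matched strings, not their positions. If the indices are what you need, as in the original question, one possible sketch uses String.prototype.matchAll (available in ES2020 environments), which yields match objects that carry an index property:

```javascript
var text = "This is a test. This is a Test.";

// matchAll requires the g flag; each match object exposes .index.
var indices = Array.from(text.matchAll(/test/gi), function (m) {
  return m.index;
});

console.log(indices); // [10, 26]
```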
