JavaScript: find all occurrences of a word in a text document
I'm trying to write a JavaScript function to find the indices of all occurrences of a word in a text document. Currently, this is what I have:
// Function that finds all occurrences of string 'needle' in string 'haystack'
function getMatches(haystack, needle) {
    if (needle && haystack) {
        var matches = [], ind = 0, l = needle.length;
        var t = haystack.toLowerCase();
        var n = needle.toLowerCase();
        while (true) {
            ind = t.indexOf(n, ind);
            if (ind == -1) break;
            matches.push(ind);
            ind += l;
        }
        return matches;
    }
}
However, this matches occurrences of the word even when it is part of a longer string. For example, if the needle is "book" and the haystack is "Tom wrote a book. The book's name is Facebook for dummies", the result contains the indices of 'book', 'book's' and 'Facebook', when I want only the index of 'book'. How can I accomplish this? Any help is appreciated.
Here's the regex I propose to fix your problem:
/\bbook\b((?!\W(?=\w))|(?=\s))/gi
Try it with the exec() method. The regexp also accounts for words like "booklet", which contain the needle but should not be matched:
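function getMatches(needle, haystack) {
    // Note: needle is inserted into the pattern as-is, so any regex
    // metacharacters it contains are not escaped.
    var myRe = new RegExp("\\b" + needle + "\\b((?!\\W(?=\\w))|(?=\\s))", "gi"),
        myArray, myResult = [];
    // exec() with the "g" flag resumes from the previous match on each call
    while ((myArray = myRe.exec(haystack)) !== null) {
        myResult.push(myArray.index);
    }
    return myResult;
}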
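As a minimal sketch of how this could be made safer for arbitrary needles, the variant below escapes regex metacharacters before building the same pattern. The names escapeRegExp and getWholeWordMatches are hypothetical and not part of the original answer.

```javascript
// Hypothetical helper: escape regex metacharacters so the needle is matched literally.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Same pattern as above, but built from the escaped needle.
function getWholeWordMatches(needle, haystack) {
    var re = new RegExp("\\b" + escapeRegExp(needle) + "\\b((?!\\W(?=\\w))|(?=\\s))", "gi");
    var match, result = [];
    while ((match = re.exec(haystack)) !== null) {
        result.push(match.index);
    }
    return result;
}

var sentence = "Tom wrote a book. The book's name is Facebook for dummies";
console.log(getWholeWordMatches("book", sentence)); // [12]
```

Only the standalone "book" at index 12 is reported: "book's" fails both lookaheads and "Facebook" fails the leading \b. Because the pattern relies on \b, a needle that starts or ends with a non-word character (for example "c++") would still need different handling.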
Edit
I've edited the regexp to account for words like "booklet" as well. I've also reformatted my answer to be similar to your function.
Try this:
function getMatches(searchStr, str) {
    var ind = 0, searchStrL = searchStr.length;
    var index, matches = [];
    str = str.toLowerCase();
    searchStr = searchStr.toLowerCase();
    while ((index = str.indexOf(searchStr, ind)) > -1) {
        matches.push(index);
        ind = index + searchStrL;
    }
    return matches;
}
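The loop above still reports "book" inside "book's" and "Facebook". One possible way to keep the indexOf approach and filter those out is to inspect the characters on either side of each hit. This is only a sketch: the names isWordChar and getWordMatches are hypothetical, and treating the apostrophe as part of a word is an assumption made so that "book's" is rejected.

```javascript
// Assumption: letters, digits, underscore and apostrophe count as part of a word.
function isWordChar(ch) {
    return ch !== "" && /[A-Za-z0-9_']/.test(ch);
}

function getWordMatches(searchStr, str) {
    var matches = [];
    var lowerStr = str.toLowerCase();
    var lowerSearch = searchStr.toLowerCase();
    if (lowerSearch.length === 0) return matches; // avoid an endless loop on ""
    var index = lowerStr.indexOf(lowerSearch);
    while (index > -1) {
        // charAt() returns "" outside the string, which counts as a boundary here
        var before = lowerStr.charAt(index - 1);
        var after = lowerStr.charAt(index + lowerSearch.length);
        if (!isWordChar(before) && !isWordChar(after)) {
            matches.push(index);
        }
        index = lowerStr.indexOf(lowerSearch, index + lowerSearch.length);
    }
    return matches;
}

console.log(getWordMatches("book", "Tom wrote a book. The book's name is Facebook for dummies")); // [12]
```

The trade-off of counting the apostrophe as a word character is that a needle wrapped in single quotes, such as 'book', would also be skipped.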
indexOf returns the position of the first occurrence of "book":
var str = "Tom wrote a book. The book's name is Facebook for dummies";
var n = str.indexOf("book");
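To go beyond the first occurrence, indexOf accepts an optional second argument giving the position to start searching from, and it returns -1 when nothing is found. A short sketch (the variable names first, second and missing are only for illustration):

```javascript
var str = "Tom wrote a book. The book's name is Facebook for dummies";
var first = str.indexOf("book");             // 12
var second = str.indexOf("book", first + 1); // 22, the "book" in "book's"
var missing = str.indexOf("novel");          // -1, not present
console.log(first, second, missing);
```

Note that this is a plain substring search, so it does not distinguish whole words from parts of longer words.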
I don't know what is going on there, but I can offer a better solution using a regex.
function getMatches(haystack, needle) {
    var regex = new RegExp(needle.toLowerCase(), 'g'),
        result = [],
        match; // declared locally so it does not leak into the global scope
    haystack = haystack.toLowerCase();
    while ((match = regex.exec(haystack)) != null) {
        result.push(match.index);
    }
    return result;
}
Usage:
getMatches('hello hi hello hi hi hi hello hi hello john hi hi', 'hi');
Result => [6, 15, 18, 21, 30, 44, 47]
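As an alternative sketch, the same indices can be collected without lowercasing either string by using the "i" flag together with String.prototype.matchAll (available in ES2020 environments). The function name getMatchesIgnoreCase is hypothetical, and the escaping step is an addition that the answer above does not perform.

```javascript
function getMatchesIgnoreCase(haystack, needle) {
    // Escape regex metacharacters so the needle is treated as literal text
    var escaped = needle.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    // matchAll requires the "g" flag; "i" makes the search case-insensitive
    var regex = new RegExp(escaped, "gi");
    return Array.from(haystack.matchAll(regex), function (m) {
        return m.index;
    });
}

console.log(getMatchesIgnoreCase("Hi hi HI", "hi")); // [0, 3, 6]
```

Like the function above, this finds substrings, not whole words.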
Concerning your "book" vs "books" problem, you just need to provide "book " with a trailing space.
Or, in the function, you could do:
needle = ' ' + needle + ' ';
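A quick sketch of what the padded needle does in practice may help when deciding whether it fits your text. The sample strings and the variable names text and padded are made up for illustration.

```javascript
var text = "a book, my book and a notebook";
var padded = " " + "book" + " ";

// The reported index points at the leading space, so the word itself starts one later.
console.log(text.indexOf(padded));     // 10
console.log(text.indexOf(padded) + 1); // 11

// Occurrences followed by punctuation or at the end of the text are not found.
console.log("a book.".indexOf(padded)); // -1
```

So padding with spaces works for words surrounded by spaces, but it skips words next to punctuation or at the very start or end of the document.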
The easiest way might be to use the text.match(RegX) function. For example, you can write something like this for a case-insensitive search:
"This is a test. This is a Test.".match(/test/gi)
Result:
(2) ['test', 'Test']
Or this one for case-sensitive scenarios:
"This is a test. This is a Test.".match(/test/g)
Result:
['test']
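For the whole-word requirement in the question, the same match() call can be combined with \b word boundaries. The sketch below uses a made-up sample sentence, and the variable names sample, loose and whole are only illustrative. Keep in mind that match() with the "g" flag returns the matched strings, not their indices, and returns null when nothing matches.

```javascript
var sample = "This is a test. This is a Test, not a contest.";

// Without boundaries, the "test" inside "contest" is also matched.
var loose = sample.match(/test/gi) || [];
console.log(loose); // ['test', 'Test', 'test']

// With \b on both sides, only standalone words are matched.
var whole = sample.match(/\btest\b/gi) || [];
console.log(whole); // ['test', 'Test']

// No match at all yields null, hence the "|| []" fallback above.
console.log("no hits here".match(/test/g)); // null
```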
let myControlValue = document.getElementById('myControl').innerText;
document.getElementById('searchResult').innerText = myControlValue.match(/test/gi);
<p id='myControl'>This is a test. Just a Test </p>
<span><b>Search Result:</b></span>
<div id='searchResult'></div>