Case insensitive string search is not working

Please have a look at the following code:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>

<script>
function count()
{
    var listOfWords, paragraph, listOfWordsArray, paragraphArray;
    var wordCounter=0;

    listOfWords = document.getElementById("wordsList").value;

    //Split the words
    listOfWordsArray = listOfWords.split("\n");

    //Convert the entire word list to upper case
    for(var i=0;i<listOfWordsArray.length;i++)
    {
        listOfWordsArray[i] = listOfWordsArray[i].toUpperCase();
    }

    //Get the paragraph text
    paragraph = document.getElementById("paragraph").value;
    paragraphArray = paragraph.split(" ");

    //Convert the entire paragraph to upper case
    for(var i=0; i<paragraphArray.length; i++)
    {
        paragraphArray[i] = paragraphArray[i].toUpperCase();
    }

    //check whether paragraph contains words in list
    for(var i=0; i<listOfWordsArray.length; i++)
    {
    /*  if(paragraph.contains(listOfWords[i]))
        {
                wordCounter++;
        }*/

        re = new RegExp("\\b"+listOfWordsArray[i]+"\\b");

        if(paragraph.match(re))
        {
            wordCounter++;
        }
    }

    window.alert("Number of Contains: "+wordCounter);
}
</script>

</head>


<body>
<center>
<p> Enter your Word List here </p>
<br />
<textarea id="wordsList" cols="100" rows="10"></textarea>

<br />
<p>Enter your paragraph here</p>
<textarea id="paragraph" cols="100" rows="15"></textarea>

<br />
<br />
<button id="btn1"  onclick="count()">Calculate Percentage</button>

</center>
</body>
</html>

Here, what I am trying to do is count how many words in paragraph are also included in wordList. Words in wordList are separated by new lines.

However, I need this check to be case insensitive. For example, there should be no difference between 'count', 'COUNT' and 'Count'.

But here, I am always getting the answer 0. What am I doing wrong here?

Update

I tried the following function, provided by SO user 'Kolink'. However, it gives different answers in different runs. In the first few runs it was correct, then it started to provide wrong answers! Maybe JavaScript has static variables?

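You are preparing the paragraph's words in paragraphArray but then you never use it. The regex is matched against the paragraph string itself, which is still in its original case, while the list words have been converted to upper case, so nothing matches unless the paragraph is already upper case.

I would suggest something like this: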

var words = document.getElementById('wordsList').value.split(/\r?\n/),
    l = words.length, i, total = 0, para = document.getElementById('paragraph').value;
for( i=0; i<l; i++) if( para.match(new RegExp("\\b"+words[i]+"\\b","i"))) total++;
alert("Total: "+total);
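One caveat when building a RegExp from user input: a word containing characters such as . or + is interpreted as pattern syntax, and an empty line produces \b\b, which matches at any word boundary. A minimal sketch of a guarded variant follows. The names escapeRegExp and countListedWords are made up for illustration and are not part of the suggestion above.

```javascript
// Hypothetical helper: escape characters that have special meaning in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: count how many entries of `words` occur in `para`
// as whole words, ignoring case. Blank entries are skipped.
function countListedWords(para, words) {
  var total = 0;
  for (var i = 0; i < words.length; i++) {
    var word = words[i].trim();
    if (word === "") continue; // a blank line would otherwise match anywhere
    var re = new RegExp("\\b" + escapeRegExp(word) + "\\b", "i");
    if (re.test(para)) total++;
  }
  return total;
}

console.log(countListedWords("Count the words", ["count", "WORDS", "", "missing"])); // 2
```

Here test is used instead of match because only a yes/no answer is needed for each word.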

Solution

How about just this:

var wc = function (text, wordsToMatch) {
  var re = new RegExp("(" + (wordsToMatch || ["\\w+"]).join('|') + ")", "gi");
  var matches = (text || "").match(re);

  // console.log(matches);
  return (matches ? matches.length : 0);
};

Or for an unreadable version (not recommended):

var wc = function (t, w) {
  return (((t || "").match(new RegExp("(" + (w || ["\\w+"]).join('|') + ")", "gi")) || []).length);
};

Integration

So, in your code, you'd be able to throw away most of it and write:

function count()
{
    var wordsList   = document.getElementById("wordsList").value;
    var paragraph   = document.getElementById("paragraph").value;
    var wordCounter = wc(paragraph, wordsList.split("\n"));

    window.alert("Number of Contains: " + wordCounter);
}

Examples

Example 1 (matches against a list)

Input:

console.log(wc("helloworldhelloworldhelloworldhelloworldhelloworldhelloworldhelloworldhelloworldhelloworldhelloworldhelloworldhelloworld", ["world"]));
console.log(wc("helloworldhelloworldhelloworldhelloworldhelloworldhelloworldhelloworldhelloworldhelloworldhelloworldhelloworldhelloworld", ["hello", "world"]));

Output:

12
24

Example 2 (safe defaults)

Input:

console.log(wc("", ["hello", "world"]));
console.log(wc());
console.log(wc(""));

Output:

0
0
0

Example 3 (as a default word counter)

Input:

console.log(wc("hello"));
console.log(wc("hello world"));

Output:

1
2
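The outputs in Example 1 show that wc counts substrings: "world" is found inside "helloworld". If whole-word matching is wanted, as in the question, a variant along these lines might work. This is only a sketch: the name wcWholeWords is hypothetical, it escapes regex metacharacters and ignores blank entries, and unlike wc it returns 0 when no word list is given instead of acting as a general word counter.

```javascript
// Hypothetical variant of wc: counts case-insensitive occurrences of the
// listed words, but only where they appear as whole words.
function wcWholeWords(text, wordsToMatch) {
  var words = (wordsToMatch || [])
    .map(function (w) { return w.trim(); })
    .filter(function (w) { return w !== ""; })
    // escape characters that would otherwise be read as regex syntax
    .map(function (w) { return w.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); });

  if (words.length === 0) return 0;

  var re = new RegExp("\\b(?:" + words.join("|") + ")\\b", "gi");
  var matches = (text || "").match(re);
  return matches ? matches.length : 0;
}

console.log(wcWholeWords("helloworld hello World", ["hello", "world"])); // 2
```

Like wc, this counts every occurrence, so a word that appears twice in the text contributes 2.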

You could search without a regexp (eliminateDuplicates is an external helper, linked in the original answer):

var wordCounter = 0;

// retrieves arrays from textareas

var list = eliminateDuplicates(
    document.getElementById('wordsList').value
    .toUpperCase()
    .split(/\s+/g)
);
var para = eliminateDuplicates(
    document.getElementById('paragraph').value
    .toUpperCase()
    .split(/\s+/g)
);

// performs search

for (var i1 = 0, l1 = para.length; i1 < l1; i1++) {
    var word = para[i1];
    for (var i2 = 0, l2 = list.length; i2 < l2; i2++) {
        if (list[i2] === word) {
            wordCounter++;
            break;
        }
    }
}
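Since eliminateDuplicates is not shown in this thread, here is a self-contained sketch of the same idea. The implementation of eliminateDuplicates below is an assumption (it uses a Set and also drops empty strings), and countCommonWords is a hypothetical wrapper that takes the two texts as arguments instead of reading the textareas.

```javascript
// Assumed implementation: keep the first occurrence of each entry
// and drop empty strings left over from splitting.
function eliminateDuplicates(items) {
  return Array.from(new Set(items)).filter(function (item) {
    return item !== "";
  });
}

// Hypothetical wrapper: count distinct paragraph words that are in the list.
function countCommonWords(listText, paraText) {
  var list = new Set(eliminateDuplicates(listText.toUpperCase().split(/\s+/)));
  var para = eliminateDuplicates(paraText.toUpperCase().split(/\s+/));
  var counter = 0;
  para.forEach(function (word) {
    if (list.has(word)) counter++;
  });
  return counter;
}

console.log(countCommonWords("count\nwords\n", "Count the WORDS, count them")); // 1
```

The result is 1 rather than 2 because splitting on whitespace leaves punctuation attached: "WORDS," is not equal to "WORDS". Stripping punctuation before comparing would be needed if such cases should count.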

Check that your regex is escaped correctly. Try:

re = new RegExp("\\b"+listOfWordsArray[i]+"\\b");

Inside a string literal each backslash must be doubled, so both the first and the last boundary have to be written as \\b. A single \b in a string is a backspace character, not a word boundary.
