Counting occurrences of a substring ignoring case in Java

I am trying to count the occurrences of the div tag in my HTML file. When I search for div, I get 2, and for DIV, I get 1650. So ideally, when I use sHtml.toUpperCase() and then search for DIV, I should get 1652. But I am getting 1656. What might be going wrong here?

        /********* Counting occurrences of div **************/
        String findString = "DIV";
        int lastIndex = 0;
        int count = 0;

        while (lastIndex != -1) {

            lastIndex = sHtml.indexOf(findString, lastIndex);

            if (lastIndex != -1) {
                count++;
                lastIndex += findString.length();
            }
        }
        System.out.println("Count of div = " + count);

You are picking up substrings that were mixed-case before upper-casing, say, Div. This is not a good way to count div tags, though, because you would also pick up parts of longer words (say, Division or Divorce).
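To make the discrepancy concrete, here is a minimal sketch on a made-up sample string. The class name, the `countOccurrences` helper, and the sample text are assumptions for illustration; the loop follows the same `indexOf` approach as in the question.

```java
class CaseCountDemo {
    // Hypothetical helper: counts non-overlapping, case-sensitive occurrences of needle.
    static int countOccurrences(String text, String needle) {
        int count = 0;
        int index = text.indexOf(needle);
        while (index != -1) {
            count++;
            index = text.indexOf(needle, index + needle.length());
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up sample: two "div", two "DIV", and three mixed-case "Div"
        // (one of which is part of the word "Division").
        String sample = "<div><DIV></DIV><Div>Long Division</Div></div>";

        int lower = countOccurrences(sample, "div");               // exact lower case only
        int upper = countOccurrences(sample, "DIV");               // exact upper case only
        int all = countOccurrences(sample.toUpperCase(), "DIV");   // every case variant

        // prints "2 + 2 = 4, but upper-casing finds 7"
        System.out.println(lower + " + " + upper + " = " + (lower + upper)
                + ", but upper-casing finds " + all);
    }
}
```

The three extra matches in this sample are the mixed-case spellings that neither case-sensitive search sees.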

If you want a better count, you could use a simple regex to do the counting:

"[</]div[ />]"

This regular expression will match a div that is preceded by < or /, and followed by a space, /, or >:

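import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Match "div" in any case, but only where it looks like part of a tag.
Pattern countRx = Pattern.compile("[</]div[ />]", Pattern.CASE_INSENSITIVE);
Matcher m = countRx.matcher(sHtml);
int count = 0;
while (m.find()) {
    count++;
}
System.out.println(count);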
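Note that this pattern counts closing tags as well as opening ones. The sketch below runs it on a made-up sample and, as an assumption beyond the answer, compares it with a variant that matches opening tags only. The class name, the `countMatches` helper, the sample string, and the `<div[\s/>]` variant are all hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DivTagCount {
    // Hypothetical helper: counts non-overlapping matches of pattern in html.
    static int countMatches(Pattern pattern, String html) {
        Matcher m = pattern.matcher(html);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up sample: two opening tags, two closing tags, and the word "Division".
        String sample = "<div class=\"a\"><DIV>Long Division</DIV></Div>";

        // Pattern from the answer: matches opening and closing tags in any case.
        Pattern anyTag = Pattern.compile("[</]div[ />]", Pattern.CASE_INSENSITIVE);
        // Assumed variant: requires "<" directly before "div", so closing tags are skipped.
        Pattern openingOnly = Pattern.compile("<div[\\s/>]", Pattern.CASE_INSENSITIVE);

        System.out.println(countMatches(anyTag, sample));      // prints 4
        System.out.println(countMatches(openingOnly, sample)); // prints 2
    }
}
```

Neither pattern matches "Division", because the word is not preceded by < or /. For anything beyond a rough count, an HTML parser would be more reliable than a regex.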

By the process of elimination, the four extra matches must come from mixed-case variants such as Div, DIv, DiV or dIV. It is also possible that your text contains a word with div in it (like long division).
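One way to check which spellings are responsible is to tally every case variant separately. This is a rough sketch, not part of the original answers; the class name, the `tallyVariants` helper, and the sample string are assumptions.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DivVariantTally {
    // Hypothetical helper: maps each spelling of "div" found in text to its count.
    static Map<String, Integer> tallyVariants(String text) {
        Map<String, Integer> tally = new TreeMap<>();
        Matcher m = Pattern.compile("div", Pattern.CASE_INSENSITIVE).matcher(text);
        while (m.find()) {
            // m.group() returns the text exactly as it appears in the input.
            tally.merge(m.group(), 1, Integer::sum);
        }
        return tally;
    }

    public static void main(String[] args) {
        String sample = "<div><DIV></DIV><Div>Long Division</Div></div>";
        // prints {DIV=2, Div=3, div=2}
        System.out.println(tallyVariants(sample));
    }
}
```

Running the same tally on the real sHtml should show which variants make up the difference between 1652 and 1656.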
