Counting occurrences of a substring ignoring case in Java
I am trying to count the occurrences of the div tag in my HTML file. When I search for "div", I get 2, and for "DIV", I get 1650. So ideally, when I use sHtml.toUpperCase() and then search for "DIV", I should get 1652. But I am getting 1656. What might be going wrong here?
```java
/********* Counting occurrences of div **************/
String findString = "DIV";
int lastIndex = 0;
int count = 0;
// Non-overlapping search: after each hit, resume just past the match.
while (lastIndex != -1) {
    lastIndex = sHtml.indexOf(findString, lastIndex);
    if (lastIndex != -1) {
        count++;
        lastIndex += findString.length();
    }
}
System.out.println("Count of div = " + count);
```
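To reproduce the discrepancy in isolation, here is a minimal self-contained sketch that wraps the loop above in a helper. The class name CaseCountDemo, the helper countOccurrences and the sample HTML are hypothetical, made up for illustration. Upper-casing the whole string folds mixed-case spellings such as Div into DIV, so the upper-cased count can exceed the sum of the separate "div" and "DIV" counts. The sketch passes Locale.ROOT to toUpperCase because the no-argument overload uses the default locale (in a Turkish locale, for example, "i" upper-cases to a dotted capital I, and "div" would then no longer match "DIV").

```java
import java.util.Locale;

class CaseCountDemo {
    // Same non-overlapping indexOf loop as in the question, wrapped in a hypothetical helper.
    static int countOccurrences(String text, String findString) {
        int lastIndex = 0;
        int count = 0;
        while ((lastIndex = text.indexOf(findString, lastIndex)) != -1) {
            count++;
            lastIndex += findString.length(); // skip past this match
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up sample: two lowercase, two uppercase and two mixed-case occurrences.
        String sHtml = "<div>a</div><DIV>b</DIV><Div>c</Div>";
        int lower = countOccurrences(sHtml, "div");
        int upper = countOccurrences(sHtml, "DIV");
        // Locale.ROOT avoids locale-specific case rules such as the Turkish dotted/dotless i.
        int folded = countOccurrences(sHtml.toUpperCase(Locale.ROOT), "DIV");
        System.out.println(lower + " + " + upper + " vs " + folded); // prints "2 + 2 vs 6"
    }
}
```

The two extra matches in the upper-cased count come from the Div spellings, which neither of the two case-sensitive searches sees.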
You are picking up substrings that were mixed-case before, say, Div. This is not a good way to count divs, though, because you would also pick up parts of longer words (say, Division or Divorce).
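The same false positives appear with any case-insensitive substring count. The sketch below (hypothetical names IgnoreCaseCount and countIgnoreCase, made-up sample) counts matches with String.regionMatches(true, ...) instead of building an upper-cased copy. On a string that contains only two div tags it still reports 4, because Divorce and Division are counted too.

```java
class IgnoreCaseCount {
    // Counts non-overlapping, case-insensitive occurrences of needle in text
    // without creating an upper-cased copy of the text.
    static int countIgnoreCase(String text, String needle) {
        if (needle.isEmpty()) {
            return 0; // an empty needle would never advance the loop meaningfully
        }
        int count = 0;
        int i = 0;
        while (i <= text.length() - needle.length()) {
            if (text.regionMatches(true, i, needle, 0, needle.length())) {
                count++;
                i += needle.length(); // skip past the match (non-overlapping)
            } else {
                i++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up sample: two real div tags, plus two words that start with "Div".
        String sHtml = "<div>Divorce and Division</div>";
        System.out.println(countIgnoreCase(sHtml, "div")); // prints 4
    }
}
```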
If you want a better count, you could use a simple regex to do the counting:
"[</]div[ />]"
This regular expression will match a div that is preceded by < or /, and followed by a space, /, or >:
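```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Match "div" preceded by < or / and followed by a space, / or >, in any case.
Pattern countRx = Pattern.compile("[</]div[ />]", Pattern.CASE_INSENSITIVE);
Matcher m = countRx.matcher(sHtml);
int count = 0;
while (m.find()) {
    count++;
}
System.out.println(count);
```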
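A self-contained version of the regex count, run against a made-up sample, shows which strings the pattern accepts; the class DivTagCount and method countDivTags are hypothetical names. Because / is allowed before div, closing tags such as </div> are counted as well, so markup where every div has an opening and a closing tag yields twice the number of div elements. Also, the only whitespace accepted after div is a literal space, so a tag name followed by a tab or newline would be missed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DivTagCount {
    // The answer's pattern: "div" preceded by < or /, followed by a space, / or >, any case.
    private static final Pattern DIV_TAG =
            Pattern.compile("[</]div[ />]", Pattern.CASE_INSENSITIVE);

    static int countDivTags(String sHtml) {
        Matcher m = DIV_TAG.matcher(sHtml);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up sample: two div elements plus words and a tag that merely contain "div".
        String sHtml = "<DIV>Divorce</DIV><Div class=\"x\">Division</Div><divider>";
        // Matches <DIV>, </DIV>, "<Div " and </Div>; Divorce, Division and <divider> are skipped.
        System.out.println(countDivTags(sHtml)); // prints 4
    }
}
```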
By the process of elimination, you must have some combination of Div, DIv, DiV or dIV as well. It is also possible that your text contains a word with div in it (like long division).
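To check the process-of-elimination argument against your own file, a sketch like the following groups every case-insensitive match of div by its exact spelling (hypothetical class DivSpellingTally, made-up sample). Any spelling other than div and DIV in the result is what the upper-cased count picks up on top of the two separate searches; lowercase words such as division land in the div bucket.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DivSpellingTally {
    // Maps each exact spelling of a case-insensitive "div" match to how often it occurs.
    static Map<String, Integer> tallySpellings(String sHtml) {
        Matcher m = Pattern.compile("div", Pattern.CASE_INSENSITIVE).matcher(sHtml);
        Map<String, Integer> tally = new TreeMap<>(); // sorted by spelling for stable output
        while (m.find()) {
            tally.merge(m.group(), 1, Integer::sum);
        }
        return tally;
    }

    public static void main(String[] args) {
        // Made-up sample mixing upper-case, lower-case and mixed-case spellings.
        String sHtml = "<DIV>long division</DIV><Div>x</dIV><div>";
        System.out.println(tallySpellings(sHtml)); // prints {DIV=2, Div=1, dIV=1, div=2}
    }
}
```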